-
-
Save rianhunter/0be8dc116b120ad5fdd4 to your computer and use it in GitHub Desktop.
/* | |
This benchmark shows how indirect function calls have nearly | |
the same overhead costs as direct function calls. | |
This is comparing apples to apples, factoring out the savings | |
due to inlining optimizations that direct calls usually afford. | |
From this, it seems that inlining and other generic interprocedual | |
optimizations are the main drivers of direct function call optimization, | |
not the direct call itself. | |
In other words, unless you implement your functions in header files | |
or compile with LTO (Link Time Optimization), there is virtually no benefit | |
to using direct function calls, aka static dispatch. | |
*/ | |
#include <limits.h> | |
int foo(int a) { | |
return a; | |
} | |
int direct_version() { | |
int i, b = 0; | |
for (i = 0; i < INT_MAX; ++i) { | |
b = foo(b); | |
} | |
return b; | |
} | |
int indirect_version(int (*fn)(int)) { | |
int i, b = 0; | |
for (i = 0; i < INT_MAX; ++i) { | |
b = fn(b); | |
} | |
return b; | |
} | |
int main(int argc, char *argv[]) { | |
if (argc == 2 && argv[1][0] == 'd') { | |
return direct_version(); | |
} | |
else { | |
return indirect_version(&foo); | |
} | |
} | |
/* | |
$ cc -O3 -fno-inline -fomit-frame-pointer call_overhead.c | |
$ time ./a.out d | |
real 0m5.199s | |
user 0m5.155s | |
sys 0m0.021s | |
$ time ./a.out s | |
real 0m5.251s | |
user 0m5.215s | |
sys 0m0.018s | |
*/ |
A even simpler change makes the problem even worse. I modified the for-loop to ignore the return value of foo(). Then, in the direct case, the compiler understood that there is no point in calling the function, since it can see that the foo() has no side-effects. So the for-loop became a no-op. But it didn't make the same decision for the indirect version because it didn't know if the function pointed to has any side effects.
Obviously the performance difference then is massive.
This illustrates the general problem that indirect functions make it harder for the compiler to apply its optimizations.
Here are the functions I modified:
int direct_version() {
int i, b = 0;
for (i = 0; i < INT_MAX; ++i) {
foo(b);
}
return b;
}
int indirect_version(int (*fn)(int)) {
int i, b = 0;
for (i = 0; i < INT_MAX; ++i) {
fn(b);
}
return b;
}
Thanks for this @rianhunter and the response @abainbridge 👍
Additionally, if the function call can be inlined, the indirect version has function call overhead, while the direct version does not.
Where this gets interesting is where the reason for the indirection is the presence or absence of a conditional (or conditional tree) to select a function. I.e. "how many 'if's does one need to eliminate to justify the performance penalties of the function call indirection?"
I like your code - clear and to-the-point. However, I think it is a little misleading in the general case. To understand more I looked at the assembly generated for x86. After I cleaned it up to make it more readable, it looked like this:
Differences between the direct and indirect versions are:
The extra setup overhead doesn't matter much, unless the loop count is tiny. But the extra register use does matter. In real code, register contention is often a problem - it is more of a problem on x86 than instruction sets with more registers, but I don't think we should ignore this cost in any case.
To investigate the cost, I changed your code to use additional copies of foo(). My code is below. Timing the resulting executable, I found that the indirect version was 3.4x slower.