Go benchmark optimizations: be careful
/*
This is an example of why you should always inspect your benchmark results,
and why you can't rely on -gcflags=-N to disable all optimizations.

The benchmarks compare copying a 1024-byte buffer, one allocated on the stack
and the other on the heap. BenchmarkHeapBad is still optimized by the compiler
even with optimizations disabled: it sees a constant size in make() and that
the slice never escapes, so it allocates the slice on the stack instead.
BenchmarkHeapOK passes the size in as a function argument, which avoids the
heap->stack optimization.

If we ignore BenchmarkHeapOK and run the benchmarks, at first glance it seems
like there is no difference:

    $ go test -bench='BenchmarkStack|BenchmarkHeapBad' -benchmem
    BenchmarkStack-12      53486606    22.44 ns/op       0 B/op    0 allocs/op
    BenchmarkHeapBad-12    52779337    22.22 ns/op       0 B/op    0 allocs/op

We would expect the two benchmarks to differ, so something is not right here.
BenchmarkHeapBad also reports zero allocations despite calling make(). Let's
run it again, this time with compiler optimizations disabled:

    $ go test -bench='BenchmarkStack|BenchmarkHeapBad' -benchmem -gcflags=-N
    BenchmarkStack-12      49120558    24.14 ns/op       0 B/op    0 allocs/op
    BenchmarkHeapBad-12    49287187    22.82 ns/op       0 B/op    0 allocs/op

The results are essentially unchanged and there are still no allocations, so
let's see how BenchmarkHeapOK does:

    $ go test -bench=. -benchmem
    BenchmarkStack-12      53486606    22.44 ns/op       0 B/op    0 allocs/op
    BenchmarkHeapBad-12    52779337    22.22 ns/op       0 B/op    0 allocs/op
    BenchmarkHeapOK-12     11905509    168.6 ns/op    1024 B/op    1 allocs/op

Now we see the expected allocation, and a ~7x difference in speed compared to
the stack.
*/
package main

import "testing"

func BenchmarkStack(b *testing.B) {
	var buf [1024]byte

	for i := 0; i < b.N; i++ {
		func() {
			var data [1024]byte
			copy(buf[:], data[:])
		}()
	}
}

func BenchmarkHeapBad(b *testing.B) {
	var buf [1024]byte

	for i := 0; i < b.N; i++ {
		func() {
			data := make([]byte, 1024) // BAD: compiler will optimize to `var data [1024]byte` even if you disable optimization!
			copy(buf[:], data[:])
		}()
	}
}

func BenchmarkHeapOK(b *testing.B) {
	var buf [1024]byte

	for i := 0; i < b.N; i++ {
		func(n int) {
			data := make([]byte, n)
			copy(buf[:], data[:])
		}(1024) // OK: passing the size as an argument prevents Go from optimizing
	}
}