Binary search is theoretically optimal, but it's possible to speed it up substantially using AVX2 and branchless code even in .NET Core.
Memory access is the limiting factor for binary search. When we access each element for comparison a cache line is loaded, so we could load a 32-byte vector almost free, check if it contains the target value, and if not reduce the search space by 32/sizeof(T)
instead of 1 element.
AVX512 with _mm256_cmpge_epi64_mask
instruction should improve it even more, but it is not available on .NET yet.
SearchBench
Case | MOPS | Elapsed | GC0 | GC1 | GC2 | Memory |
---|---|---|---|---|---|---|
BS_Classic_2 | 570.38 | 15 ms | 0.0 | 0.0 | 0.0 | 0.000 MB |
BS_Classic_1 | 569.90 | 15 ms | 0.0 | 0.0 | 0.0 | 0.000 MB |
BS_Avx_1 | 505.60 | 17 ms | 0.0 | 0.0 | 0.0 | 0.000 MB |
BS_Avx_2 | 504.03 | 17 ms | 0.0 | 0.0 | 0.0 | 0.000 MB |
BS_Avx_4 | 503.77 | 17 ms | 0.0 | 0.0 | 0.0 | 0.000 MB |
BS_Classic_4 | 288.84 | 29 ms | 0.0 | 0.0 | 0.0 | 0.000 MB |
BS_Avx_8 | 285.34 | 29 ms | 0.0 | 0.0 | 0.0 | 0.000 MB |
BS_Avx_16 | 186.61 | 45 ms | 0.0 | 0.0 | 0.0 | 0.000 MB |
BS_Classic_8 | 183.58 | 46 ms | 0.0 | 0.0 | 0.0 | 0.000 MB |
BS_Avx_32 | 151.42 | 55 ms | 0.0 | 0.0 | 0.0 | 0.000 MB |
BS_Avx_64 | 117.92 | 71 ms | 0.0 | 0.0 | 0.0 | 0.000 MB |
BS_Classic_16 | 111.73 | 75 ms | 0.0 | 0.0 | 0.0 | 0.000 MB |
BS_Avx_128 | 95.59 | 88 ms | 0.0 | 0.0 | 0.0 | 0.000 MB |
BS_Classic_32 | 90.19 | 93 ms | 0.0 | 0.0 | 0.0 | 0.000 MB |
BS_Classic_64 | 71.43 | 117 ms | 0.0 | 0.0 | 0.0 | 0.000 MB |
BS_Avx_256 | 63.05 | 133 ms | 0.0 | 0.0 | 0.0 | 0.000 MB |
BS_Classic_128 | 60.40 | 139 ms | 0.0 | 0.0 | 0.0 | 0.000 MB |
BS_Classic_256 | 47.07 | 178 ms | 0.0 | 0.0 | 0.0 | 0.000 MB |
BS_Avx_512 | 40.02 | 210 ms | 0.0 | 0.0 | 0.0 | 0.000 MB |
BS_Avx_1024 | 31.25 | 268 ms | 0.0 | 0.0 | 0.0 | 0.000 MB |
BS_Classic_512 | 28.62 | 293 ms | 0.0 | 0.0 | 0.0 | 0.000 MB |
BS_Avx_4096 | 22.84 | 367 ms | 0.0 | 0.0 | 0.0 | 0.000 MB |
BS_Classic_1024 | 22.35 | 375 ms | 0.0 | 0.0 | 0.0 | 0.000 MB |
BS_Avx_16384 | 19.77 | 424 ms | 0.0 | 0.0 | 0.0 | 0.000 MB |
BS_Classic_4096 | 19.03 | 441 ms | 0.0 | 0.0 | 0.0 | 0.000 MB |
BS_Classic_16384 | 17.71 | 474 ms | 0.0 | 0.0 | 0.0 | 0.000 MB |