---- Basics - int64 ----
100 length
10000 loops, best of 3: 100 µs per loop
1000 length
10000 loops, best of 3: 166 µs per loop
10000 length
1000 loops, best of 3: 367 µs per loop
100000 length
100 loops, best of 3: 2.67 ms per loop
100000 length with few #s
100 loops, best of 3: 2.52 ms per loop
100000 length with many #s
10 loops, best of 3: 18.4 ms per loop
---- Object type with strings ----
100 length
10000 loops, best of 3: 196 µs per loop
1000 length
1000 loops, best of 3: 306 µs per loop
10000 length
1000 loops, best of 3: 1.2 ms per loop
100000 length
100 loops, best of 3: 10.7 ms per loop
100000 length with few values
100 loops, best of 3: 10.5 ms per loop
100000 length with many values
10 loops, best of 3: 54.1 ms per loop
---- Timedelta type for [should use int64] ----
100 length
1000 loops, best of 3: 205 µs per loop
1000 length
1000 loops, best of 3: 238 µs per loop
10000 length
1000 loops, best of 3: 392 µs per loop
100000 length
100 loops, best of 3: 2.08 ms per loop
100000 length with few #s
100 loops, best of 3: 2.81 ms per loop
100000 length with many #s
100 loops, best of 3: 18 ms per loop
Still quite slow with strings! However, the last string case takes ~80 ms with value_counts, so ~40 ms isn't terrible [for comparison, constructing a Series with 100K string values takes ~10 ms].
---- Basics - int64 ----
100 length
10000 loops, best of 3: 81.3 µs per loop
1000 length
10000 loops, best of 3: 96.5 µs per loop
10000 length
10000 loops, best of 3: 182 µs per loop
100000 length
1000 loops, best of 3: 1.34 ms per loop
100000 length with few #s
1000 loops, best of 3: 1.48 ms per loop
100000 length with many #s
100 loops, best of 3: 4.22 ms per loop
---- Object type with strings ----
100 length
1000 loops, best of 3: 187 µs per loop
1000 length
1000 loops, best of 3: 305 µs per loop
10000 length
1000 loops, best of 3: 1.4 ms per loop
100000 length
100 loops, best of 3: 12.1 ms per loop
100000 length with few values
100 loops, best of 3: 11.8 ms per loop
100000 length with many values
10 loops, best of 3: 38.7 ms per loop
---- Timedelta type for [should use int64] ----
100 length
10000 loops, best of 3: 209 µs per loop
1000 length
1000 loops, best of 3: 218 µs per loop
10000 length
1000 loops, best of 3: 335 µs per loop
100000 length
1000 loops, best of 3: 1.64 ms per loop
100000 length with few #s
1000 loops, best of 3: 1.5 ms per loop
100000 length with many #s
100 loops, best of 3: 4.13 ms per loop
Hurray for optimizing a pathological case [and, you know, general perf boost]!
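To put a number on the boost, here is a rough sketch that parses the "100000 length with many ..." rows from the first two runs above and prints the before/after ratio. The helper names (`parse_seconds`, `speedup`) are made up for this note and are not part of perf_test.ipy.

```python
import re

# Matches IPython %timeit summary lines such as
# "10 loops, best of 3: 18.4 ms per loop".
TIMEIT_RE = re.compile(r"best of \d+: ([\d.]+) (ns|µs|us|ms|s) per loop")
UNIT_TO_SECONDS = {"ns": 1e-9, "µs": 1e-6, "us": 1e-6, "ms": 1e-3, "s": 1.0}


def parse_seconds(line):
    """Return the per-loop time in seconds from one %timeit summary line."""
    match = TIMEIT_RE.search(line)
    if match is None:
        raise ValueError("not a timeit summary line: %r" % line)
    value, unit = match.groups()
    return float(value) * UNIT_TO_SECONDS[unit]


def speedup(before_line, after_line):
    """Old time divided by new time; above 1 means the new run is faster."""
    return parse_seconds(before_line) / parse_seconds(after_line)


# (before, after) pairs copied verbatim from the two runs above.
many_cases = {
    "int64": ("10 loops, best of 3: 18.4 ms per loop",
              "100 loops, best of 3: 4.22 ms per loop"),
    "object (strings)": ("10 loops, best of 3: 54.1 ms per loop",
                         "10 loops, best of 3: 38.7 ms per loop"),
    "timedelta": ("100 loops, best of 3: 18 ms per loop",
                  "100 loops, best of 3: 4.13 ms per loop"),
}

for name, (before, after) in many_cases.items():
    print("%-18s %.1fx" % (name, speedup(before, after)))
```

That works out to roughly 4.4x for the int64 and timedelta cases and about 1.4x for strings.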
(The earlier runs had the code copy/pasted inline; making it a cdef inline function doesn't work in this case, so the next run uses a separate function.)
›› ipython perf_test.ipy # separate func
---- Basics - int64 ----
100 length
10000 loops, best of 3: 81.7 µs per loop
1000 length
10000 loops, best of 3: 110 µs per loop
10000 length
1000 loops, best of 3: 313 µs per loop
100000 length
100 loops, best of 3: 2.41 ms per loop
100000 length with few #s
100 loops, best of 3: 2.57 ms per loop
100000 length with many #s
100 loops, best of 3: 4.82 ms per loop
---- Object type with strings ----
100 length
1000 loops, best of 3: 166 µs per loop
1000 length
1000 loops, best of 3: 277 µs per loop
10000 length
1000 loops, best of 3: 1.18 ms per loop
100000 length
100 loops, best of 3: 11.3 ms per loop
100000 length with few values
100 loops, best of 3: 11 ms per loop
100000 length with many values
10 loops, best of 3: 35.8 ms per loop
---- Timedelta type for [should use int64] ----
100 length
10000 loops, best of 3: 185 µs per loop
1000 length
1000 loops, best of 3: 198 µs per loop
10000 length
1000 loops, best of 3: 350 µs per loop
100000 length
100 loops, best of 3: 1.95 ms per loop
100000 length with few #s
100 loops, best of 3: 2.65 ms per loop
100000 length with many #s
100 loops, best of 3: 4.36 ms per loop
Side note - all of these timings are relatively consistent over multiple runs (ran each about 3-5 times). No need to optimize the DataFrame version - it's just putting together Series, and that overhead is trivial.
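For anyone wanting to check run-to-run consistency without IPython, a minimal sketch using only the standard library might look like the following. The .ipy scripts themselves are not shown in these notes, so everything here is an assumption: `make_ints` and `best_of` are hypothetical helpers, the input sizes are illustrative, and `collections.Counter` is only a stand-in for the function actually being benchmarked.

```python
import random
import timeit
from collections import Counter


def make_ints(length, distinct):
    """Build a list of `length` ints drawn from `distinct` possible values."""
    rng = random.Random(0)  # fixed seed so every run times identical data
    return [rng.randrange(distinct) for _ in range(length)]


def best_of(func, repeat=3, number=100):
    """Per-call seconds: the fastest of `repeat` batches of `number` calls."""
    batches = timeit.repeat(func, repeat=repeat, number=number)
    return min(batches) / number


# Hypothetical cases loosely mirroring the "few" and "many" rows in the logs.
cases = {
    "1000 length with few values": make_ints(1000, 5),
    "1000 length with many values": make_ints(1000, 1000),
}

for label, data in cases.items():
    # Repeat the whole "best of 3" measurement three times to see the spread.
    runs = [best_of(lambda: Counter(data)) for _ in range(3)]
    spread = (max(runs) - min(runs)) / min(runs)
    print("%s: best %.1f µs, spread across runs %.0f%%"
          % (label, min(runs) * 1e6, spread * 100))
```

Taking the minimum of each batch mirrors what the "best of 3" figure in the logs reports; the spread line is just a quick way to see how much the reruns disagree.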
For comparison, the runs below are from val_count_perf.ipy (value_counts):
›› ipython val_count_perf.ipy
---- Basics - int64 ----
100 length
1000 loops, best of 3: 458 µs per loop
1000 length
1000 loops, best of 3: 528 µs per loop
10000 length
1000 loops, best of 3: 755 µs per loop
100000 length
100 loops, best of 3: 3.17 ms per loop
100000 length with few #s
100 loops, best of 3: 3.13 ms per loop
100000 length with many #s
10 loops, best of 3: 25.9 ms per loop
---- Object type with strings ----
100 length
1000 loops, best of 3: 668 µs per loop
1000 length
1000 loops, best of 3: 799 µs per loop
10000 length
100 loops, best of 3: 1.84 ms per loop
100000 length
100 loops, best of 3: 13.8 ms per loop
100000 length with few values
100 loops, best of 3: 13.2 ms per loop
100000 length with many values
10 loops, best of 3: 86 ms per loop
---- Timedelta type for [should use int64] ----
100 length
1000 loops, best of 3: 716 µs per loop
1000 length
1000 loops, best of 3: 810 µs per loop
10000 length
1000 loops, best of 3: 962 µs per loop
100000 length
100 loops, best of 3: 2.79 ms per loop
100000 length with few #s
100 loops, best of 3: 3.81 ms per loop
100000 length with many #s
10 loops, best of 3: 42.3 ms per loop
›› ipython val_count_perf.ipy # abstracted func
---- Basics - int64 ----
100 length
1000 loops, best of 3: 485 µs per loop
1000 length
1000 loops, best of 3: 535 µs per loop
10000 length
1000 loops, best of 3: 781 µs per loop
100000 length
100 loops, best of 3: 3.1 ms per loop
100000 length with few #s
100 loops, best of 3: 3.47 ms per loop
100000 length with many #s
10 loops, best of 3: 29.3 ms per loop
---- Object type with strings ----
100 length
1000 loops, best of 3: 666 µs per loop
1000 length
1000 loops, best of 3: 834 µs per loop
10000 length
1000 loops, best of 3: 1.89 ms per loop
100000 length
100 loops, best of 3: 14 ms per loop
100000 length with few values
100 loops, best of 3: 13.3 ms per loop
100000 length with many values
10 loops, best of 3: 86.3 ms per loop
---- Timedelta type for [should use int64] ----
100 length
1000 loops, best of 3: 758 µs per loop
1000 length
1000 loops, best of 3: 826 µs per loop
10000 length
1000 loops, best of 3: 961 µs per loop
100000 length
100 loops, best of 3: 2.84 ms per loop
100000 length with few #s
100 loops, best of 3: 3.72 ms per loop
100000 length with many #s
10 loops, best of 3: 42.8 ms per loop