Last active
August 29, 2015 14:06
Ruby Benchmark: String#bytesize vs String#size
# String#bytesize vs String#size
require 'benchmark'
N = 20_000_000
short_utf8_string = "ääää"
long_utf8_string = "äääääääääääääääääääääääääääääääääääääääääääääääääääääääää"
short_ascii_string = "aaaa"
long_ascii_string = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
short_mixed_string = "aäaä"
long_mixed_string = "aäaäaäaäaäaäaäaäaäaäaäaäaäaäaäaäaäaäaäaäaäaäaäaäaäaäaäaäa"
Benchmark.bmbm do |x|
  x.report("utf8.short.size") { N.times { short_utf8_string.size } }
  x.report("utf8.short.bytesize") { N.times { short_utf8_string.bytesize } }
  x.report("utf8.long.size") { N.times { long_utf8_string.size } }
  x.report("utf8.long.bytesize") { N.times { long_utf8_string.bytesize } }
  x.report("ascii.short.size") { N.times { short_ascii_string.size } }
  x.report("ascii.short.bytesize") { N.times { short_ascii_string.bytesize } }
  x.report("ascii.long.size") { N.times { long_ascii_string.size } }
  x.report("ascii.long.bytesize") { N.times { long_ascii_string.bytesize } }
  x.report("mixed.short.size") { N.times { short_mixed_string.size } }
  x.report("mixed.short.bytesize") { N.times { short_mixed_string.bytesize } }
  x.report("mixed.long.size") { N.times { long_mixed_string.size } }
  x.report("mixed.long.bytesize") { N.times { long_mixed_string.bytesize } }
end
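Conclusion:
- Avoid calling size on a string that might contain non-ASCII
  (multibyte UTF-8) characters if the byte length is all you need:
  in these runs size was slower on the all-UTF-8 strings and on the
  long mixed string. Keep in mind that size counts characters and
  bytesize counts bytes, so the two return different values for
  such strings.
- You'll probably be fine using bytesize on everything, but if you
  know there are only going to be ASCII characters, size was
  slightly faster in these runs.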
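The conclusion above is about speed, but the two methods also differ in what they return. A minimal sketch of that difference, using strings shaped like the benchmark's short samples; the variable names are illustrative and not part of the original script, and the strings are built with \u escapes so they are UTF-8 regardless of the source file's encoding:

```ruby
# Illustrative sketch: these variable names do not appear in the benchmark.
ascii = "aaaa"
utf8  = "\u00E4" * 4        # "ääää": each "ä" takes 2 bytes in UTF-8
mixed = "a\u00E4a\u00E4"    # "aäaä"

# size counts characters, bytesize counts bytes.
ascii_counts = [ascii.size, ascii.bytesize]
utf8_counts  = [utf8.size, utf8.bytesize]
mixed_counts = [mixed.size, mixed.bytesize]

puts "ascii: size=#{ascii_counts[0]} bytesize=#{ascii_counts[1]}"
puts "utf8:  size=#{utf8_counts[0]} bytesize=#{utf8_counts[1]}"
puts "mixed: size=#{mixed_counts[0]} bytesize=#{mixed_counts[1]}"

# ascii_only? reports whether every character is ASCII; only then are
# size and bytesize guaranteed to agree for a UTF-8 string.
puts "ascii only? #{ascii.ascii_only?} / #{utf8.ascii_only?}"
```

So swapping size for bytesize is only safe when the byte count is what the surrounding code actually wants, or when the string is known to be ASCII-only.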
Output:
                           user     system      total        real
utf8.short.size        2.910000   0.010000   2.920000 (  2.916396)
utf8.short.bytesize    2.530000   0.000000   2.530000 (  2.539148)
utf8.long.size         3.140000   0.010000   3.150000 (  3.138874)
utf8.long.bytesize     2.270000   0.000000   2.270000 (  2.267662)
ascii.short.size       2.030000   0.000000   2.030000 (  2.032265)
ascii.short.bytesize   2.460000   0.000000   2.460000 (  2.464350)
ascii.long.size        2.200000   0.000000   2.200000 (  2.201368)
ascii.long.bytesize    2.470000   0.010000   2.480000 (  2.480259)
mixed.short.size       2.780000   0.000000   2.780000 (  2.789172)
mixed.short.bytesize   2.920000   0.010000   2.930000 (  2.928942)
mixed.long.size        3.340000   0.000000   3.340000 (  3.347672)
mixed.long.bytesize    2.610000   0.000000   2.610000 (  2.609181)