Comparison of different SACAs (suffix array construction algorithms) available in Rust.
name | description |
---|---|
cdivsufsort | C binding to Yuta Mori's divsufsort, by Amos Wenger |
divsufsort | Amos Wenger's hand port of divsufsort to Rust |
suffix_array | a partially parallelized SACA-K implementation for binary data, with search algorithms |
suffix | burntsushi's suffix, featuring UTF-8 text indexing, using an SA-IS variant with two additional bytes per character |
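All four crates build the same object: the suffix array, i.e. the start positions of every suffix of the input listed in lexicographic order. As a point of reference only, the following minimal sketch builds one naively by comparison-sorting the suffix positions (roughly O(n² log n) in the worst case), then finds every occurrence of a pattern by binary search over the sorted suffixes. The names `naive_suffix_array` and `occurrences` are hypothetical; this is not how any of the benchmarked crates work, since they all use far more efficient construction algorithms.

```rust
/// Naive suffix array: sort all suffix start positions by comparing the
/// suffixes themselves. Illustration only, not one of the benchmarked SACAs.
fn naive_suffix_array(text: &[u8]) -> Vec<usize> {
    let mut sa: Vec<usize> = (0..text.len()).collect();
    sa.sort_by(|&a, &b| text[a..].cmp(&text[b..]));
    sa
}

/// Returns the sorted start positions of every occurrence of `pattern`.
/// Suffixes that start with `pattern` form one contiguous run in `sa`,
/// so two binary searches (via `partition_point`) find its bounds.
fn occurrences(text: &[u8], sa: &[usize], pattern: &[u8]) -> Vec<usize> {
    // First suffix that is >= pattern.
    let lo = sa.partition_point(|&i| &text[i..] < pattern);
    // First suffix whose prefix (cut to the pattern's length) is > pattern.
    let hi = sa.partition_point(|&i| {
        let s = &text[i..];
        &s[..s.len().min(pattern.len())] <= pattern
    });
    let mut positions = sa[lo..hi].to_vec();
    positions.sort_unstable();
    positions
}

fn main() {
    let text: &[u8] = b"banana";
    let sa = naive_suffix_array(text);
    println!("suffix array: {:?}", sa); // [5, 3, 1, 0, 4, 2]
    println!("\"ana\" occurs at {:?}", occurrences(text, &sa, b"ana")); // [1, 3]
}
```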
Tested on a machine with an Intel Core i5-4200H CPU @ 2.80GHz (2 cores, 4 threads) and 12 GiB of memory. Test data was obtained from the Pizza&Chili Corpus.
Construction time per tool and test file (each file is 52428800 bytes, i.e. 50 MiB):
name | dna-50m | midi-50m | xml-50m | code-50m | en-50m | pr-50m |
---|---|---|---|---|---|---|
cdivsufsort | 5.973s | 3.774s | 4.222s | 4.818s | 5.618s | 7.167s |
divsufsort | 7.617s | 5.345s | 5.991s | 6.012s | 7.844s | 9.241s |
suffix_array | 14.999s | 10.244s | 13.203s | 13.433s | 17.660s | 18.028s |
suffix | 24.251s | 15.203s | 19.828s | 18.855s | 27.751s | 30.532s |
suffix (force utf8) | 24.333s | 15.426s | 19.289s | 17.699s | 27.779s | 29.843s |
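Since cdivsufsort is the fastest tool on every file, normalizing each time by it makes the gaps easier to compare across files. A small sketch that computes these slowdown factors from the table values (the numbers are copied from the table; `slowdown` is a hypothetical helper, not part of any of the crates):

```rust
/// Divides each time by the matching baseline time (element-wise).
fn slowdown(times: &[f64], baseline: &[f64]) -> Vec<f64> {
    times.iter().zip(baseline).map(|(t, b)| t / b).collect()
}

fn main() {
    let files = ["dna-50m", "midi-50m", "xml-50m", "code-50m", "en-50m", "pr-50m"];
    // Baseline: cdivsufsort times, in seconds.
    let base = [5.973, 3.774, 4.222, 4.818, 5.618, 7.167];
    let tools: [(&str, [f64; 6]); 4] = [
        ("divsufsort", [7.617, 5.345, 5.991, 6.012, 7.844, 9.241]),
        ("suffix_array", [14.999, 10.244, 13.203, 13.433, 17.660, 18.028]),
        ("suffix", [24.251, 15.203, 19.828, 18.855, 27.751, 30.532]),
        ("suffix (force utf8)", [24.333, 15.426, 19.289, 17.699, 27.779, 29.843]),
    ];
    for (name, times) in &tools {
        let ratios = slowdown(times, &base);
        // One "file: ratio" cell per test file.
        let cells: Vec<String> = files
            .iter()
            .zip(&ratios)
            .map(|(f, r)| format!("{f}: {r:.2}x"))
            .collect();
        println!("{name}: {}", cells.join(", "));
    }
}
```

For example, divsufsort comes out at about 1.28× the cdivsufsort time on dna-50m, and suffix at about 4.26× on pr-50m.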
Output log:
* file `dna-50m` (52428800 bytes) *
cdivsufsort: 52428800 indexes in 5.973s
divsufsort: 52428800 indexes in 7.617s
suffix_array: 52428801 indexes in 14.999s
suffix: 52428800 indexes in 24.251s
suffix (force utf8): 52428800 indexes in 24.333s
* file `midi-50m` (52428800 bytes) *
cdivsufsort: 52428800 indexes in 3.774s
divsufsort: 52428800 indexes in 5.345s
suffix_array: 52428801 indexes in 10.244s
suffix: 52446443 indexes in 15.203s
suffix (force utf8): 52428800 indexes in 15.426s
* file `xml-50m` (52428800 bytes) *
cdivsufsort: 52428800 indexes in 4.222s
divsufsort: 52428800 indexes in 5.991s
suffix_array: 52428801 indexes in 13.203s
suffix: 52428800 indexes in 19.828s
suffix (force utf8): 52428800 indexes in 19.289s
* file `code-50m` (52428800 bytes) *
cdivsufsort: 52428800 indexes in 4.818s
divsufsort: 52428800 indexes in 6.012s
suffix_array: 52428801 indexes in 13.433s
suffix: 52429385 indexes in 18.855s
suffix (force utf8): 52428800 indexes in 17.699s
* file `en-50m` (52428800 bytes) *
cdivsufsort: 52428800 indexes in 5.618s
divsufsort: 52428800 indexes in 7.844s
suffix_array: 52428801 indexes in 17.660s
suffix: 52445886 indexes in 27.751s
suffix (force utf8): 52428800 indexes in 27.779s
* file `pr-50m` (52428800 bytes) *
cdivsufsort: 52428800 indexes in 7.167s
divsufsort: 52428800 indexes in 9.241s
suffix_array: 52428801 indexes in 18.028s
suffix: 52428800 indexes in 30.532s
suffix (force utf8): 52428800 indexes in 29.843s
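Each log line reports how many indexes a tool returned and how long construction took. The benchmark source is not included here, so the following is only a hypothetical sketch of a harness that prints lines in the same format: it times a single call with `std::time::Instant` and substitutes a naive suffix sort (`naive_suffix_array`, an illustrative stand-in) for the real crates. The helpers `report` and `time_saca` are likewise assumed names.

```rust
use std::time::Instant;

/// Hypothetical stand-in for a real SACA: naive comparison-based suffix sort.
fn naive_suffix_array(text: &[u8]) -> Vec<usize> {
    let mut sa: Vec<usize> = (0..text.len()).collect();
    sa.sort_by(|&a, &b| text[a..].cmp(&text[b..]));
    sa
}

/// Formats one result line in the style of the output log.
fn report(name: &str, indexes: usize, secs: f64) -> String {
    format!("{}: {} indexes in {:.3}s", name, indexes, secs)
}

/// Runs `saca` once on `text` and reports the index count and elapsed time.
fn time_saca(name: &str, text: &[u8], saca: fn(&[u8]) -> Vec<usize>) -> String {
    let start = Instant::now();
    let sa = saca(text);
    let secs = start.elapsed().as_secs_f64();
    report(name, sa.len(), secs)
}

fn main() {
    // Assumed in-memory sample; a real run would load a Pizza&Chili file.
    let (file, text): (&str, &[u8]) = ("sample", b"mississippi");
    println!("* file `{}` ({} bytes) *", file, text.len());
    println!("{}", time_saca("naive", text, naive_suffix_array));
}
```

Because the harness prints the length of the returned array rather than the input length, differences such as suffix_array's extra index (52428801 for a 52428800-byte file) would show up directly in the log.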