For a project I am on, I need a key-value store that converts file paths to fixnum IDs. The dataset will typically be in the range of 100 000 to 1 000 000 entries; these tests use 305 000 file-path-to-fixnum-ID pairs.
The different key-value stores tested are:

- Daybreak: "Daybreak is a simple and very fast key value store for ruby"
- GDBM: GNU dbm, "a simple database engine for storing key-value pairs on disk."
- DBM: "The DBM class provides a wrapper to a Unix-style dbm or Database Manager library"
- PStore: "PStore implements a file based persistence mechanism based on a Hash."
Out of these, all except Daybreak are in the Ruby standard library.
This test was run 3 times on my SSD-based MacBook Air (2011) with 4 GB RAM.
## Test code:
# Benchmarking different DB systems on a load of 305_000 file paths.
require "benchmark"
require "fileutils"
require "daybreak"
require "pstore"
require "gdbm"
require "dbm"
main_path = File.join(Dir.pwd, "test_file")
testdatat = (1..305_000).map{|e| ["#{main_path}#{e}.pdf", e]}
def delete_files
begin
FileUtils.rm("testDaybreak.db") if File.file?("testDaybreak.db")
FileUtils.rm("testGDBM.db") if File.file?("testGDBM.db")
FileUtils.rm("testDBM.db") if File.file?("testDBM.db")
FileUtils.rm("testPStore.db") if File.file?("testPStore.db")
rescue StandardError => e
puts "Error when deleting: "
puts e.message
puts e.backtrace.inspect
end
end
delete_files
class DaybreakWrapper
def initialize
@store = Daybreak::DB.new("testDaybreak.db")
end
def []=(key,val)
@store[key] = val
end
def [](key)
@store[key]
end
def values
@store.keys.map{|k| @store[k]}
end
def keys
@store.keys
end
def delete(key)
@store.delete(key)
end
def stop
@store.close unless @store.closed?
end
def destroy
stop
FileUtils.rm("testDaybreak.db")
end
def sync_lock
end
end
class GDBMWrapper
def initialize
@store = GDBM.new("testGDBM.db")
end
def []=(key,val)
@store[Marshal.dump(key)] = Marshal.dump(val)
end
def [](key)
Marshal.load(@store[Marshal.dump(key)])
end
def values
@store.values
end
def keys
@store.keys.map{|e| Marshal.load(e)}
end
def delete(key)
@store.delete(Marshal.dump(key))
end
def stop
@store.close unless @store.closed?
end
def destroy
stop
FileUtils.rm("testGDBM.db")
end
def sync_lock
end
end
class DBMWrapper
def initialize
# @store = DBM.open("testDBM", 666, DBM::WRCREAT)
@store = DBM.new("testDBM")
end
def []=(key,val)
@store[key] = val
end
def [](key)
@store[key]
end
def values
@store.values
end
def keys
@store.keys
end
def delete(key)
@store.delete(key)
end
def stop
@store.close unless @store.closed?
end
def destroy
stop
FileUtils.rm("testDBM.db")
end
def sync_lock
end
end
class PStoreWrapper
def initialize
@store = PStore.new("testPStore.db")
end
def []=(key,val)
transaction do
return @store[key] = val
end
end
def [](key)
transaction do
return @store[key]
end
end
def values
transaction do
return @store.roots.map{|e| @store[e]}
end
end
def keys
transaction do
return @store.roots
end
end
def delete(key)
transaction do
return @store.delete(key)
end
end
def stop
# transaction do
# @store.commit
# end
end
def destroy
# transaction do
# @store.destroy
# end
FileUtils.rm("testPStore.db")
end
def sync_lock
@store.transaction do
yield
end
end
# Public: Creates a transaction. Nested transactions are allowed.
#
# Returns nothing.
def transaction
unless @in_transaction
@in_transaction = true
sync_lock do
yield
end
@in_transaction = false
else
yield
end
end
end
class HashWrapper
@@superhash = {}
def initialize
#@@superhash = {} unless @@superhash
end
def []=(key,val)
@@superhash[key] = val
end
def [](key)
@@superhash[key]
end
def values
@@superhash.values
end
def keys
@@superhash.keys
end
def stop
end
end
n = 50_000
Benchmark.bm(7) do |x|
x.report("daybreak insert:") { db = DaybreakWrapper.new(); testdatat.each{|v| db[v[0]] = v[1]} ; db.stop}
x.report("gdbm insert:") { db = GDBMWrapper.new() ; testdatat.each{|v| db[v[0]] = v[1]} ; db.stop}
x.report("dbm insert:") { db = DBMWrapper.new() ; testdatat.each{|v| db[v[0]] = v[1]} ; db.stop}
x.report("PStore insert:") { db = PStoreWrapper.new() ; db.transaction do ; testdatat.each{|v| db[v[0]] = v[1]} end ; db.stop}
x.report("hash insert:") { db = HashWrapper.new() ; testdatat.each{|v| db[v[0]] = v[1]} }
x.report("daybreak read: ") { db = DaybreakWrapper.new(); n.times do ; db[testdatat.sample[0]] end ; db.stop}
x.report("gdbm read: ") { db = GDBMWrapper.new() ; n.times do ; db[testdatat.sample[0]] end ; db.stop}
x.report("dbm read: ") { db = DBMWrapper.new() ; n.times do ; db[testdatat.sample[0]] end ; db.stop}
x.report("PStore read: ") { db = PStoreWrapper.new() ; db.transaction do ; n.times do ; db[testdatat.sample[0]] end end; db.stop}
x.report("hash read: ") { db = HashWrapper.new() ; n.times do ; db[testdatat.sample[0]] end ; db.stop}
x.report("daybreak keys: ") { db = DaybreakWrapper.new(); raise "Key error in daybreak" unless db.keys.count == 305_000 ; db.stop}
x.report("gdbm keys: ") { db = GDBMWrapper.new() ; raise "Key error in gdbm" unless db.keys.count == 305_000 ; db.stop}
x.report("dbm keys: ") { db = DBMWrapper.new() ; raise "Key error in dbm" unless db.keys.count == 305_000 ; db.stop}
x.report("PStore keys: ") { db = PStoreWrapper.new() ; raise "Key error in PStore" unless db.keys.count == 305_000 ; db.stop}
x.report("hash keys: ") { db = HashWrapper.new() ; raise "Key error in hash" unless db.keys.count == 305_000 ; db.stop}
x.report("daybreak values:") { db = DaybreakWrapper.new(); raise "Value error in daybreak" unless db.values.count == 305_000 ; db.stop}
x.report("gdbm values:") { db = GDBMWrapper.new() ; raise "Value error in gdbm" unless db.values.count == 305_000 ; db.stop}
x.report("dbm values:") { db = DBMWrapper.new() ; raise "Value error in dbm" unless db.values.count == 305_000 ; db.stop}
x.report("PStore values:") { db = PStoreWrapper.new() ; raise "Value error in PStore" unless db.values.count == 305_000 ; db.stop}
x.report("hash values:") { db = HashWrapper.new() ; raise "Value error in hash" unless db.values.count == 305_000 ; db.stop}
end
def format_mb(size)
  conv = %w[b kb mb gb tb pb eb]
  scale = 1024
  # Sizes below 2 kb are reported in whole bytes.
  return "#{size} #{conv[0]}" if size < 2 * scale
  size = size.to_f
  (2..7).each do |ndx|
    if size < 2 * scale**ndx
      return "#{'%.3f' % (size / scale**(ndx - 1))} #{conv[ndx - 1]}"
    end
  end
  "#{'%.3f' % (size / scale**6)} #{conv[6]}"
end
puts "daybreak file size: #{format_mb(File.size("testDaybreak.db"))}" if File.file?("testDaybreak.db")
puts "gdbm file size: #{format_mb(File.size("testGDBM.db"))}" if File.file?("testGDBM.db")
puts "dbm file size: #{format_mb(File.size("testDBM.db"))}" if File.file?("testDBM.db")
puts "PStore file size: #{format_mb(File.size("testPStore.db"))}" if File.file?("testPStore.db")
delete_files
puts "-------------------------------------------------------------"
## Results
user system total real
daybreak insert: 6.630000 2.940000 9.570000 ( 10.290654)
gdbm insert: 3.510000 1.770000 5.280000 ( 5.904929)
dbm insert: 1.010000 1.560000 2.570000 ( 5.002721)
PStore insert: 1.190000 0.080000 1.270000 ( 5.790313)
hash insert: 0.330000 0.010000 0.340000 ( 0.348286)
daybreak read: 3.250000 0.100000 3.350000 ( 3.688214)
gdbm read: 0.450000 0.170000 0.620000 ( 1.727852)
dbm read: 0.210000 0.330000 0.540000 ( 1.593883)
PStore read: 2.080000 0.080000 2.160000 ( 2.439700)
hash read: 0.070000 0.000000 0.070000 ( 0.072445)
daybreak keys: 3.010000 0.110000 3.120000 ( 3.230572)
gdbm keys: 1.240000 0.250000 1.490000 ( 2.769978)
dbm keys: 0.190000 0.270000 0.460000 ( 1.646055)
PStore keys: 1.710000 0.120000 1.830000 ( 2.078772)
hash keys: 0.010000 0.000000 0.010000 ( 0.003377)
daybreak values: 3.320000 0.050000 3.370000 ( 3.543207)
gdbm values: 0.720000 0.180000 0.900000 ( 2.084107)
dbm values: 0.400000 0.180000 0.580000 ( 1.380364)
PStore values: 1.150000 0.030000 1.180000 ( 1.194967)
hash values: 0.000000 0.000000 0.000000 ( 0.003910)
daybreak file size: 28.627 mb
gdbm file size: 45.885 mb
dbm file size: 41.754 mb
PStore file size: 25.137 mb
-------------------------------------------------------------
user system total real
daybreak insert: 6.900000 3.070000 9.970000 ( 10.570726)
gdbm insert: 3.760000 1.970000 5.730000 ( 6.984838)
dbm insert: 0.990000 1.580000 2.570000 ( 2.864508)
PStore insert: 1.200000 0.100000 1.300000 ( 1.452362)
hash insert: 0.320000 0.000000 0.320000 ( 0.327974)
daybreak read: 3.250000 0.130000 3.380000 ( 3.683599)
gdbm read: 0.470000 0.180000 0.650000 ( 1.988697)
dbm read: 0.130000 0.150000 0.280000 ( 0.502423)
PStore read: 2.090000 0.070000 2.160000 ( 2.447758)
hash read: 0.080000 0.000000 0.080000 ( 0.077678)
daybreak keys: 2.960000 0.120000 3.080000 ( 3.211114)
gdbm keys: 1.260000 0.270000 1.530000 ( 3.299173)
dbm keys: 0.190000 0.270000 0.460000 ( 1.530169)
PStore keys: 1.710000 0.180000 1.890000 ( 1.986568)
hash keys: 0.000000 0.000000 0.000000 ( 0.004191)
daybreak values: 3.250000 0.070000 3.320000 ( 3.778728)
gdbm values: 0.850000 0.220000 1.070000 ( 3.457204)
dbm values: 0.370000 0.190000 0.560000 ( 1.823945)
PStore values: 1.150000 0.030000 1.180000 ( 1.202744)
hash values: 0.010000 0.000000 0.010000 ( 0.008623)
daybreak file size: 28.627 mb
gdbm file size: 45.885 mb
dbm file size: 41.754 mb
PStore file size: 25.137 mb
-------------------------------------------------------------
user system total real
daybreak insert: 6.530000 2.910000 9.440000 ( 9.938166)
gdbm insert: 3.500000 1.770000 5.270000 ( 7.916740)
dbm insert: 0.990000 1.580000 2.570000 ( 3.926296)
PStore insert: 1.280000 0.080000 1.360000 ( 6.565056)
hash insert: 0.320000 0.010000 0.330000 ( 0.339820)
daybreak read: 3.180000 0.130000 3.310000 ( 4.165183)
gdbm read: 0.460000 0.170000 0.630000 ( 1.667406)
dbm read: 0.270000 0.600000 0.870000 ( 3.078985)
PStore read: 2.080000 0.090000 2.170000 ( 2.514186)
hash read: 0.070000 0.000000 0.070000 ( 0.074237)
daybreak keys: 2.910000 0.110000 3.020000 ( 3.136291)
gdbm keys: 1.270000 0.250000 1.520000 ( 2.784880)
dbm keys: 0.190000 0.260000 0.450000 ( 1.463298)
PStore keys: 1.640000 0.100000 1.740000 ( 1.810947)
hash keys: 0.000000 0.000000 0.000000 ( 0.002555)
daybreak values: 3.150000 0.060000 3.210000 ( 3.306449)
gdbm values: 0.750000 0.200000 0.950000 ( 2.202966)
dbm values: 0.380000 0.180000 0.560000 ( 1.565708)
PStore values: 1.140000 0.040000 1.180000 ( 1.253058)
hash values: 0.000000 0.000000 0.000000 ( 0.004755)
daybreak file size: 28.627 mb
gdbm file size: 45.885 mb
dbm file size: 41.754 mb
PStore file size: 25.137 mb
-------------------------------------------------------------
## Verdict
daybreak:
10.290654+3.688214+3.230572+3.543207+10.570726+3.683599+3.211114+3.778728+9.938166+4.165183+3.136291+3.306449 = 62.542903
gdbm:
5.904929+1.727852+2.769978+2.084107+6.984838+1.988697+3.299173+3.457204+7.916740+1.667406+2.784880+2.202966 = 42.788770
dbm:
5.002721+1.593883+1.646055+1.380364+2.864508+0.502423+1.530169+1.823945+3.926296+3.078985+1.463298+1.565708 = 26.378355
PStore:
5.790313+2.439700+2.078772+1.194967+1.452362+2.447758+1.986568+1.202744+6.565056+2.514186+1.810947+1.253058 = 30.736431
hash:
0.348286+0.072445+0.003377+0.003910+0.327974+0.077678+0.004191+0.008623+0.339820+0.074237+0.002555+0.004755 = 1.267851
hash = 1.267851 #Note: Does not persist
dbm = 26.378355 #Note: Stored data is CPU-architecture-dependent; hard to debug if needed.
PStore = 30.736431 #Note: Requires all operations in the same transaction; otherwise runtime is effectively unbounded.
gdbm = 42.788770
daybreak = 62.542903 #Note: Does not work correctly on Windows
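The totals are just each store's twelve real times summed. As a sanity check, here is the same arithmetic in Ruby (times copied from the three runs above; only daybreak and dbm shown):

```ruby
# Real-time columns from the three benchmark runs above.
times = {
  "daybreak" => [10.290654, 3.688214, 3.230572, 3.543207, 10.570726, 3.683599,
                 3.211114, 3.778728, 9.938166, 4.165183, 3.136291, 3.306449],
  "dbm"      => [5.002721, 1.593883, 1.646055, 1.380364, 2.864508, 0.502423,
                 1.530169, 1.823945, 3.926296, 3.078985, 1.463298, 1.565708],
}
# Sum each list and round away float noise.
totals = times.map { |name, ts| [name, ts.sum.round(6)] }.to_h
puts totals  # daybreak => 62.542903, dbm => 26.378355
```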
hash file size: 0 bytes #Does not persist
dbm file size: 41.754 mb
PStore file size: 25.137 mb
gdbm file size: 45.885 mb
daybreak file size: 28.627 mb
As you can see, `dbm` seems to be the fastest overall, but the stored file is heavily dependent on the machine it was written on, so if you move it to another machine it might not be readable at all. `PStore` also performs well; however, when the tests were not run inside a single `db.transaction`, performance was so bad I had to abort the run. One point in its favor: `PStore` uses fairly little disk space (the smallest in this test).
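The transaction point matters because each `PStore#transaction` reads and rewrites the whole file. A minimal sketch of the batched pattern (the file path and loop size here are made up, not taken from the benchmark):

```ruby
require "pstore"
require "tmpdir"

path = File.join(Dir.mktmpdir, "ids.pstore")
store = PStore.new(path)

# One transaction around the whole batch of writes; wrapping each
# write in its own transaction would rewrite the file every time.
store.transaction do
  100.times { |i| store["/tmp/file#{i}.pdf"] = i }
end

# Read-only transaction (the `true` flag) for lookups.
id = nil
store.transaction(true) { id = store["/tmp/file42.pdf"] }
puts id  # => 42
```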
`GDBM` seems to be the best all-round solution. It works on all platforms (yes, even Windows) and it performs well, even though every key and value has to pass through `Marshal.dump`.
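The marshaling cost is the standard `Marshal` round-trip the `GDBMWrapper` above performs, since GDBM can only store strings. Illustrated with plain `Marshal` (no GDBM install required):

```ruby
# GDBM keys and values must be strings, so fixnum IDs are
# serialized with Marshal.dump on write and restored with
# Marshal.load on read.
packed = Marshal.dump(42)
puts packed.class          # => String
puts Marshal.load(packed)  # => 42
```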
`Daybreak` also uses fairly little disk space, but it is somewhat slow on this dataset, contrary to what the moneta tests seem to indicate; those use very small datasets (100 and 1000 keys). Daybreak, despite being "pure Ruby", does not work on Windows because it relies on file locking that is only supported on POSIX systems.
The `hash` times are included as a simple comparison against the (to my knowledge) fastest in-memory alternative. As one would expect, the hash is far faster.
Notes: There are multiple key-value stores left out of this test. It was meant as a comparison of some cross-platform alternatives to `daybreak`, which I wrongly assumed to be cross-platform. Feel free to add any you feel are missing :)
Can we put sqlite3_hash to the test too?