HAMT RDF::Repository Thread Safety

Threading

While our new RDF::Repository implementation theoretically improves the concurrency story for RDF.rb, it isn't, in itself, thread safe. The underlying data representation may be purely functional, but the Repository itself is swimming in shared mutable state. Specifically, we have the potential for a data race during execution of code like @data = data; and, more generally, for race conditions wherever our changes depend on previous reads. Notably, this affects #transaction, as demonstrated in the following snippet:

require 'rdf'
repo = RDF::Repository.new

threads = []
err_count = 0

# make 10 threads, processing 1000 transactions each
10.times do |n|
  threads << Thread.new do
    1_000.times do |i|
      begin
        repo.transaction(mutable: true) do
          # insert a unique statement for each transaction
          insert RDF::Statement("thread_#{n}".to_sym,
                                RDF::URI('http://example.com/num'),
                                i)
        end
      rescue RDF::Transaction::TransactionError
        # count up the statements that fail in execution
        # (this += is itself an unsynchronized read-modify-write on a shared variable)
        err_count += 1
      end
    end
  end
end

threads.each(&:join)

# not even close to 10_000!
repo.count + err_count # => 5587

(Running this in your environment may yield different results. You may even see the expected total. Nevertheless, trust me, this code is not safe.)
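
Because thread scheduling is non-deterministic, it can help to replay one bad interleaving by hand. The sketch below is a toy model, not RDF.rb code: `SnapshotStore`, `unchanged_since?`, `publish`, and `replay_lost_update` are hypothetical names, and it assumes a commit is a "check that nothing changed, then swap in the new snapshot" sequence. Two writers start from the same snapshot, both pass the check, and the second swap silently discards the first writer's statement.

```ruby
# Toy stand-in for a repository backed by immutable snapshots.
# Hypothetical class for illustration only; this is not RDF.rb's implementation.
class SnapshotStore
  attr_reader :data

  def initialize
    @data = [].freeze
  end

  # Step 1 of a commit: is the current snapshot still the one we started from?
  def unchanged_since?(base)
    @data.equal?(base)
  end

  # Step 2 of a commit: swap in the new snapshot (the `@data = data` step).
  def publish(new_data)
    @data = new_data.freeze
  end
end

# Replays, in a single thread, the order of steps two racing threads could take.
def replay_lost_update
  store = SnapshotStore.new

  # Both writers read the same starting snapshot and build on it.
  base_a = store.data
  base_b = store.data
  new_a = base_a + [:statement_a]
  new_b = base_b + [:statement_b]

  # The bad interleaving: both checks run before either swap.
  ok_a = store.unchanged_since?(base_a)
  ok_b = store.unchanged_since?(base_b)
  store.publish(new_a) if ok_a
  store.publish(new_b) if ok_b # overwrites writer A's snapshot

  { committed: [ok_a, ok_b].count(true), stored: store.data.size }
end

result = replay_lost_update
puts "committed: #{result[:committed]}, stored: #{result[:stored]}"
```

Both writers believe they committed, yet only one statement survives and no error is raised. That is the kind of loss that leaves the total short of 10_000.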

The good news is that the races are reasonably isolated. Any dreams of perfectly asynchronous concurrency are dashed, but the need for synchronization is minimized. For transactions, we need only synchronize #execute; in place of the transaction block above, we have:

# ...
begin
  tx = repo.transaction(mutable: true)
  tx.insert RDF::Statement("thread_#{n}".to_sym,
                           RDF::URI('http://example.com/num'),
                           i)
  # `mutex` must be a single Mutex shared by every thread, created before they start
  mutex.synchronize { tx.execute }
rescue RDF::Transaction::TransactionError
# ...
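
To see why locking only the commit step is enough, here is a self-contained sketch that runs without the rdf gem. Everything in it is a hypothetical stand-in: `ToyRepo`, `ToyTransaction`, `StaleError`, and `run_synchronized` are illustrative names, and the check-then-swap inside `execute` is an assumption about how a snapshot-based transaction commits, not a copy of RDF.rb's code. Transactions are opened and filled without any lock; only `execute` runs under the mutex.

```ruby
# Toy model only: hypothetical stand-ins, not RDF.rb classes.
class ToyRepo
  attr_accessor :data

  def initialize
    @data = [].freeze
  end

  def transaction
    ToyTransaction.new(self)
  end
end

class ToyTransaction
  class StaleError < StandardError; end

  def initialize(repo)
    @repo = repo
    @base = repo.data # the snapshot this transaction started from
    @pending = []
  end

  def insert(item)
    @pending << item
    self
  end

  # Check (compare against @base), then swap in a new snapshot.
  # Correct only if no other thread runs between the two steps.
  def execute
    raise StaleError, 'repository has changed' unless @repo.data.equal?(@base)
    @repo.data = (@base + @pending).freeze
  end
end

def run_synchronized(thread_count: 10, per_thread: 200)
  repo = ToyRepo.new
  mutex = Mutex.new # one lock shared by all threads
  failures = 0

  threads = Array.new(thread_count) do |n|
    Thread.new do
      per_thread.times do |i|
        tx = repo.transaction # built without holding the lock
        tx.insert([n, i])
        begin
          mutex.synchronize { tx.execute } # only the commit is serialized
        rescue ToyTransaction::StaleError
          mutex.synchronize { failures += 1 } # the counter is shared state too
        end
      end
    end
  end
  threads.each(&:join)

  { stored: repo.data.size, failures: failures, attempts: thread_count * per_thread }
end

result = run_synchronized
balanced = result[:stored] + result[:failures] == result[:attempts]
puts "stored + failures == attempts? #{balanced}"
```

With the check and the swap made atomic, every attempt either lands in the store or raises, so the two counts add up to the number of attempts. How many attempts fail still depends on scheduling. The sketch also guards the failure counter with the same lock, since incrementing it is another read-modify-write.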

Still, as an implementation-specific solution, this leaves something to be desired. Giving more thought to thread safety will likely uncover better options.

no-reply/hamt_repo_threading.markdown

no-reply commented Jun 4, 2016