@amoose
Created August 4, 2014 14:58
ElasticSearch synonym tokenization with Searchkick gem
require 'searchkick'

module SearchkickSyn
  def searchkick_index_options
    # Fetch the index options that Searchkick builds by default
    options = super
    # Append the synonym filter to the default_index and searchkick_search analyzers
    options[:settings][:analysis][:analyzer][:default_index][:filter].push "synonym"
    options[:settings][:analysis][:analyzer][:searchkick_search][:filter].push "synonym"
    # Define the WordNet synonym filter
    options[:settings][:analysis][:filter][:synonym] = {
      :type => 'synonym',
      # wn_s.pl is in WordNet Prolog format, not the default Solr format
      :format => 'wordnet',
      :synonyms_path => '/var/lib/wn_s.pl'
    }
    options
  end
end

# This is the 'footprint' definition (including all containing
# modules, etc.) of the module we want to 'prepend' with our module
module Searchkick
  module Reindex
    # This is the line where all the magic happens - we 'prepend'
    # the module we created above into Searchkick::Reindex
    prepend SearchkickSyn
  end
end

amoose commented Aug 4, 2014

The Prolog-formatted synset file (wn_s.pl) must exist on the server at the path given by synonyms_path in the synonym filter definition. The file must be identical on all ElasticSearch servers.

The WordNet 3.1 database files can be downloaded from here: http://wordnetcode.princeton.edu/wn3.1.dict.tar.gz
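
Before copying the file to the servers, it can help to sanity-check that it really is in the Prolog fact format. The sketch below is only an illustration: it assumes each line looks like s(synset_id,word_number,'word',type,sense_number,tag_count). and that apostrophes inside a word are doubled. The names SYNSET_FACT and group_synsets are made up for this example, and the sample lines are written by hand in that shape rather than read from a real file.

```ruby
# Assumed shape of one fact: s(100001740,1,'entity',n,1,11).
# Assumption: an apostrophe inside a word is written as two single quotes.
SYNSET_FACT = /\As\((\d+),\d+,'((?:[^']|'')*)',[a-z],\d+,\d+\)\.\z/

# Groups words by synset id; lines that do not match the pattern are skipped.
def group_synsets(lines)
  groups = Hash.new { |hash, key| hash[key] = [] }
  lines.each do |line|
    match = SYNSET_FACT.match(line.strip)
    next unless match
    groups[match[1]] << match[2].gsub("''", "'")
  end
  groups
end

# Hand-written sample facts, for illustration only.
sample = [
  "s(100001740,1,'entity',n,1,11).",
  "s(100001930,1,'physical entity',n,1,0).",
  "s(100002137,1,'abstraction',n,6,0).",
  "s(100002137,2,'abstract entity',n,1,0)."
]

group_synsets(sample).each do |id, words|
  puts "#{id}: #{words.join(', ')}"
end
```

Words that share a synset id are the ones the filter would treat as synonyms of each other. To check a real file, something like group_synsets(File.foreach('/var/lib/wn_s.pl')) should work, since File.foreach without a block returns an enumerator of lines.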


amoose commented Aug 5, 2014

This only works on Ruby 2.0+, because it relies on Module#prepend.
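
For anyone unfamiliar with prepend, here is a minimal sketch of the mechanism with no Searchkick involved. IndexOptions, SynonymOptions and Product are hypothetical stand-ins, not Searchkick names; the point is only that a prepended module sits in front of the original in the lookup chain, so its method runs first and super reaches the original.

```ruby
# Hypothetical stand-in for the module that builds the default options.
module IndexOptions
  def index_options
    { filter: ["lowercase"] }
  end
end

# Hypothetical override, shaped like SearchkickSyn in the gist.
module SynonymOptions
  def index_options
    options = super                    # calls IndexOptions#index_options
    options[:filter].push("synonym")   # then extends the result
    options
  end
end

# Reopen the original module and prepend the override.
module IndexOptions
  prepend SynonymOptions
end

class Product
  include IndexOptions
end

puts Product.ancestors.first(3).inspect
# prints [Product, SynonymOptions, IndexOptions]
puts Product.new.index_options[:filter].join(", ")
# prints lowercase, synonym
```

One caveat: in this sketch the prepend runs before IndexOptions is included in Product. On Ruby versions before 3.0, prepending into a module that a class has already included does not change that class, so load order matters.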

@edemkumodzi

Is it possible to put the WordNet file inside the Rails project itself? I'm not sure Heroku gives access to the /var/lib folder.