@eph2795
Created May 29, 2020 20:55
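# Compare hand-built character n-grams with the subwords fastText extracts.
# Older source builds exposed the module as `fastText`; the PyPI package is `fasttext`.
from fasttext import load_model

model = load_model('official_fasttext_wiki_200_model')


def find_ngrams(string, n):
    """Return all contiguous character n-grams of length n."""
    return [''.join(chars) for chars in zip(*[string[i:] for i in range(n)])]


string = 'грёзоблаженствующий'

# Lengths 3..6 are fastText's default minn/maxn (assumed to match this model);
# '<' and '>' mark the word boundaries.
ngrams = set()
for n in range(3, 7):
    ngrams.update(find_ngrams('<' + string + '>', n))

# get_subwords returns the subword strings and their indices in the input matrix.
ft_ngrams, ft_indexes = model.get_subwords(string)
ft_ngrams = set(ft_ngrams)

print(sorted(ngrams), sorted(ft_ngrams))
# Subwords only fastText reports, then n-grams only the manual extraction found.
print(ft_ngrams - ngrams, ngrams - ft_ngrams)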