@dayyass
Last active September 29, 2021 12:25
Extract a token-to-IDF mapping (token2idf) from a fitted TfidfVectorizer.
from sklearn.feature_extraction.text import TfidfVectorizer
# data
corpus = [
    'This is the first document.',
    'This document is the second document.',
    'And this is the third one.',
    'Is this the first document?',
]
# fit
tfidf_vectorizer = TfidfVectorizer()
tfidf_vectorizer.fit(corpus)
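# token2idf: vocabulary_ maps each token to its column index,
# and idf_ holds the learned IDF weight for each column
token2idf = {
    token: tfidf_vectorizer.idf_[idx]
    for token, idx in tfidf_vectorizer.vocabulary_.items()
}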
# sorted token2idf: list of (token, idf) pairs in ascending IDF order,
# so the most common tokens come first
sorted_token2idf = sorted(
    token2idf.items(),
    key=lambda x: x[1],
)
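
As a sanity check on the mapping above, the IDF values can be recomputed by hand. The sketch below is not part of the original gist. It assumes TfidfVectorizer's default settings (lowercasing, the default token pattern that keeps tokens of two or more word characters, and smooth_idf=True, i.e. idf(t) = ln((1 + n) / (1 + df(t))) + 1). The names smoothed_idf and manual_token2idf are hypothetical and chosen only for illustration.

```python
import math
import re

corpus = [
    'This is the first document.',
    'This document is the second document.',
    'And this is the third one.',
    'Is this the first document?',
]


def smoothed_idf(docs):
    """Recompute smoothed IDF per token (hypothetical helper, default-settings assumption)."""
    # Assumed to match the vectorizer's default token pattern: 2+ word characters.
    token_re = re.compile(r"(?u)\b\w\w+\b")
    n_docs = len(docs)

    # Document frequency: number of documents containing each token at least once.
    doc_freq = {}
    for doc in docs:
        for token in set(token_re.findall(doc.lower())):
            doc_freq[token] = doc_freq.get(token, 0) + 1

    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    return {
        token: math.log((1 + n_docs) / (1 + df)) + 1
        for token, df in doc_freq.items()
    }


manual_token2idf = smoothed_idf(corpus)

# Tokens that occur in every document get the minimum IDF of 1.0.
print(sorted(manual_token2idf.items(), key=lambda x: x[1]))
```

If the vectorizer was created with non-default arguments (for example smooth_idf=False or a custom tokenizer), these values will differ from token2idf, so the comparison only holds under the stated assumptions.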