@piyush01123
Last active March 31, 2021 03:32
TF-IDF implementation
from sklearn.feature_extraction.text import TfidfTransformer
import numpy as np
# Term-count matrix: 6 documents (rows) x 3 terms (columns).
counts = np.array([[3, 0, 1],
                   [2, 0, 0],
                   [3, 0, 0],
                   [4, 0, 0],
                   [3, 2, 0],
                   [3, 0, 2]])
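# Unsmoothed IDF: ln(n_docs / df) + 1, where df is the number of documents containing each term.
temp = counts * (np.log(len(counts) / (counts > 0).sum(axis=0)) + 1)
# L2-normalise each row, matching TfidfTransformer's default norm="l2".
tfidf = temp / np.linalg.norm(temp, axis=1, keepdims=True)
# Reference result from scikit-learn with IDF smoothing disabled.
transformer = TfidfTransformer(smooth_idf=False)
tfidf_skl = transformer.fit_transform(counts)
tfidf_skl = tfidf_skl.toarray()  # fit_transform returns a sparse matrix
# Check that the manual result agrees with scikit-learn.
print(np.allclose(tfidf, tfidf_skl))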
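# Smoothed IDF: ln((n_docs + 1) / (df + 1)) + 1, as if one extra document contained every term.
temp = counts * (np.log((len(counts) + 1) / ((counts > 0).sum(axis=0) + 1)) + 1)
tfidf_smooth = temp / np.linalg.norm(temp, axis=1, keepdims=True)
# Reference result from scikit-learn with its default smooth_idf=True.
transformer = TfidfTransformer()
tfidf_skl_smooth = transformer.fit_transform(counts)
tfidf_skl_smooth = tfidf_skl_smooth.toarray()  # fit_transform returns a sparse matrix
# Check that the manual result agrees with scikit-learn.
print(np.allclose(tfidf_smooth, tfidf_skl_smooth))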
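For readers who want to see the same arithmetic without broadcasting, here is a minimal pure-Python sketch of the two formulas above. The function name `tfidf_rows` and its `smooth` flag are hypothetical, introduced only for illustration. The sketch assumes every term occurs in at least one document when `smooth=False` (otherwise the division fails), and it maps an all-zero row to zeros rather than dividing by a zero norm.

```python
import math

def tfidf_rows(counts, smooth=True):
    """Row-normalised TF-IDF for a list of term-count rows (illustrative helper)."""
    n_docs = len(counts)
    n_terms = len(counts[0])
    # Document frequency: number of rows in which each term appears.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    # With smoothing, add one to both numerator and denominator.
    add = 1 if smooth else 0
    # Assumption: df[j] + add > 0 for every term.
    idf = [math.log((n_docs + add) / (df[j] + add)) + 1 for j in range(n_terms)]
    result = []
    for row in counts:
        weighted = [tf * w for tf, w in zip(row, idf)]
        norm = math.sqrt(sum(v * v for v in weighted))
        # L2-normalise the row; leave an all-zero row as zeros.
        result.append([v / norm if norm > 0 else 0.0 for v in weighted])
    return result

counts = [[3, 0, 1],
          [2, 0, 0],
          [3, 0, 0],
          [4, 0, 0],
          [3, 2, 0],
          [3, 0, 2]]
rows_smooth = tfidf_rows(counts, smooth=True)
rows_plain = tfidf_rows(counts, smooth=False)
print(rows_smooth[0])
print(rows_plain[0])
```

The first term appears in every document, so its smoothed IDF is ln(7/7) + 1 = 1, and a row such as `[2, 0, 0]` normalises to `[1.0, 0.0, 0.0]` in both variants.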