@guilhermefgs
Last active October 11, 2020 19:40
import nltk
nltk.download('machado')
from nltk.probability import FreqDist
from nltk.tokenize import word_tokenize
# tokenizer models used by word_tokenize (recent NLTK releases may ask for 'punkt_tab' instead)
nltk.download('punkt')
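# Dom Casmurro corpus (raw text of the novel from the NLTK Machado de Assis corpus)
corpus_dom_casmurro = nltk.corpus.machado.raw('romance/marm08.txt')
# preprocessing
# NOTE: pre_processamento is not defined in this snippet; it must be defined or imported before this line runs
texto = pre_processamento(corpus_dom_casmurro)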
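# tokenizing (the text is Portuguese, so use the Portuguese Punkt model rather than the English default)
tokens = word_tokenize(texto, language='portuguese')
# frequency count
fd = FreqDist(tokens)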
print("20 most frequent words:")
print(fd.most_common(20))
# plot the 30 most frequent tokens
import matplotlib.pyplot as plt
plt.figure(figsize = (13, 8))
fd.plot(30, title = "Word Frequency")
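
The script calls `pre_processamento` without defining it, so it cannot run on its own. The original helper is not shown here; the block below is only a hypothetical stand-in, assuming the intent was to lowercase the text and strip punctuation and digits before tokenizing. The body of the function is an assumption, not the author's implementation.

```python
import re


def pre_processamento(texto):
    """Hypothetical stand-in for the undefined helper (assumed behavior)."""
    # Lowercase so "Capitu" and "capitu" are counted as the same token.
    texto = texto.lower()
    # Replace punctuation and underscores with spaces; \w is Unicode-aware
    # in Python 3, so accented Portuguese letters are preserved.
    texto = re.sub(r"[^\w\s]|_", " ", texto)
    # Drop digits (chapter numbers, years) under the same assumption.
    texto = re.sub(r"\d+", " ", texto)
    # Collapse runs of whitespace left behind by the substitutions.
    return re.sub(r"\s+", " ", texto).strip()


print(pre_processamento("Capitu, olhos de ressaca! 1899."))
# prints: capitu olhos de ressaca
```

If this definition is placed above the call to `pre_processamento`, the rest of the script can run as written. Stopword removal is deliberately left out of the sketch, since nothing in the snippet indicates whether the original helper did it.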