@johnidm
Created November 1, 2023 13:30
Generate word cloud from single-column Pandas dataframe

A word cloud is a visualization that highlights the most frequent words in a text by drawing them larger.

In a text classification task, it gives a quick view of which words dominate the whole corpus.

Here we have a pandas DataFrame with a single text column, and our goal is to see its most frequent words.

Install dependencies

pip install wordcloud -q
pip install pandas -q
pip install matplotlib -q

Create the dataset

import pandas as pd


data = [
  "The book was on the table",
  "We camped by the brook",
  "He knew it was over the rainbow",
  "She was lost in the dark of night",
  "He was between a rock and a hard place",
  "I waited for a while",
  "She smelled of strawberries and cream",
  "He won the challenge against all odds",
]
  
df = pd.DataFrame(data, columns=["text"])

df.head()
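
Count the words (optional)

Since the goal is to see the most frequent words, it can help to count them directly as a numeric cross-check on the plot. This is a minimal sketch using only the standard library. The helper name `top_words` and its parameters are assumptions for illustration and are not part of the original gist. WordCloud applies its own built-in stopword list by default, so its picture will not match these raw counts exactly.

```python
import re
from collections import Counter


# Hypothetical helper, not part of the original gist.
def top_words(texts, n=5, stopwords=frozenset()):
    """Return the n most common lowercase words across all texts."""
    counts = Counter()
    for text in texts:
        # Lowercase, then treat runs of letters and apostrophes as words.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in stopwords)
    return counts.most_common(n)


sample = [
    "The book was on the table",
    "We camped by the brook",
    "He knew it was over the rainbow",
]

# Without stopword filtering, function words dominate the counts.
print(top_words(sample, n=2))  # [('the', 4), ('was', 2)]
```

Any iterable of strings works as input, so the DataFrame column can be passed as `top_words(df["text"])`. Passing a `stopwords` set filters out function words such as "the" before counting.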

Word cloud plot

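from wordcloud import WordCloud
import matplotlib.pyplot as plt


# Join every row of the text column into one space-separated string
words = df["text"].str.cat(sep=" ")

# Build the word cloud; common English stopwords are dropped by default
wc = WordCloud(width=400, height=330, max_words=150, colormap="Dark2").generate(words)

# Render the cloud as an image and hide the axes
plt.figure(figsize=(10, 8))
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()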
