
@erikdw
Last active October 3, 2018 21:04
Creating a Groupon farewell word cloud in the shape of the Groupon G logo.

Download all "company-departures" emails with Got Your Back (GYB):

./gyb --email MYGMAILACCOUNT@gmail.com --search 'label:company-departures'

Join all .eml file bodies into a single file

Build this program, based on a Stack Overflow answer:

#!/usr/bin/env python

import os
import email
from emaildata.text import Text  # third-party "emaildata" package

# Collect the paths of all .eml files under the current directory.
emls = []
for dirpath, dirnames, filenames in os.walk("."):
    for filename in filenames:
        if filename.endswith(".eml"):
            emls.append(os.path.join(dirpath, filename))

# Print each email's text body between dashed separator lines.
for eml in sorted(emls):
    with open(eml) as f:  # close each file once it has been parsed
        message = email.message_from_file(f)
    text = Text.text(message)

    print("email: %s has body:\n" % eml)
    print("-----------------------")
    print(text)
    print("-----------------------")

Save it as get-email-text.py, make it executable, and run it:

./get-email-text.py > departure-email-text.txt
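If the third-party emaildata package isn't available, a similar extraction can be sketched with only Python 3's standard `email` package. This is an assumed alternative, not the script used above: it keeps only `text/plain` parts (so HTML-only emails come out empty), and the function names are hypothetical.

```python
#!/usr/bin/env python3
"""Stdlib-only sketch: print the plain-text body of every .eml file under a directory."""
import email
import email.policy
import os
import sys


def plain_text_body(path):
    """Return the text/plain body of one .eml file, or "" if it has none."""
    with open(path, "rb") as f:
        # policy.default yields an EmailMessage, which provides get_body().
        message = email.message_from_binary_file(f, policy=email.policy.default)
    part = message.get_body(preferencelist=("plain",))
    return part.get_content() if part is not None else ""


def find_emls(root="."):
    """Yield .eml paths under root in a stable (sorted) order."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sorting in place makes os.walk visit subdirectories in order
        for filename in sorted(filenames):
            if filename.endswith(".eml"):
                yield os.path.join(dirpath, filename)


def main(root="."):
    for eml in find_emls(root):
        print("email: %s has body:\n" % eml)
        print("-----------------------")
        print(plain_text_body(eml))
        print("-----------------------")


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else ".")
```

It takes an optional root directory (default `.`) and writes to stdout, so its output can be redirected into a file the same way as the original script's.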

Waste a bunch of time skimming through emails & fixing them up

  1. Remove personal info (email addresses, etc.).
  2. Get rid of the annoying '^M' (carriage-return) characters:
sed -i -e "s/^M//" departure-email-text.2nd-try.txt

NOTE: '^M' has to be typed in a special way:

  • To enter ^M, type CTRL-V, then CTRL-M. That is, hold down the CTRL key, then press V and M in succession.
  3. This cleanup also revealed that I wanted phrases, not just single words, in the word cloud, which led to the next step.
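Both mechanical cleanup steps can also be sketched in Python. This is an assumed alternative to the manual pass, not what was actually done, and the email-address regex is a deliberately simple, hypothetical pattern that will miss unusual addresses, so a manual skim is still needed for other personal info.

```python
import re

# Hypothetical, deliberately simple pattern; it will not match every valid address.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def clean_text(text):
    """Drop carriage returns (the '^M' characters) and redact email addresses."""
    text = text.replace("\r", "")
    return EMAIL_RE.sub("[email]", text)
```

`clean_text` assumes the text was read with `open(path, newline="")`; Python's default text mode already converts `\r\n` (and lone `\r`) to `\n`, which removes the '^M' characters on its own.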

Join multi-word phrases into single words for the Word-Cloud tools

e.g., "thank you", "keep in touch", etc. -- you can join the words with a '~' and most of these Word-Cloud tools will treat them as a single word and remove the '~'.

Can be done with sed, vim, etc.
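As a scripted alternative to sed or vim, the phrase joining might look like the sketch below. The function name and its `joiner` parameter are hypothetical; the phrase list is whatever you choose, such as "thank you" and "keep in touch".

```python
import re


def join_phrases(text, phrases, joiner="~"):
    """Replace each multi-word phrase (case-insensitive) with its words joined by `joiner`.

    Whitespace inside a match, including line breaks, becomes a single joiner,
    and the original capitalization is kept.
    """
    # Longest phrases first, so a longer phrase wins over any shorter one it contains.
    for phrase in sorted(phrases, key=len, reverse=True):
        pattern = r"\b" + r"\s+".join(re.escape(w) for w in phrase.split()) + r"\b"
        text = re.sub(pattern,
                      lambda m: re.sub(r"\s+", joiner, m.group(0)),
                      text, flags=re.IGNORECASE)
    return text
```

For example, `join_phrases("Thank you, and keep in\ntouch!", ["thank you", "keep in touch"])` gives `"Thank~you, and keep~in~touch!"`; matching across `\s+` catches phrases that were wrapped onto two lines in the email text.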

Generate word frequencies

https://tagcrowd.com/

  • Exclude: com email hi https linkedin lot
  • Group similar words? Yes
  • Show frequencies? Yes
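For a local approximation of the tagcrowd step, the counting can be sketched with `collections.Counter`. This is not tagcrowd's algorithm: the tokenizing regex is an assumption, and unlike the "Group similar words" option it does not merge variants such as "thank" and "thanks".

```python
import re
from collections import Counter

# Same exclude list as used on tagcrowd.com above.
EXCLUDE = {"com", "email", "hi", "https", "linkedin", "lot"}


def word_frequencies(text, exclude=EXCLUDE):
    """Count lowercase tokens; '~' stays inside a token so joined phrases count as one word."""
    tokens = (t.strip("'") for t in re.findall(r"[a-z0-9~']+", text.lower()))
    return Counter(t for t in tokens if t and t not in exclude)
```

`word_frequencies(text).most_common()` then lists `(word, count)` pairs from most to least frequent.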

Format word frequencies from tagcrowd.com into CSV

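# Copy the tagcrowd edit-box contents (with frequencies) into /tmp/1;
# the entries look like "word (count) word (count) ...".
tr ')' '\n' < /tmp/1 > /tmp/2                 # one "word (count" entry per line
sed -e 's/^ //' -e 's/(//g' /tmp/2 > /tmp/3   # strip the leading space and '(' -> "word count"
sort -rn -k2 /tmp/3 | tr ' ' ';' > word-cloud-seeds.downcase.csv   # "word;count", most frequent first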
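The same conversion can be sketched in Python, assuming the copied text has the `word (count) word (count) ...` shape that the shell pipeline implies. The function name and `sep` parameter are hypothetical.

```python
import re

# Assumed input shape, as implied by the shell pipeline: "word (count) word (count) ..."
ENTRY_RE = re.compile(r"(\S+)\s*\((\d+)\)")


def tagcrowd_to_csv(text, sep=";"):
    """Turn "word (count)" entries into "word;count" lines, most frequent first."""
    entries = [(word, int(count)) for word, count in ENTRY_RE.findall(text)]
    # list.sort() is stable even with reverse=True, so ties keep their input order.
    entries.sort(key=lambda e: e[1], reverse=True)
    return "\n".join(f"{word}{sep}{count}" for word, count in entries)
```

Parsing the counts with a regex avoids the temporary files and sorts numerically without depending on how `sort -k2` splits fields.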

Find an image of the Groupon G logo

[Image: Groupon G logo]

Generate word cloud image via wordart.com

  1. Upload G mask image
  2. Choose colors based on the Groupon color palette (I found an image in Skynet that had the hex codes).
  3. Copy the contents of word-cloud-seeds.downcase.csv into the words list.
  4. Generate image.

Result

[Image: the resulting word cloud in the shape of the Groupon G logo]
