@linse
Last active March 27, 2019 20:18
Scrapism
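# flatten the book into one long line: drop carriage returns and curly quotes, join lines, squeeze runs of spaces
cat moby_dick.txt | tr -d '\r“’‘”' | tr "\n" " " | tr -s ' ' | less
# same, but stretch every "e" into "aaaaa"
cat moby_dick.txt | tr -d '\r“’‘”' | tr "\n" " " | tr -s ' ' | sed 's/e/aaaaa/g' | less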
# one sentence per line, then count duplicates (most repeated last)
cat moby_dick.txt | tr -d '\r“’‘”_' | tr "\n" " " | tr "—" " " | tr "-" " " | tr -s ' ' | sed 's/[.!?]/&\
/g' | sort | uniq -c | sort -n | less
# one word per line, counted, most frequent first
cat Women-Who-Run-with-the-Wolves.txt | tr -d '\r“’‘”"_' | tr "\n" " " | tr "—" " " | tr "-" " " | tr -s ' ' '\n' | sort | uniq -c | sort -nr | less
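# A variation on the word count, as a rough sketch: fold everything to lower case
# and treat every run of non-letters as a separator, so "The", "the," and "THE!"
# land in the same bucket. word_freq is a made-up helper name, not part of the
# original commands. It assumes plain English text: apostrophes split words
# ("don't" becomes "don" and "t") and accented letters may not survive.

```shell
# word_freq: hypothetical helper, reads text on stdin, prints "count word" lines
word_freq() {
  tr '[:upper:]' '[:lower:]' |   # fold case so "The" and "the" match
    tr -cs '[:alpha:]' '\n' |    # each run of non-letters becomes a single newline
    sort | uniq -c | sort -nr    # count identical words, most frequent first
}

printf 'The whale, the WHALE! The sea.\n' | word_freq
```

# The sample line should give a count of 3 for "the", 2 for "whale" and 1 for "sea".
# On a real book: word_freq < moby_dick.txt | less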
# to pick five lines at random, end the pipeline with this instead of less
sort --random-sort | head -n 5
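# sort --random-sort orders lines by a random hash of each line, so identical
# lines stay next to each other. Where GNU coreutils is installed, shuf is an
# alternative that shuffles lines individually. A small sketch, assuming shuf is
# on the PATH; pick_random is a made-up helper name, not part of the original commands.

```shell
# pick_random: hypothetical helper, prints N random lines from stdin (default 5)
pick_random() {
  shuf -n "${1:-5}"
}

seq 1 100 | pick_random 5
```

# Usage at the end of one of the pipelines: ... | uniq -c | pick_random 5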