Note to self: is there a command-line tool on Homebrew to use instead?
Copied from https://unix.stackexchange.com/a/174421/188491, which was in turn copied from https://tomayko.com/blog/2011/awkward-ruby 😅
find . -type f | grep -v git | grep -v mov | xargs cat | tr -cs '[:alnum:]' '[\n*]' | sort -f | uniq -ci | sort -n
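function wordfrequency() {
  # Reads text on stdin and prints "count word" lines, most frequent first.
  awk '
    # Split on runs of non-letters so every field is a bare word.
    BEGIN { FS = "[^a-zA-Z]+" }
    {
      for (i = 1; i <= NF; i++) {
        if ($i == "") continue   # leading/trailing separators leave empty fields
        words[tolower($i)]++
      }
    }
    END {
      for (w in words)
        printf("%3d %s\n", words[w], w)
    }
  ' | sort -rn
}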
cat file1 file2 file3 | wordfrequency | grep -vwE "to|a|in|git|for|help|ui|the" | head -10
To use every file under the current directory, including subdirectories:
find . -type f | grep -v git | xargs cat | wordfrequency | grep -vwE "to|a|in|git|for|help|ui|the" | head -10
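Another way to drop stop words is to compare the word column exactly instead of pattern-matching the whole line. The sketch below assumes input lines of the form "count word", as printed by wordfrequency. The name dropwords and its space-separated argument are made up for this note and are not from the original answer.

```shell
# Hypothetical helper: drop lines whose second field (the word) appears
# in the space-separated stop-word list given as the first argument.
dropwords() {
  awk -v stop="$1" '
    # Build a lookup table from the stop-word list.
    BEGIN { n = split(stop, list, " "); for (i = 1; i <= n; i++) skip[list[i]] = 1 }
    # Print only the lines whose word is not in the table.
    !($2 in skip)
  '
}
```

It would slot into the pipeline in place of the grep step, e.g. `... | wordfrequency | dropwords "to a in git for help ui the" | head -10`.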
How to find bad files (i.e. files with spaces in their names, which xargs splits into separate arguments):
find . -type f | grep -vE "git|mov" | grep " "
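Rather than hunting down those files, one possible workaround is to have find emit NUL-terminated names and let xargs read them with -0, so spaces no longer matter. This is a sketch and not part of the copied answer: the name catall is invented here, and the filter is narrower than `grep -v git`, skipping only paths inside a .git directory and files ending in .mov.

```shell
# Hypothetical helper: cat every regular file under a directory
# (default: the current one), skipping .git contents and .mov files.
# -print0 / -0 pass file names NUL-terminated, so spaces are safe.
catall() {
  find "${1:-.}" -type f ! -path '*/.git/*' ! -name '*.mov' -print0 |
    xargs -0 cat
}
```

Usage would look like `catall | wordfrequency | head -10`.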