Extract plain text from MS Word docx files
unzip -p some.docx word/document.xml | sed -e 's/<[^>]\{1,\}>//g; s/[^[:print:]]\{1,\}//g'
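The pipeline above has two stages: `unzip -p` writes `word/document.xml` to standard output, and `sed` first deletes every XML tag and then deletes runs of non-printable characters. Because the paragraph tags are deleted like any other tag, text from adjacent paragraphs typically ends up joined together. The following is a minimal sketch of one possible variant, not part of the original gist, that emits a line break at each closing `</w:p>` tag before stripping the rest. The function name `docx_xml_to_paragraphs` is hypothetical, and XML entities such as `&amp;` are still left undecoded.

```shell
# Hypothetical helper (not from the original gist): reads document.xml
# on stdin and prints one line of plain text per Word paragraph.
docx_xml_to_paragraphs() {
  # Stage 1: end the line after each closing paragraph tag.
  # The backslash-newline form of the replacement is portable sed syntax.
  sed -e 's/<\/w:p>/\
/g' |
  # Stage 2: the original gist's filter, applied line by line:
  # delete all tags, then delete runs of non-printable characters.
  sed -e 's/<[^>]\{1,\}>//g; s/[^[:print:]]\{1,\}//g'
}

# Usage (assumes some.docx exists in the current directory):
#   unzip -p some.docx word/document.xml | docx_xml_to_paragraphs
```

Two `sed` invocations are used because a newline inserted by the first expression would otherwise be removed by the non-printable filter when both run over the same pattern space.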
@addywaddy (Author)

I needed this to diff two Word files. I first saved each one as text using the command above, and then ran:

$ wdiff doc1.txt doc2.txt | colordiff
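The two steps, extracting text and then comparing it, can be combined into one small script. This is a sketch only: the function names `docx_to_text` and `docx_wdiff` are made up for illustration, and it assumes `unzip`, `wdiff`, `colordiff`, and `mktemp` are installed.

```shell
# Hypothetical wrapper around the gist's one-liner: print the plain
# text of the .docx file given as the first argument.
docx_to_text() {
  unzip -p "$1" word/document.xml |
    sed -e 's/<[^>]\{1,\}>//g; s/[^[:print:]]\{1,\}//g'
}

# Hypothetical helper: word-level, colorized diff of two .docx files.
docx_wdiff() {
  old_txt=$(mktemp) || return 1
  new_txt=$(mktemp) || { rm -f "$old_txt"; return 1; }
  docx_to_text "$1" > "$old_txt"
  docx_to_text "$2" > "$new_txt"
  wdiff "$old_txt" "$new_txt" | colordiff
  # Remove the temporary text files once the diff has been printed.
  rm -f "$old_txt" "$new_txt"
}

# Usage (assumes both files exist):
#   docx_wdiff doc1.docx doc2.docx
```

Temporary files are used rather than process substitution so the functions also work in a plain POSIX `sh`.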
