Created May 20, 2016 13:46
Gist: habernal/268828961865d9264cb8e2bee6098d91
Convert Reuters-21578 from SGML to XML
Required packages: opensp, xmllint
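Run from the reuters21578/orig directory (the one holding the *.sgm files):

for i in *.sgm; do osx "$i" | tr -dc '\11\12\15\40-\176' > temp.xml; xmllint --format temp.xml > "$i.xml"; done

For each file, osx (OpenSP's SGML-to-XML converter) writes XML to standard output. tr -dc then deletes every byte except tab, line feed, carriage return and printable ASCII (octal 40-176), which removes the control characters that XML 1.0 forbids. Finally, xmllint --format pretty-prints the result to <name>.sgm.xml.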
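The tr filter in the loop can be checked on its own, without OpenSP or xmllint installed. This is a minimal sketch: the function name strip_non_xml_chars is hypothetical and not part of either package. It wraps the same tr expression and feeds it a string containing a Ctrl-A byte (octal 001), one of the control characters XML 1.0 does not allow.

```shell
#!/bin/sh
# Hypothetical helper: keep only tab (\11), LF (\12), CR (\15) and
# printable ASCII (\40-\176); delete every other byte, as in the loop.
strip_non_xml_chars() {
  tr -dc '\11\12\15\40-\176'
}

# Sample input: "a", a Ctrl-A control byte, "b", a tab, "c".
# od -c shows the bytes that survive the filter.
printf 'a\001b\tc' | strip_non_xml_chars | od -c
```

The filter also deletes every non-ASCII byte, so it assumes the input is plain ASCII. Accented characters in a UTF-8 file would be dropped, not re-encoded.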