@habernal
Created May 20, 2016 13:46
Convert Reuters-21578 from SGML to XML
Required packages:
  • opensp (provides the osx SGML-to-XML converter)
  • xmllint (shipped in libxml2-utils on Debian/Ubuntu)
reuters21578/orig$ for i in *.sgm ; do osx "$i" | tr -dc '\11\12\15\40-\176' > temp.xml ; xmllint --format temp.xml > "$i".xml ; done
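The one-liner above can also be sketched as a small script, which makes each step easier to see. This is only a sketch under a few assumptions: the function names strip_nonprintable and convert_all are hypothetical and not part of the original gist, and a per-file mktemp file is used in place of the shared temp.xml. The tr filter keeps tab, newline, carriage return and printable ASCII, so any non-ASCII bytes in the input are dropped as well.

```shell
#!/bin/sh
# Hypothetical helper (name not from the gist): delete every byte except
# tab (\11), newline (\12), carriage return (\15) and printable ASCII (\40-\176).
strip_nonprintable() {
    tr -dc '\11\12\15\40-\176'
}

# Hypothetical wrapper (name not from the gist): convert each .sgm file in a
# directory, writing <name>.sgm.xml next to it, as the one-liner does.
convert_all() {
    dir=${1:-.}
    for f in "$dir"/*.sgm; do
        # Skip the unexpanded pattern when no .sgm files are present.
        [ -e "$f" ] || continue
        tmp=$(mktemp) || return 1
        # osx (from OpenSP) turns SGML into XML; the filter cleans its output.
        osx "$f" | strip_nonprintable > "$tmp"
        # Pretty-print the cleaned XML.
        xmllint --format "$tmp" > "$f.xml"
        rm -f "$tmp"
    done
}
```

After sourcing the file, running convert_all reuters21578/orig would process every .sgm file in that directory. Quoting "$f" keeps the loop safe for file names that contain spaces.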