@shwars
Created April 29, 2020 10:30
blog-deeppavlov-covid-1
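import glob
import json
import os
from os.path import basename

def get_text(s):
    # Join the 'text' field of every paragraph in a section
    return ' '.join([x['text'] for x in s])

os.makedirs('text', exist_ok=True)
for fn in glob.glob('noncomm_use_subset/pdf_json/*'):
    with open(fn) as f:
        x = json.load(f)
    # Write each paper to text/<name>.txt
    nfn = os.path.join('text', basename(fn).replace('.json', '.txt'))
    with open(nfn, 'w') as f:
        f.write(get_text(x['abstract']))
        f.write(' ')  # keep the abstract and body from running together
        f.write(get_text(x['body_text']))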