@adammcmaster
Created November 2, 2017 13:57
Generate a CSV file showing how many lines of transcription each user has submitted
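import csv
import json
import sys

# Raise the csv field size limit so very long fields do not trigger an error.
csv.field_size_limit(sys.maxsize)

# Maps each logged-in user name to the number of lines they transcribed.
line_count = {}

with open('shakespeares-world-classifications.csv', newline='') as f:
    r = csv.DictReader(f)
    for row in r:
        # Skip classifications made by users who were not logged in.
        if row['user_name'].startswith('not-logged-in-'):
            continue
        line_count.setdefault(row['user_name'], 0)
        for annotation in json.loads(row['annotations']):
            # Only annotations for task T2 contribute to the line count.
            if annotation.get('task') != 'T2':
                continue
            line_count[row['user_name']] += len(annotation.get('value', []))

# newline='' stops the csv module from writing blank rows on Windows.
with open('shakespeares-world-line-counts-per-user.csv', 'w', newline='') as f:
    w = csv.writer(f)
    w.writerow(('username', 'total lines transcribed'))
    for user, count in line_count.items():
        w.writerow((user, count))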
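The counting logic above can be tried on a small in-memory sample before running it against a full export. The following is a minimal sketch, not part of the original script: the function name `count_lines_per_user`, its `task_id` parameter, and the sample rows are all hypothetical and exist only for illustration. It assumes the same `user_name` and `annotations` columns that the script reads.

```python
import csv
import io
import json
from collections import Counter


def count_lines_per_user(csv_file, task_id="T2"):
    """Return a Counter mapping user_name to total lines for task_id.

    Hypothetical helper: it mirrors the loop in the script above but
    reads from any file-like object instead of a fixed file name.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        user = row["user_name"]
        # Skip users who were not logged in, as the script does.
        if user.startswith("not-logged-in-"):
            continue
        counts[user] += 0  # register the user even if they have no lines
        for annotation in json.loads(row["annotations"]):
            if annotation.get("task") == task_id:
                # "or []" also covers a value that is present but null.
                counts[user] += len(annotation.get("value") or [])
    return counts


# Made-up sample rows, for illustration only.
sample = io.StringIO()
writer = csv.writer(sample)
writer.writerow(["user_name", "annotations"])
writer.writerow(["alice", json.dumps([{"task": "T2", "value": [{}, {}]}])])
writer.writerow(["alice", json.dumps([{"task": "T2", "value": [{}]},
                                      {"task": "T1", "value": "x"}])])
writer.writerow(["not-logged-in-abc", json.dumps([{"task": "T2", "value": [{}]}])])
writer.writerow(["bob", json.dumps([{"task": "T0", "value": "yes"}])])
sample.seek(0)

counts = count_lines_per_user(sample)
print(dict(counts))
```

Wrapping the loop in a function that accepts a file-like object keeps the behaviour of the script while making it testable without the real classification export.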