@Vesihiisi
Created November 1, 2021 09:50
# -*- coding: utf-8 -*-
"""
Given a quarry.wmcloud.org URL,
process the info from WMSE's SDC upload
project to extract 1) the number of unique
files edited, 2) the number of SDC statements
added.
Dump has to be in this format to be examined:
https://quarry.wmcloud.org/query/59376
Usage:
python3 get_quarry_stats.py url https://quarry.wmcloud.org/query/59376
"""
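import argparse
import json

import requests


def main(args):
    # With nargs="*", "url" is a list of positional arguments. Following the
    # usage line above, url[0] is the literal word "url" and url[1] is the
    # Quarry query address.
    url = args.get("url")
    processed_files = []
    statement_counter = 0
    # Build the address of the query's latest result set in JSON form.
    json_url = url[1] + "/result/latest/0/json"
    quarry_data = requests.get(json_url)
    quarry_json = quarry_data.json()
    quarry_results = quarry_json.get("rows")
    for row in quarry_results:
        # row[0] identifies the edited file; in the text column row[1], each
        # "[[:d:Property:" link is counted as one added statement.
        processed_files.append(row[0])
        statement_counter = statement_counter + row[1].count("[[:d:Property:")
    print("Unique files")
    print(len(set(processed_files)))
    print("Added statements")
    print(statement_counter)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("url", nargs="*")
    args = parser.parse_args()
    main(vars(args))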
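
The counting step can be tried without a network call. The following is a minimal sketch, assuming rows shaped like the ones the script reads (a file identifier in the first column, a text column containing "[[:d:Property:" links in the second). The helper name summarize_rows and the sample rows are hypothetical and invented for illustration; they are not taken from the actual Quarry output.

```python
def summarize_rows(rows):
    """Return (unique_file_count, statement_count) for Quarry-style rows.

    Hypothetical helper: it mirrors the loop in the script above but takes
    the rows as an argument instead of downloading them.
    """
    files = set()
    statements = 0
    for row in rows:
        files.add(row[0])
        # Each "[[:d:Property:" link is treated as one added statement.
        statements += row[1].count("[[:d:Property:")
    return len(files), statements


# Hypothetical sample rows; real rows come from the "rows" key of the JSON result.
sample_rows = [
    ["File:Example_A.jpg", "Created claim: [[:d:Property:P180]]: [[:d:Q1]]"],
    ["File:Example_A.jpg", "Created claims: [[:d:Property:P170]], [[:d:Property:P571]]"],
    ["File:Example_B.jpg", "Some other edit"],
]

unique_files, added_statements = summarize_rows(sample_rows)
print("Unique files:", unique_files)
print("Added statements:", added_statements)
```

Collecting file identifiers in a set gives the same count as len(set(processed_files)) in the script while skipping the intermediate list.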