@Witiko
Created July 28, 2020 13:40
#!/bin/sh
# Prints the mean amount of public financial support (in CZK) for the projects listed in a PDF document.
# Project codes are extracted from the PDF and each project is looked up at starfos.tacr.cz.
#
# Usage: ./get-mean-tacr-support.sh FILE, where FILE is a PDF document with a table of supported projects, such as
# https://www.tacr.cz/wp-content/uploads/documents/2019/10/29/1572358378_Vyhlaseni_vysledku_eTA_na_web_-_podporene.pdf
set -e
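# Find lines that mention a project code (TL followed by digits), keep only the code,
# and look up each project's public financial support through the STARFOS API.
pdfgrep 'TL[0-9]+' "$1" |
sed -r 's/.*\s(TL[0-9]+)(\s.*|$)/\1/' |
while read -r CODE
do
  curl -s "https://starfos.tacr.cz/api/starfos/isvav_project/?code=${CODE}&language=en" |
  python3 -c '
import sys
import json
import re

data = json.load(sys.stdin)
if "tabs" not in data:
    # No project record was returned; report the response and skip this code.
    print(data, file=sys.stderr)
else:
    tabs = data["tabs"]
    tab = tabs[0]
    assert tab["tab_id"] == "project-main"
    # The finance group is expected to be the last card in the main tab.
    group = tab["tab_content"]["table_card"][-1]
    assert group["group_heading"] == "Finance"
    field = group["group_fields"][1]
    assert field["label"] == "Public financial support"
    value = field["value"]
    # The value looks like "1,234 thou. CZK"; convert it to CZK.
    regex = r"([0-9,]+) thou\. CZK"
    match = re.match(regex, value)
    assert match, value
    value = int(match.group(1).replace(",", "")) * 1000
    print(value)
'
  # Wait between requests to avoid overloading the server.
  sleep 1
done | python3 -c '
from statistics import mean
import sys

# One amount in CZK per input line.
values = (float(line.strip()) for line in sys.stdin)
print(mean(values))
'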
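
The amount parsing in the script can be tried without querying the server. The following is a minimal sketch, assuming the API reports values in the form "1,234 thou. CZK" as the script's regular expression implies. The function name parse_support and the sample strings are hypothetical and are not part of the original script or of real STARFOS data.

```python
import re
from statistics import mean

# Pattern modeled on the one in the script: digits with comma separators,
# followed by the literal text " thou. CZK".
SUPPORT_PATTERN = re.compile(r"([0-9,]+) thou\. CZK")


def parse_support(value):
    """Convert a string such as '1,234 thou. CZK' to an amount in CZK."""
    match = SUPPORT_PATTERN.match(value)
    if match is None:
        raise ValueError(f"unexpected format: {value!r}")
    # Drop the thousands separators, then scale from thousands of CZK to CZK.
    return int(match.group(1).replace(",", "")) * 1000


if __name__ == "__main__":
    # Made-up sample values for illustration only.
    samples = ["1,234 thou. CZK", "987 thou. CZK"]
    amounts = [parse_support(sample) for sample in samples]
    print(amounts)
    print(mean(amounts))
```

Raising ValueError instead of using assert keeps the check active even when Python runs with the -O flag, which strips assertions.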