To re-run the Hansard parser so that it picks up old files that may have been skipped previously:
[if in the UK, change your DNS servers to 8.8.8.8 and 8.8.4.4 (i.e. Google) to get a more stable connection]
Delete old sources from the database which refer to the old parliament.go.ke site:
-- check whether there are any matching sources, and how many
SELECT COUNT(*) FROM hansard_source WHERE last_processing_success IS NULL AND url LIKE '%/plone/%';
-- delete the sittings attached to those sources
DELETE FROM hansard_sitting WHERE source_id IN (SELECT id FROM hansard_source WHERE last_processing_success IS NULL AND url LIKE '%/plone/%');
-- delete the old sources themselves
DELETE FROM hansard_source WHERE last_processing_success IS NULL AND url LIKE '%/plone/%';
Still in the database, reset the last_processing_attempt timestamp to NULL wherever there was no recorded success, so those sources will be retried:
UPDATE hansard_source SET last_processing_attempt = NULL WHERE last_processing_success IS NULL;
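The database steps above can be sanity-checked before touching the live database. This is an illustrative sketch only: it replays the same DELETE and UPDATE statements against a throwaway in-memory SQLite copy of the two tables. The table and column names come from the steps above; the schema details and sample rows are invented for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE hansard_source (
    id INTEGER PRIMARY KEY, url TEXT,
    last_processing_attempt TEXT, last_processing_success TEXT)""")
cur.execute("CREATE TABLE hansard_sitting (id INTEGER PRIMARY KEY, source_id INTEGER)")

cur.executemany("INSERT INTO hansard_source VALUES (?, ?, ?, ?)", [
    # old /plone/ source that never processed successfully
    (1, "http://www.parliament.go.ke/plone/old.pdf", "2012-06-01", None),
    # source that processed fine
    (2, "http://www.parliament.go.ke/new.pdf", "2013-01-01", "2013-01-01"),
    # source that was attempted but has not yet succeeded
    (3, "http://www.parliament.go.ke/other.pdf", "2013-02-01", None),
])
cur.executemany("INSERT INTO hansard_sitting VALUES (?, ?)", [(10, 1), (11, 2)])

# Delete sittings first, then their sources, mirroring the order above.
cur.execute("""DELETE FROM hansard_sitting WHERE source_id IN
    (SELECT id FROM hansard_source
     WHERE last_processing_success IS NULL AND url LIKE '%/plone/%')""")
cur.execute("""DELETE FROM hansard_source
    WHERE last_processing_success IS NULL AND url LIKE '%/plone/%'""")

# Reset the attempt timestamp so unprocessed sources will be retried.
cur.execute("""UPDATE hansard_source SET last_processing_attempt = NULL
    WHERE last_processing_success IS NULL""")

print(cur.execute("SELECT COUNT(*) FROM hansard_source").fetchone()[0])   # 2 sources left
print(cur.execute("SELECT COUNT(*) FROM hansard_sitting").fetchone()[0])  # 1 sitting left
```

Only the /plone/ source and its sitting are removed; the not-yet-successful source keeps its row but loses its last_processing_attempt value, which is exactly what the retry relies on.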
Rerun the management command for fetching sources:
./manage.py hansard_check_for_new_sources --check-all -v 2
Run the management command to scrape the source PDFs:
./manage.py hansard_process_sources -v 2
Expect it to choke when looking for a document named 'Hansard 30.05.06', which appears to have been originally published on the pre-2013-election version of the Parliament website, as per http://web.archive.org/web/20111130004121/http://www.parliament.go.ke/index.php?option=com_content&view=article&id=202&Itemid=165