When downloading a new collection on katalyst01 col30381
using neb get
,
Ryan got this error:
karen@yachiyo:~/src/nebuchadnezzar$ neb get katalyst01 col30381 latest
Traceback (most recent call last):
File "./venv/bin/neb", line 11, in <module>
load_entry_point('nebuchadnezzar', 'console_scripts', 'neb')()
File "/home/karen/src/nebuchadnezzar/venv/lib/python3.6/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/home/karen/src/nebuchadnezzar/venv/lib/python3.6/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/home/karen/src/nebuchadnezzar/venv/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/karen/src/nebuchadnezzar/venv/lib/python3.6/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/karen/src/nebuchadnezzar/venv/lib/python3.6/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/home/karen/src/nebuchadnezzar/venv/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/karen/src/nebuchadnezzar/nebu/cli/_common.py", line 59, in wrapper
return func(*args, **kwargs)
File "/home/karen/src/nebuchadnezzar/venv/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/karen/src/nebuchadnezzar/nebu/cli/get.py", line 102, in get
version = resp.json()['headVersion']
File "/home/karen/src/nebuchadnezzar/venv/lib/python3.6/site-packages/requests/models.py", line 897, in json
return complexjson.loads(self.text, **kwargs)
File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.6/json/decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.6/json/decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
I added some print statements in neb get
so we can see the url and the response:
diff --git a/nebu/cli/get.py b/nebu/cli/get.py
index cdec743..e9c791c 100644
--- a/nebu/cli/get.py
+++ b/nebu/cli/get.py
@@ -95,10 +95,12 @@ def get(ctx, env, col_id, col_version, output_dir, book_tree, get_resources):
# Fetch extras (includes head and downloadable file info)
url = '{}/extras/{}@{}'.format(base_url, uuid, version)
+ print(url)
resp = session.get(url)
# Latest defaults to successfully baked - we need headVersion
if col_version == 'latest':
+ print(resp.text)
version = resp.json()['headVersion']
url = '{}/extras/{}@{}'.format(base_url, uuid, version)
resp = session.get(url)
This is the debug output:
https://archive-katalyst01.cnx.org/extras/a919011d-956e-4db2-8c1a-37ad293bbfc9@1.1
<html>
<head>
<title>404 Not Found</title>
</head>
<body>
<h1>404 Not Found</h1>
The resource could not be found.<br/><br/>
</body>
</html>
Then I had a look in cnx-archive for the view function for this url and found:
add_route('content-extras', '/extras/{ident_hash:([^:/@.]+(@[0-9.]*[0-9]+)?)}{separator:(:?)}{page_ident_hash:([^:/@.]+(@[0-9.]*[0-9]+)?)?}') # noqa cnxarchive.views:get_extra
The route is called content-extras
and you can find the code by doing:
karen@yachiyo:~/src/cnx-archive$ git grep content-extras
...
cnxarchive/views/content.py:@view_config(route_name='content-extras', request_method='GET',
Looking at the source code:
@view_config(route_name='content-extras', request_method='GET',
http_cache=(600, {'public': True}))
def get_extra(request):
"""Return information about a module / collection that cannot be cached."""
settings = get_current_registry().settings
exports_dirs = settings['exports-directories'].split()
args = request.matchdict
if args['page_ident_hash']:
context_id, context_version = split_ident_hash(args['ident_hash'])
try:
id, version = split_ident_hash(args['page_ident_hash'])
except IdentHashShortId as e:
id = get_uuid(e.id)
version = e.version
except IdentHashMissingVersion as e:
# Ideally we would find the page version
# that is in the book instead of latest
id = e.id
version = get_latest_version(e.id)
...
My feeling is that may be get_latest_version
was failing. So next, I ssh
ed
into katalyst01.cnx.org
and had a look at the database:
repository=> select moduleid, version, name, created from modules where portal_type = 'Collection' order by created desc limit 3;
moduleid | version | name | created
----------+---------+--------------------------------------------------------------------------------------------------------+------------------------
col30381 | 1.1 | Physics | 2019-12-13 20:02:00+00
col30328 | 1.1 | My Final Analysis of Everything - including Emotions, Feelings and Thoughts - By Mark Rozen Pettinelli | 2019-11-06 17:59:56+00
col30328 | 1.2 | My Final Analysis of Everything - including Emotions, Feelings and Thoughts - By Mark Rozen Pettinelli | 2019-11-06 17:59:56+00
(3 rows)
repository=> select moduleid, version, name, created from latest_modules where portal_type = 'Collection' order by created desc limit 3;
moduleid | version | name | created
----------+---------+--------------------------------------------------------------------------------------------------------+------------------------
col30328 | 1.3 | My Final Analysis of Everything - including Emotions, Feelings and Thoughts - By Mark Rozen Pettinelli | 2019-11-06 17:59:56+00
col30327 | 1.3 | My Final Analysis of Everything - including Emotions, Feelings and Thoughts - By Mark Rozen Pettinelli | 2019-11-06 17:45:44+00
col30326 | 1.8 | My Final Analysis of Everything - including Emotions, Feelings and Thoughts - By Mark Rozen Pettinelli | 2019-11-05 15:23:07+00
(3 rows)
So col30381
is in the modules
table but not in the latest_modules
table.
Looking at the update_latest trigger:
CREATE OR REPLACE FUNCTION update_latest() RETURNS trigger AS '
BEGIN
-- lastest content is the highest version that has successfully baked - states 1 and 8 (current and fallback)
-- represent some sort of success (fallback used an old recipe due to errors)
IF (TG_OP = ''INSERT'' OR TG_OP = ''UPDATE'') AND
ARRAY [NEW.major_version, NEW.minor_version] >= (SELECT ARRAY [major_version, minor_version]
FROM latest_modules WHERE uuid = NEW.uuid UNION ALL SELECT ARRAY[0, NULL] LIMIT 1) AND
NEW.stateid in (1, 8) THEN -- current and fallback
LOCK TABLE latest_modules IN SHARE ROW EXCLUSIVE MODE;
...
latest_modules
is not updated if stateid
isn't 1
or 8
and looking at the stateid:
repository=> select moduleid, version, stateid from modules where portal_type = 'Collection' order by created desc limit 3;
moduleid | version | stateid
----------+---------+---------
col30381 | 1.1 | 6
col30381 | 1.1 | 6
col30328 | 1.1 | 1
(3 rows)
That was exactly the problem. stateid
6 means baking failed.
Looking at the GOB page for physics, we get this traceback:
Traceback (most recent call last):
File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/celery/app/trace.py", line 385, in trace_task
R = retval = fun(*args, **kwargs)
File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/cnxpublishing/tasks.py", line 31, in __call__
return super(PyramidAwareTask, self).__call__(*args, **kwargs)
File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/celery/app/trace.py", line 648, in __protected_call__
return self.run(*args, **kwargs)
File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/cnxpublishing/db.py", line 74, in wrapped
return func(*args, **kwargs)
File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/cnxpublishing/subscribers.py", line 168, in baking_processor
bake(binder, recipe_id, publisher, message, cursor=cursor)
File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/cnxpublishing/db.py", line 70, in wrapped
return func(*args, **kwargs)
File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/cnxpublishing/bake.py", line 74, in bake
binder = collate_models(binder, ruleset=recipe, includes=includes)
File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/cnxepub/collation.py", line 71, in collate
collated_binder = reconstitute(collated_html)
File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/cnxepub/collation.py", line 51, in reconstitute
return adapt_single_html(xhtml)
File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/cnxepub/adapters.py", line 360, in adapt_single_html
_adapt_single_html_tree(binder, body[0], nav_tree, top_metadata=metadata)
File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/cnxepub/adapters.py", line 517, in _adapt_single_html_tree
id_map=id_map, depth=depth+1)
File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/cnxepub/adapters.py", line 475, in _adapt_single_html_tree
child.xpath('./*[@data-type="metadata"]')[0])
File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/cnxepub/html_parsers.py", line 74, in parse_metadata
return parser()
File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/cnxepub/html_parsers.py", line 108, in __call__
return self.metadata
File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/cnxepub/html_parsers.py", line 128, in metadata
"A value for '{}' could not be found.".format(key))
ValueError: A value for 'title' could not be found.
Download the recipe used (you can find the resource hash on the GOB page):
wget -O katalyst01-physics-recipe.css 'https://katalyst01.cnx.org/resources/a1f636ac26f5bdf29a78bade3f9f499ee20e0212'
Download and bake the book (after commenting out the
bit
that neb get
tries to get the head version):
mkdir -p physics/raw
neb get -d physics/raw katalyst01 col30381 1.1
neb assemble physics/raw physics # this creates physics/collection.assembled.xhtml
cnx-easybake katalyst01-physics-recipe.css physics/collection.assembled.xhtml physics/collection.baked.xhtml
Looking at the traceback, the error happens when running adapt_single_html
:
karen@yachiyo:~/src/cnx-epub$ python
Python 2.7.15+ (default, Oct 7 2019, 17:39:04)
[GCC 7.4.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import logging
>>> logging.getLogger('cnxepub').addHandler(logging.StreamHandler())
>>> from cnxepub import adapt_single_html
>>> adapt_single_html(open('physics/collection.baked.xhtml').read())
Error when parsing metadata for composite-page (id: None, parent: "1 What is Physics?")
Traceback (most recent call last):
File "/home/karen/src/cnx-epub/cnxepub/adapters.py", line 476, in _adapt_single_html_tree
child.xpath('./*[@data-type="metadata"]')[0])
File "/home/karen/src/cnx-epub/cnxepub/html_parsers.py", line 74, in parse_metadata
return parser()
File "/home/karen/src/cnx-epub/cnxepub/html_parsers.py", line 108, in __call__
return self.metadata
File "/home/karen/src/cnx-epub/cnxepub/html_parsers.py", line 128, in metadata
"A value for '{}' could not be found.".format(key))
ValueError: A value for 'title' could not be found.
No handlers could be found for logger "cnxepub"
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/karen/src/cnx-epub/cnxepub/adapters.py", line 360, in adapt_single_html
_adapt_single_html_tree(binder, body[0], nav_tree, top_metadata=metadata)
File "/home/karen/src/cnx-epub/cnxepub/adapters.py", line 529, in _adapt_single_html_tree
id_map=id_map, depth=depth+1)
File "/home/karen/src/cnx-epub/cnxepub/adapters.py", line 476, in _adapt_single_html_tree
child.xpath('./*[@data-type="metadata"]')[0])
File "/home/karen/src/cnx-epub/cnxepub/html_parsers.py", line 74, in parse_metadata
return parser()
File "/home/karen/src/cnx-epub/cnxepub/html_parsers.py", line 108, in __call__
return self.metadata
File "/home/karen/src/cnx-epub/cnxepub/html_parsers.py", line 128, in metadata
"A value for '{}' could not be found.".format(key))
ValueError: A value for 'title' could not be found.
Tada! We have reproduced the error seen on the server.
Specifically it's the composite page after "1.3 The Language of Physics: Physical Quantities and Units"
<div class="os-eoc os-ost-chapter-review.problem-container" data-type="composite-page"><div data-type="metadata" style="display: none;">
<h1 data-type="document-title" itemprop="name"></h1>
Since this is a composite page, it's probably a problem with the recipe and is not somthing Ryan can fix by changing the content.
I tried baking the same book with the physics recipe and that worked, so Ryan will see if she can change the recipe and publish for now.