I just tried my example from the tinycdxserver README and realised that curl is mangling the line endings because of a conversion it does by default. I haven't checked exactly what curl is doing yet, but tinycdxserver is interpreting the upload as if all the lines in the file had been concatenated together (you can see this by running tinycdxserver in verbose mode with the -v option).
Using curl's --data-binary option instead of --data fixes that and I've updated the README correspondingly.
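For reference, curl's manual says that when --data reads from a file with @, carriage returns and newlines are stripped out, while --data-binary posts the bytes unchanged. That matches the concatenated lines seen in verbose mode. The Python sketch below only models the two request bodies; `server_side_lines` is a hypothetical stand-in for a line-oriented reader, not tinycdxserver's actual code.

```python
from typing import List


def strip_like_curl_data(body: bytes) -> bytes:
    """Model of `curl --data @file`: CR and LF are removed from the file contents."""
    return body.replace(b"\r", b"").replace(b"\n", b"")


def as_curl_data_binary(body: bytes) -> bytes:
    """Model of `curl --data-binary @file`: the file is sent byte-for-byte."""
    return body


def server_side_lines(body: bytes) -> List[bytes]:
    """Hypothetical line-oriented reader: split on newlines, skip blank lines."""
    return [line for line in body.split(b"\n") if line.strip()]


if __name__ == "__main__":
    cdx = b"- 20150914222035 http://a/ text/html 200 X 1\n- 20150914222036 http://b/ text/html 200 Y 2\n"
    # With --data the two records arrive as one long line; with --data-binary they stay separate.
    print(len(server_side_lines(strip_like_curl_data(cdx))))
    print(len(server_side_lines(as_curl_data_binary(cdx))))
```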
That could be what's tripping you up. Here's a more complete example that I just tested. If it works, you should get an "Added N records" response back, where N is the number of lines in the CDX file.
records.cdx below has a blank ("-") first column: tinycdxserver ignores that column and does its own canonicalisation, so our usual indexing process doesn't bother filling it in. You can use standard CDX files as well; to demonstrate that, I've included a second file, records2.cdx, with SURT-style URLs generated using IA tools.
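The canonicalisation is what turns each record's original URL into the sort key that normally sits in the first column. As a rough illustration only, here's a simplified SURT-style key function; the rules it applies (lowercasing, dropping a leading "www.", reversing the host labels) are common SURT conventions, not a copy of tinycdxserver's canonicaliser, which handles many more cases. It does reproduce the key shown in the query output below.

```python
from urllib.parse import urlsplit


def simple_surt_key(url: str) -> str:
    """Simplified SURT-style key (illustrative; NOT tinycdxserver's actual rules)."""
    parts = urlsplit(url.strip())
    # .hostname is already lowercased and excludes any port number
    host = parts.hostname or ""
    # Treat www.example.org and example.org as the same site
    if host.startswith("www."):
        host = host[len("www."):]
    # minister.infrastructure.gov.au -> au,gov,infrastructure,minister
    reversed_host = ",".join(reversed(host.split(".")))
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return reversed_host + ")" + path.lower()


if __name__ == "__main__":
    print(simple_surt_key("http://www.minister.infrastructure.gov.au/"))
```

Because both the www and non-www forms map to the same key, a query for http://minister.infrastructure.gov.au/ can find a capture recorded under http://www.minister.infrastructure.gov.au/.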
Compile tinycdxserver:
$ git clone git@github.com:nla/tinycdxserver.git
$ cd tinycdxserver
$ mvn package
Start tinycdxserver:
$ mkdir /tmp/data
$ java -jar target/tinycdxserver-0.1-SNAPSHOT.jar -d /tmp/data
Grab an example CDX:
$ curl -LO https://gist.github.com/ato/b2ad8e65b35afe690921/raw/4e663c44c74c585ac0d5226780465d2281177958/records.cdx
Load it:
$ curl -XPOST --data-binary @records.cdx http://localhost:8080/myindex
Added 6 records
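If you'd rather load records from a script than with curl, the same request can be made with Python's standard library. This is a minimal sketch assuming the server and index name used above (http://localhost:8080/myindex); the helper names are hypothetical, and `count_cdx_lines` assumes the file has no " CDX" header line, so treat its count as a rough check against the "Added N records" reply.

```python
import urllib.request
from pathlib import Path


def count_cdx_lines(data: bytes) -> int:
    """Count non-blank lines (assumes no ' CDX ...' header line in the file)."""
    return sum(1 for line in data.splitlines() if line.strip())


def parse_added(response_text: str) -> int:
    """Extract N from a reply such as 'Added 6 records'."""
    words = response_text.split()
    if len(words) < 2 or words[0] != "Added":
        raise ValueError(f"unexpected response: {response_text!r}")
    return int(words[1])


def post_cdx(path: str, endpoint: str = "http://localhost:8080/myindex") -> str:
    """POST the file unchanged, like `curl --data-binary @file`."""
    data = Path(path).read_bytes()  # raw bytes, so line endings are preserved
    request = urllib.request.Request(endpoint, data=data, method="POST")
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With the server running, `print(post_cdx("records.cdx"))` should print the same kind of "Added N records" reply as the curl command.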
Get a record back:
$ curl -s 'http://localhost:8080/myindex?url=http://minister.infrastructure.gov.au/'
au,gov,infrastructure,minister)/ 20150914222035 http://www.minister.infrastructure.gov.au/ text/html 301 ZH3ZBTFT5T6VC4BHO3MC6MLFECBEKDYN 389
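Building the query URL in code avoids shell quoting issues with the ? in the URL. A sketch, assuming the same local server; the field names in `split_cdx_line` are inferred by matching this output line against the XML result below (urlkey, capture date, URL, MIME type, status, digest), and any remaining columns are left unlabelled because their meaning isn't shown here.

```python
from typing import Dict, List, Union
from urllib.parse import urlencode

# Inferred from the XML result for the same capture; later columns are kept unlabelled.
CDX_FIELDS = ("urlkey", "timestamp", "url", "mimetype", "status", "digest")


def cdx_query_url(endpoint: str, url: str) -> str:
    """Build e.g. http://localhost:8080/myindex?url=<percent-encoded url>."""
    return endpoint + "?" + urlencode({"url": url})


def split_cdx_line(line: str) -> Dict[str, Union[str, List[str]]]:
    """Split one space-separated result line into named fields plus any extras."""
    values = line.split()
    if len(values) < len(CDX_FIELDS):
        raise ValueError(f"too few fields in CDX line: {line!r}")
    record: Dict[str, Union[str, List[str]]] = dict(zip(CDX_FIELDS, values))
    record["extra"] = values[len(CDX_FIELDS):]
    return record
```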
Query using wayback's xml protocol:
$ curl -s 'http://localhost:8080/myindex?q=type:urlquery+url:http://minister.infrastructure.gov.au/' | xml_pp
<?xml version="1.0" encoding="UTF-8"?>
<wayback>
  <request>
    <startdate>19960101000000</startdate>
    <enddate>20151015072406</enddate>
    <type>urlquery</type>
    <firstreturned>0</firstreturned>
    <url>au,gov,infrastructure,minister)/</url>
    <resultsrequested>10000</resultsrequested>
    <resultstype>resultstypecapture</resultstype>
  </request>
  <results>
    <result>
      <compressedoffset>152443</compressedoffset>
      <mimetype>text/html</mimetype>
      <file>WEB-20150914222031256-00000-43190~heritrix.nla.gov.au~8443.warc.gz</file>
      <redirecturl>http://minister.infrastructure.gov.au/</redirecturl>
      <urlkey>au,gov,infrastructure,minister)/</urlkey>
      <digest>ZH3ZBTFT5T6VC4BHO3MC6MLFECBEKDYN</digest>
      <httpresponsecode>301</httpresponsecode>
      <robotflags>-</robotflags>
      <url>http://www.minister.infrastructure.gov.au/</url>
      <capturedate>20150914222035</capturedate>
    </result>
  </results>
</wayback>
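To consume the XML response from code rather than reading it through xml_pp, Python's xml.etree.ElementTree is enough. A minimal sketch that turns each <result> element into a dictionary keyed by its child tag names; it assumes the flat <wayback>/<results>/<result> layout shown above.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Union


def parse_wayback_xml(document: Union[str, bytes]) -> List[Dict[str, str]]:
    """Return one dict per <result> element, mapping child tag -> text."""
    root = ET.fromstring(document)
    results = []
    # iter("result") matches only <result> elements, not the <results> wrapper
    for result in root.iter("result"):
        results.append({child.tag: (child.text or "") for child in result})
    return results
```

The bytes returned by an HTTP client can be passed straight in, so the XML declaration's encoding is honoured.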
I thought the Wayback XML Query API used separate query parameters, i.e.
Does Wayback need any special configuration to use your CDX server as part of a remote collection?
Oh, hey, that works as well! http://www.webarchive.org.uk/wayback/archive/xmlquery.jsp?q=url:http://www.bl.uk/
So, the RemoteResourceIndex uses the q=url: form? Ah, so it looks like this is the OpenSearch API and that that is the required form. Excellent, thanks again.
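In this OpenSearch-style form, everything goes into the single q parameter as space-separated field:value terms. This hypothetical helper builds such a URL; a server that form-decodes its parameters should see the same q value as the hand-written ...+url:... query earlier in the thread, since + decodes to a space. The optional type term mirrors the type:urlquery example, and leaving it out gives the q=url: form used in the webarchive.org.uk link.

```python
from typing import Optional
from urllib.parse import urlencode


def opensearch_query_url(endpoint: str, url: str, query_type: Optional[str] = "urlquery") -> str:
    """Build endpoint?q=[type:<query_type> ]url:<url>, percent-encoded."""
    terms = []
    if query_type:
        terms.append(f"type:{query_type}")
    terms.append(f"url:{url}")
    # urlencode encodes spaces as '+' and ':' / '/' as %3A / %2F
    return endpoint + "?" + urlencode({"q": " ".join(terms)})
```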