A service which will do the following:
- Listen to Nexus SSE of type {Paper} on project {literature}.
- Split each paper into sentences.
- Make an API call to Blue Brain Search to get the dense vector for each sentence.
- For each sentence, insert a document in ES index (paper_sentences):
{
  "paperId": "{id}",
  "text": "{original text}",
  "vector": [...]
}
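
The sentence-indexing steps above can be sketched in Python as follows. This is only a minimal sketch under stated assumptions: the regex splitter is a naive stand-in for a real sentence segmenter, and `embed` is a hypothetical callable standing in for the Blue Brain Search call, whose actual API is not specified in this note.

```python
import re
from typing import Callable, Dict, Iterator, List

# Naive splitter: break on whitespace that follows '.', '!' or '?'.
# Assumption: a real service would use a proper sentence segmenter.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Split a paper's text into non-empty, stripped sentences."""
    return [s.strip() for s in _SENTENCE_BOUNDARY.split(text) if s.strip()]


def sentence_documents(
    paper_id: str,
    text: str,
    embed: Callable[[str], List[float]],  # hypothetical embedding call
) -> Iterator[Dict[str, object]]:
    """Yield one paper_sentences document per sentence of the paper."""
    for sentence in split_sentences(text):
        yield {
            "paperId": paper_id,
            "text": sentence,
            "vector": embed(sentence),
        }
```

Passing the embedding function in as a parameter keeps the sketch independent of how the vectors are actually obtained.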
- For each paper, insert the raw document into the ES index (papers).
- Provide an API endpoint /v1/papers:
POST /v1/papers
{
  "paperId": ["{id1}", ..., "{idN}"], // optional
  "text": "{text}"
}
- Make an API call to Blue Brain Search to get the dense vector for the {text}.
- Make an Elasticsearch query using cosineSimilarity (for now):
{
  "query": {
    "script_score": {
      "query": {
        "bool": {
          "filter": {
            "bool": {
              "should": [
                { "term": { "paperId": "{id1}" } },
                { "term": { "paperId": "{id2}" } }
              ]
            }
          }
        }
      },
      "script": {
        "source": "cosineSimilarity(params.query_vector, 'vector') + 1.0",
        "params": {
          "query_vector": {TEXT_VECTOR}
        }
      }
    }
  },
  "_source": { "excludes": ["vector"] }
}
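
As an illustration, the query body for this step could be assembled in Python roughly as below. This is a sketch, not the service's actual code: `build_query` is a hypothetical helper name, and falling back to `match_all` when no `paperId` list is supplied is an assumption based on that field being marked optional.

```python
import json
from typing import Any, Dict, List, Optional


def build_query(
    query_vector: List[float],
    paper_ids: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build a script_score query body, optionally restricted to some papers."""
    if paper_ids:
        # One term clause per paper; inside a filter, a bool query with only
        # `should` clauses requires at least one of them to match.
        inner: Dict[str, Any] = {
            "bool": {
                "filter": {
                    "bool": {
                        "should": [
                            {"term": {"paperId": pid}} for pid in paper_ids
                        ]
                    }
                }
            }
        }
    else:
        # Assumption: with no paperId filter, score every sentence.
        inner = {"match_all": {}}

    return {
        "query": {
            "script_score": {
                "query": inner,
                "script": {
                    # + 1.0 keeps the score non-negative.
                    "source": "cosineSimilarity(params.query_vector, 'vector') + 1.0",
                    "params": {"query_vector": query_vector},
                },
            }
        },
        # The stored vectors are large and not needed in the response.
        "_source": {"excludes": ["vector"]},
    }


if __name__ == "__main__":
    print(json.dumps(build_query([0.1, 0.2], ["id1", "id2"]), indent=2))
```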
- Collect the scored sentences returned for each paper.
- Make an API call to Nexus to retrieve the paper metadata (authors, title, ...).
- Alternatively, make a second Elasticsearch query to retrieve the matched papers (this can probably be optimized in some way).
- Compose and serve the API response:
{
  "total": {total},
  "results": [
    {
      "score": {score},
      "paperId": "{id}",
      "title": "{title}",
      "authors": [ "{name1}", "{nameN}" ],
      "text": [
        { "highlight": true, "value": "{text1}" },
        ...
        { "highlight": false, "value": "{textN}" }
      ]
    }
  ]
}
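
The last two steps (collecting sentences per paper and composing the response) might look roughly like the Python sketch below. Several points are assumptions rather than part of this note: a paper's score is taken as the best score among its matched sentences, `total` counts matched papers, and `papers` is a hypothetical lookup (filled from Nexus or from the papers index) that maps each paperId to its title, authors and ordered sentences.

```python
from typing import Any, Dict, List, Set


def compose_response(
    hits: List[Dict[str, Any]],
    papers: Dict[str, Dict[str, Any]],
) -> Dict[str, Any]:
    """Group sentence hits by paper and build the /v1/papers response body."""
    matched: Dict[str, Set[str]] = {}
    best: Dict[str, float] = {}

    for hit in hits:
        source = hit["_source"]
        pid = source["paperId"]
        matched.setdefault(pid, set()).add(source["text"])
        # Assumption: a paper is ranked by its highest-scoring sentence.
        best[pid] = max(best.get(pid, float("-inf")), hit["_score"])

    results = []
    for pid in sorted(best, key=lambda p: best[p], reverse=True):
        meta = papers[pid]
        results.append(
            {
                "score": best[pid],
                "paperId": pid,
                "title": meta["title"],
                "authors": meta["authors"],
                # Every sentence of the paper, flagged if the query matched it.
                "text": [
                    {"highlight": s in matched[pid], "value": s}
                    for s in meta["sentences"]
                ],
            }
        )

    return {"total": len(results), "results": results}
```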