@umbreak
Last active May 7, 2020 09:58
Blue Brain Graph POC

A service which will do the following:

Ingestion

  1. Listen to the Nexus SSE (server-sent events) stream for resources of type {Paper} in project {literature}.
  2. Split each paper into sentences.
  3. Make an API call to Blue Brain Search to get the dense vector for each sentence.
  4. For each sentence, insert a document in ES index (paper_sentences):
{
	"paperId": "{id}",
	"text": "{original text}",
	"vector": [...]
}
  5. For each paper, insert the raw paper document into the ES index (papers).
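
The ingestion steps 2 to 4 could be sketched roughly as follows. This is only an illustration: the regex-based splitter is a naive stand-in for whatever sentence segmentation the service ends up using, and `embed` is a hypothetical callable that wraps the Blue Brain Search API call (its real interface is not specified here).

```python
import re
from typing import Callable, Dict, Iterator, List

# Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Split a paper's text into non-empty, trimmed sentences."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def sentence_docs(
    paper_id: str,
    text: str,
    embed: Callable[[str], List[float]],  # hypothetical wrapper around Blue Brain Search
) -> Iterator[Dict[str, object]]:
    """Yield one paper_sentences document per sentence."""
    for sentence in split_sentences(text):
        yield {"paperId": paper_id, "text": sentence, "vector": embed(sentence)}
```

Each yielded dictionary has the shape of the paper_sentences document above and could be passed to any Elasticsearch client for indexing.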

Search

  1. Provide an API endpoint /v1/papers:
POST /v1/papers

{
	"paperId": ["{id1}", ..., "{idN}"], //optional
	"text": "{text}"
}
  2. Make an API call to Blue Brain Search to get the dense vector for the {text}.
  3. Run an Elasticsearch script_score query using cosineSimilarity (for now):
{
  "query": {
    "script_score": {
      "query": {
        "bool": {
          "filter": {
            "bool": {
              "should": [
                { "term": { "paperId": "{id1}" } },
                { "term": { "paperId": "{id2}" } }
              ]
            }
          }
        }
      },
      "script": {
        "source": "cosineSimilarity(params.query_vector, 'vector') + 1.0",
        "params": {
          "query_vector": {TEXT_VECTOR}
        }
      }
    }
  },
  "_source": {"excludes": ["vector"]}
}
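
As a rough sketch, the query body could be assembled in Python like this. The function names `build_query` and `shifted_cosine` are made up for illustration, and falling back to `match_all` when no paperId is given is an assumption (the request only marks paperId as optional). `shifted_cosine` just mirrors the script's arithmetic locally to show why `+ 1.0` is added: it shifts the cosine range from [-1, 1] to [0, 2], since Elasticsearch does not accept negative scores.

```python
import math
from typing import Any, Dict, Optional, Sequence


def build_query(
    query_vector: Sequence[float],
    paper_ids: Optional[Sequence[str]] = None,
) -> Dict[str, Any]:
    """Build the script_score request body for the paper_sentences index."""
    if paper_ids:
        # Restrict scoring to sentences of the requested papers.
        should = [{"term": {"paperId": pid}} for pid in paper_ids]
        inner: Dict[str, Any] = {"bool": {"filter": {"bool": {"should": should}}}}
    else:
        # Assumption: without paperId, score every sentence.
        inner = {"match_all": {}}
    return {
        "query": {
            "script_score": {
                "query": inner,
                "script": {
                    "source": "cosineSimilarity(params.query_vector, 'vector') + 1.0",
                    "params": {"query_vector": list(query_vector)},
                },
            }
        },
        "_source": {"excludes": ["vector"]},
    }


def shifted_cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Local illustration of the script's score: cosine similarity plus 1.0."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm + 1.0
```

The returned dictionary can be serialized with `json.dumps` and sent as the search request body.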
  4. Collect the scored sentences returned for each paper, then fetch the paper details:

    1. Make an API call to Nexus to retrieve the paper metadata (authors, title, ...).
    2. Alternatively, make a second Elasticsearch query to retrieve the matched papers (this can probably be optimized in some way).
  5. Compose and serve the API response:

{
  "total": {total},
  "results": [
    {
      "score": {score},
      "paperId": "{id}",
      "title": "{title}",
      "authors": [ "{name1}", "{nameN}" ],
      "text": [
         { "highlight": true, "value": "{text1}" },
         ...
         { "highlight": false, "value": "{textN}" }
      ]
    }
  ]
}
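
One possible way to compose this response from the Elasticsearch hits is sketched below. Several details are assumptions not fixed by the description above: a paper's score is taken as the maximum of its matched sentence scores, total counts matched papers, and `papers` is a hypothetical lookup (filled from Nexus or from the papers index) that maps each paperId to its title, authors, and ordered sentences.

```python
from typing import Any, Dict, List


def compose_response(
    hits: List[Dict[str, Any]],
    papers: Dict[str, Dict[str, Any]],  # hypothetical: paperId -> title, authors, sentences
) -> Dict[str, Any]:
    """Group sentence hits by paper and mark the matched sentences."""
    matched: Dict[str, Dict[str, float]] = {}  # paperId -> {sentence text: score}
    for hit in hits:
        source = hit["_source"]
        matched.setdefault(source["paperId"], {})[source["text"]] = hit["_score"]

    results = []
    for paper_id, scores in matched.items():
        meta = papers[paper_id]
        results.append({
            "score": max(scores.values()),  # assumption: best sentence score wins
            "paperId": paper_id,
            "title": meta["title"],
            "authors": meta["authors"],
            "text": [
                {"highlight": sentence in scores, "value": sentence}
                for sentence in meta["sentences"]
            ],
        })

    # Highest-scoring papers first.
    results.sort(key=lambda r: r["score"], reverse=True)
    return {"total": len(results), "results": results}
```

Taking the maximum keeps a paper with one very relevant sentence ahead of a paper with many weak matches; an average or sum would rank differently, so this choice is worth revisiting.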