Elastic Data Modelling

With this data model, you can scroll through the interaction, source, and channel demographic indices directly for any demographic bucket, or combination of buckets, without the overhead of traversing a large graph on every run.

INDEX `/consumer-demo`

Create an index for all consumers with all their relevant demographic attributes. Whichever process(es) determine these values should update this index.

```json
{
  "modelId": "icx-id",
  "consumerId": "consumer-1",
  "dimensionA": "value-a-1",
  "dimensionB": "value-b-1",
  "dimensionC": "value-c-1"
}
```
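Because the dimension values act as exact bucket labels, it may help to map them as `keyword` fields so they work with term filters and terms aggregations. The sketch below is a Python dict that could serve as the body of a create-index request (`PUT /consumer-demo`); the field types and the `consumer_doc_id` helper are assumptions for illustration, not part of the original model.

```python
# Sketch of a create-index body for /consumer-demo.
# Assumption: every field is an exact identifier or bucket label, so all are
# mapped as "keyword" rather than analyzed "text".
CONSUMER_DEMO_INDEX = {
    "mappings": {
        "properties": {
            "modelId": {"type": "keyword"},
            "consumerId": {"type": "keyword"},
            # Demographic dimensions: keyword fields support exact-match
            # term filters and terms aggregations.
            "dimensionA": {"type": "keyword"},
            "dimensionB": {"type": "keyword"},
            "dimensionC": {"type": "keyword"},
        }
    }
}


def consumer_doc_id(model_id: str, consumer_id: str) -> str:
    """Hypothetical helper: derive a deterministic document _id so that
    re-indexing a consumer overwrites its previous demographics instead of
    creating a duplicate document."""
    return f"{model_id}:{consumer_id}"
```

With a deterministic ID, `consumer_doc_id("icx-id", "consumer-1")` yields `"icx-id:consumer-1"`, and whichever process updates the demographics could write to `PUT /consumer-demo/_doc/<id>` repeatedly without accumulating stale copies.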

INDEX `/interaction-demo`

You can subscribe to the stream of interaction documents. As new interactions come in, check whether their consumer (each interaction has exactly one) has an entry in the consumer-demo index (alternatively, you can set up a percolator query to do this). Then post the interaction, together with the consumer's demographic data, to its own index.

```json
{
  "modelId": "icx-id",
  "interactionId": "int-1",
  "consumerId": "consumer-1",
  "sourceId": "source-1",
  "content": "...",
  "type": "comment",
  "dimensionA": "value-a-1",
  "dimensionB": "value-b-1",
  "dimensionC": "value-c-1"
}
```
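The enrichment step can be sketched in plain Python. Here a dict keyed by `consumerId` stands in for the lookup against `/consumer-demo`; the function name `enrich_interaction` and the `DIMENSIONS` tuple are hypothetical, and a real pipeline would replace the dict with an Elasticsearch get or search call.

```python
from typing import Dict, Optional

# Assumed demographic field names, taken from the example documents.
DIMENSIONS = ("dimensionA", "dimensionB", "dimensionC")


def enrich_interaction(interaction: dict,
                       consumers: Dict[str, dict]) -> Optional[dict]:
    """Copy a consumer's demographic fields onto an incoming interaction.

    `consumers` stands in for a lookup against /consumer-demo, keyed by
    consumerId. Returns None if the consumer has no demographic entry yet,
    so the caller can skip or retry the interaction.
    """
    consumer = consumers.get(interaction["consumerId"])
    if consumer is None:
        return None
    enriched = dict(interaction)  # leave the incoming stream record untouched
    for dim in DIMENSIONS:
        if dim in consumer:
            enriched[dim] = consumer[dim]
    return enriched
```

Returning `None` for unknown consumers keeps the decision (drop, queue, or retry later) with the caller, since the consumer's demographics may simply not have been computed yet.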

INDEX `/metric-demo`

Each interaction also refers to exactly one source, so as you stream interactions, you can use that `sourceId` to create or update an index that tracks which dimensions and dimension buckets have interacted with that source. (Depending on the use case, the document can also record the time period for which the values are valid.) You can store each dimension's values as their own field, or, if you want to track each unique combination, you can concatenate them:

```json
{
  "modelId": "icx-id",
  "sourceId": "source-1",
  "period": "week",
  "start": "2018-01-30",
  "description": "...",
  "categories": ["cat-1", "cat-2"],
  "dimensionA": ["a-1", "a-2"],
  "dimensionB": ["b-1", "b-2"],
  "dimensionC": ["c-1"],
  "consumerDemographics": [
    "a-1-b-1-c-1",
    "a-2-b-1-c-1",
    "a-2-b-2-c-1"
  ]
}
```
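A minimal sketch of how a source's metric document could be built from enriched interactions: collect the distinct values per dimension and one concatenated string per unique combination, as in the `consumerDemographics` field above. The function name is hypothetical, and `description`/`categories` are omitted because the original does not say where they come from.

```python
from typing import Iterable

DIMENSIONS = ("dimensionA", "dimensionB", "dimensionC")


def build_source_metric(model_id: str, source_id: str, period: str,
                        start: str, interactions: Iterable[dict]) -> dict:
    """Summarize enriched interactions into one /metric-demo document.

    Assumes each interaction already carries every field in DIMENSIONS.
    Interactions for other sources are ignored.
    """
    values = {dim: set() for dim in DIMENSIONS}
    combos = set()
    for it in interactions:
        if it.get("sourceId") != source_id:
            continue
        for dim in DIMENSIONS:
            values[dim].add(it[dim])
        # One string per unique combination, e.g. "a-1-b-1-c-1".
        combos.add("-".join(it[dim] for dim in DIMENSIONS))
    doc = {"modelId": model_id, "sourceId": source_id,
           "period": period, "start": start}
    for dim in DIMENSIONS:
        doc[dim] = sorted(values[dim])  # distinct values per dimension
    doc["consumerDemographics"] = sorted(combos)
    return doc
```

One design caveat: because the bucket values themselves contain `-`, two different combinations could in principle concatenate to the same string (for example `("a", "1-b", ...)` and `("a-1", "b", ...)`). A separator that never appears in dimension values would avoid that.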

The same document shape can track channels, keyed by `channelId` instead of `sourceId`:

```json
{
  "modelId": "icx-id",
  "channelId": "channel-1",
  "period": "week",
  "start": "2018-01-30",
  "description": "...",
  "categories": ["cat-1", "cat-2"],
  "dimensionA": ["a-1", "a-2"],
  "dimensionB": ["b-1", "b-2"],
  "dimensionC": ["c-1"],
  "consumerDemographics": [
    "a-1-b-1-c-1",
    "a-2-b-1-c-1",
    "a-2-b-2-c-1"
  ]
}
```
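Querying these metric documents for a bucket combination is then a flat filter rather than a graph traversal: a `term` query on an array field matches when any element equals the value. The sketch below builds a search body (as a Python dict) that could be sent to `GET /metric-demo/_search`, and paged with the scroll API or `search_after` for large result sets. It assumes the fields are mapped as `keyword`; the function name is hypothetical.

```python
from typing import Dict, List, Optional


def demographic_query(model_id: str,
                      combination: Optional[str] = None,
                      dimensions: Optional[Dict[str, List[str]]] = None) -> dict:
    """Build a search body matching metric documents touched by a
    demographic bucket or combination of buckets.

    Filters go in filter context because relevance scoring is not
    needed for this kind of lookup.
    """
    filters: List[dict] = [{"term": {"modelId": model_id}}]
    if combination is not None:
        # Matches if any element of the consumerDemographics array equals it.
        filters.append({"term": {"consumerDemographics": combination}})
    for field, values in (dimensions or {}).items():
        # "terms" matches documents containing any of the listed values.
        filters.append({"terms": {field: values}})
    return {"query": {"bool": {"filter": filters}}}
```

For example, `demographic_query("icx-id", combination="a-2-b-1-c-1")` finds every source or channel document that this exact combination interacted with, while `dimensions={"dimensionA": ["a-1", "a-2"]}` matches on individual buckets.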

Additionally, if you want other statistics about the demographic buckets (e.g. percentages), you can use nested objects:

```json
{
  "modelId": "icx-id",
  "sourceId": "source-1",
  "channelId": "channel-1",
  "period": "week",
  "start": "2018-01-30",
  "description": "...",
  "categories": ["cat-1", "cat-2"],
  "dimensionA": ["a-1", "a-2"],
  "dimensionB": ["b-1", "b-2"],
  "dimensionC": ["c-1"],
  "consumerDemographics": [
    {
      "dimensionA": "value-a-1",
      "dimensionB": "value-b-1",
      "dimensionC": "value-c-1",
      "percentage": 0.25
    },
    {
      "dimensionA": "value-a-2",
      "dimensionB": "value-b-1",
      "dimensionC": "value-c-1",
      "percentage": 0.5
    },
    {
      "dimensionA": "value-a-2",
      "dimensionB": "value-b-2",
      "dimensionC": "value-c-1",
      "percentage": 0.25
    }
  ]
}
```
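The percentages in the example can be derived by counting interactions per unique combination. This sketch assumes "percentage" means the fraction (0.0 to 1.0) of a source's interactions that fall in each combination; it could instead mean the fraction of distinct consumers, which the original does not specify. The function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, List

DIMENSIONS = ("dimensionA", "dimensionB", "dimensionC")


def demographic_shares(interactions: Iterable[dict]) -> List[dict]:
    """Return one entry per unique dimension combination with its share
    of the interactions, in the shape of the consumerDemographics objects.

    An empty input yields an empty list (no division by zero occurs).
    """
    counts = Counter(tuple(it[dim] for dim in DIMENSIONS)
                     for it in interactions)
    total = sum(counts.values())
    shares = []
    for combo, n in sorted(counts.items()):
        entry = dict(zip(DIMENSIONS, combo))
        entry["percentage"] = n / total
        shares.append(entry)
    return shares
```

Four interactions split 1/2/1 across the three combinations reproduce the 0.25 / 0.5 / 0.25 values shown above. If you need to filter on several fields of the same entry (for example `dimensionA` and `dimensionB` together), map `consumerDemographics` with the `nested` type and query it with a `nested` query; otherwise Elasticsearch flattens arrays of objects and can match values from different entries.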

abarnash/es-data-model.md
