@abarnash
Last active February 14, 2019 19:18
Elastic Data Modelling

With this data model, you can scroll through each interaction, source, or channel demographic index directly for any demographic bucket, or combination of buckets, you want, without the overhead of traversing a large graph on every run.

INDEX /consumer-demo

Create an index of all consumers with their relevant demographic attributes. Whatever process(es) determine these values should update this index.

{
  "modelId": "icx-id",
  "consumerId": "consumer-1",
  "dimensionA": "value-a-1",
  "dimensionB": "value-b-1",
  "dimensionC": "value-c-1"
}
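
If you want the demographic fields to filter as exact values, one option is to map them as keyword fields when the index is created. The following is a minimal sketch of such an index body, assuming Elasticsearch 7 or later (typeless mappings). The helper name consumer_demo_mapping is hypothetical and not part of any client library.

```python
def consumer_demo_mapping(dimensions):
    """Build an index body that maps every field as an exact-match keyword."""
    fields = ["modelId", "consumerId", *dimensions]
    return {
        "mappings": {
            "properties": {name: {"type": "keyword"} for name in fields}
        }
    }


# Assumed dimension names, taken from the example document above.
mapping = consumer_demo_mapping(["dimensionA", "dimensionB", "dimensionC"])
```

The resulting dict is what you would send as the request body when creating /consumer-demo.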

INDEX /interaction-demo

You can subscribe to streaming interaction documents. As each new interaction comes in, check whether its consumer (1:1) has an entry in the consumer-demo index (alternatively, you can set up a percolator query to do this). Then post the interaction plus its demographic data to its own index.

{
  "modelId": "icx-id",
  "interactionId": "int-1",
  "consumerId": "consumer-1",
  "sourceId": "source-1",
  "content": "...",
  "type": "comment",
  "dimensionA": "value-a-1",
  "dimensionB": "value-b-1",
  "dimensionC": "value-c-1"
}
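
The enrichment step described above can be sketched in Python as follows. This is an illustration only: the function enrich_interaction is hypothetical, and the in-memory dict consumer_demo stands in for a lookup against the consumer-demo index.

```python
DIMENSIONS = ("dimensionA", "dimensionB", "dimensionC")


def enrich_interaction(interaction, consumer_demo):
    """Copy the consumer's demographic fields onto the interaction.

    Returns None when the consumer has no consumer-demo entry yet.
    """
    consumer = consumer_demo.get(interaction["consumerId"])
    if consumer is None:
        return None
    enriched = dict(interaction)  # leave the incoming document untouched
    for dim in DIMENSIONS:
        if dim in consumer:
            enriched[dim] = consumer[dim]
    return enriched


# Hypothetical stand-in for the consumer-demo index, keyed by consumerId.
consumer_demo = {
    "consumer-1": {
        "modelId": "icx-id",
        "consumerId": "consumer-1",
        "dimensionA": "value-a-1",
        "dimensionB": "value-b-1",
        "dimensionC": "value-c-1",
    }
}

interaction = {
    "modelId": "icx-id",
    "interactionId": "int-1",
    "consumerId": "consumer-1",
    "sourceId": "source-1",
    "content": "...",
    "type": "comment",
}

enriched = enrich_interaction(interaction, consumer_demo)
```

Interactions whose consumer is not yet known come back as None, so the caller can decide whether to skip them or retry later.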

INDEX /metric-demo

Each interaction also refers to exactly one source, so as you stream interactions, you can use that source ID to create or update an index that tracks which dimensions and dimension buckets have interacted with that source. (Depending on the use case, you can also include the time period the values are valid for in the document.) You can store each dimension value as its own field or, if you want to track each unique combination, concatenate the values:

{
  "modelId": "icx-id",
  "sourceId": "source-1",
  "period":"week",
  "start":"2018-01-30",
  "description": "...",
  "categories": ["cat-1", "cat-2"],
  "dimensionA": ["a-1","a-2"],
  "dimensionB": ["b-1","b-2"],
  "dimensionC": ["c-1"],
  "consumerDemographics": [
    "a-1-b-1-c-1",
    "a-2-b-1-c-1",
    "a-2-b-2-c-1"
  ]
}

The same document shape works for a channel, keyed by channelId instead of sourceId:

{
  "modelId": "icx-id",
  "channelId": "channel-1",
  "period":"week",
  "start":"2018-01-30",
  "description": "...",
  "categories": ["cat-1", "cat-2"],
  "dimensionA": ["a-1","a-2"],
  "dimensionB": ["b-1","b-2"],
  "dimensionC": ["c-1"],
  "consumerDemographics": [
    "a-1-b-1-c-1",
    "a-2-b-1-c-1",
    "a-2-b-2-c-1"
  ]
}
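
As a rough sketch of how such a metric document could be maintained and queried, the Python below folds enriched interactions into a metric document and builds a filter for one concatenated bucket. The names update_metric and bucket_query are hypothetical, the period fields are left out for brevity, and the query assumes consumerDemographics is mapped as a keyword field.

```python
DIMENSIONS = ("dimensionA", "dimensionB", "dimensionC")


def add_unique(values, value):
    """Append value to the list only if it is not already present."""
    if value not in values:
        values.append(value)


def update_metric(metric, interaction):
    """Record the interaction's dimension values and their combination."""
    for dim in DIMENSIONS:
        add_unique(metric.setdefault(dim, []), interaction[dim])
    # Fixed dimension order keeps the concatenated key stable.
    combination = "-".join(interaction[dim] for dim in DIMENSIONS)
    add_unique(metric.setdefault("consumerDemographics", []), combination)
    return metric


def bucket_query(model_id, combination):
    """Search body matching metric documents that contain one combination."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"modelId": model_id}},
                    {"term": {"consumerDemographics": combination}},
                ]
            }
        }
    }


metric = {"modelId": "icx-id", "sourceId": "source-1"}
for a, b in [("a-1", "b-1"), ("a-2", "b-1"), ("a-2", "b-2"), ("a-2", "b-1")]:
    update_metric(metric, {"dimensionA": a, "dimensionB": b, "dimensionC": "c-1"})

query = bucket_query("icx-id", "a-2-b-1-c-1")
```

Because the dimension values themselves contain hyphens, the concatenated key is only unambiguous if the dimension order is fixed; a separator that never appears in the values would be a safer choice.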

Additionally, if you want other statistics about the demographic buckets (e.g. percentage), you can use nested maps:

{
  "modelId": "icx-id",
  "sourceId": "source-1",
  "channelId": "channel-1",
  "period":"week",
  "start":"2018-01-30",
  "description": "...",
  "categories": ["cat-1", "cat-2"],
  "dimensionA": ["a-1","a-2"],
  "dimensionB": ["b-1","b-2"],
  "dimensionC": ["c-1"],
  "consumerDemographics": [{
      "dimensionA": "value-a-1",
      "dimensionB": "value-b-1",
      "dimensionC": "value-c-1",
      "percentage": 0.25
    },
    {
      "dimensionA": "value-a-2",
      "dimensionB": "value-b-1",
      "dimensionC": "value-c-1",
      "percentage": 0.5
    },
    {
      "dimensionA": "value-a-2",
      "dimensionB": "value-b-2",
      "dimensionC": "value-c-1",
      "percentage": 0.25
    }
  ]
}
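
One way the per-bucket percentages could be derived is by counting enriched interactions per dimension combination. The sketch below assumes "percentage" means each bucket's share of interactions, stored as a fraction between 0 and 1. The function demographic_breakdown is hypothetical.

```python
from collections import Counter

DIMENSIONS = ("dimensionA", "dimensionB", "dimensionC")


def demographic_breakdown(interactions):
    """Return one entry per dimension combination with its share of the total."""
    counts = Counter(tuple(i[dim] for dim in DIMENSIONS) for i in interactions)
    total = sum(counts.values())
    return [
        {**dict(zip(DIMENSIONS, combination)), "percentage": count / total}
        for combination, count in counts.items()
    ]


interactions = [
    {"dimensionA": "value-a-1", "dimensionB": "value-b-1", "dimensionC": "value-c-1"},
    {"dimensionA": "value-a-2", "dimensionB": "value-b-1", "dimensionC": "value-c-1"},
    {"dimensionA": "value-a-2", "dimensionB": "value-b-1", "dimensionC": "value-c-1"},
    {"dimensionA": "value-a-2", "dimensionB": "value-b-2", "dimensionC": "value-c-1"},
]

breakdown = demographic_breakdown(interactions)
```

If you need to query on several fields of the same consumerDemographics entry at once, note that Elasticsearch only keeps the fields of each array element together when the field is mapped with the nested type.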