Skip to content

Instantly share code, notes, and snippets.

@warborn
Last active August 15, 2018 18:53
Show Gist options
  • Save warborn/5adc7c78cadc16b3b483d77deb89e3e8 to your computer and use it in GitHub Desktop.
Save warborn/5adc7c78cadc16b3b483d77deb89e3e8 to your computer and use it in GitHub Desktop.

Insights Builder

Insights Builder is a module that allows to query different source of data in order to extract information and transform it in a format which can be used to generate dynamic charts.

Summary

In order to extract the information, the following steps are followed by the module:

  1. Set configuration
  2. Build query
  3. Get data (execute query)
  4. Transform data

Setting the configuration

The following are the available configurations that can be set:

  • Source
  • X Axis
  • Inverval
  • X Segment
  • Y Axis
  • Y Operation
  • Filters
Configuration Name Required
Source true
X Axis true
X Segment false
Y Axis true
Y Operation true
Filters false

Building the query

The query is built depending on the configurations set. The following are the different ways in which a query is built:

X Axis + Y Axis + Y Operation
{
    aggs: {
        by_main_grouping: {
            terms: {
                field: value
            },
            aggs: {
                by_operation {
                    operation: {
                        field: value
                    }
                }
            }
        } 
    }
}
X Axis + X Segment + Y Axis + Y Operation + Filters
{
    query: {
        bool: {
            must: [
                {terms: {field: value}}
            ]
        }
    },
    aggs: {
        by_main_grouping: {
            terms: {
                field: value
            },
            aggs: {
                by_segment: {
                    terms: {
                        field: value
                    },
                    aggs: {
                        by_operation: {
                            operation: {
                                field: value
                            }
                        }
                    }
                }
            }
        } 
    }
}

Executing the query

A query built with the following configurations results in the following responses:

X Axis + Y Axis + Y Operation

Results in:

{
    by_main_grouping: [
        {
            by_operation: {
                value: value
            }
        }
    ]
}
X Axis + X Segment + Y Axis + Y Operation + Filters

Results in:

{
    by_main_grouping: [
        {
            by_segment: [
                {
                    by_operation: {
                        value: value
                    }
                }
            ]
        }
    ]
}

Transforming the data

After getting a result, a series of transformations can be executed, the following are the supported transformations:

  • Resolve location names (X Axis)
  • Convert result values to the total percentage (Y Axis)
  • Convert result values to percentage change (Y Axis)
  • Convert result values to the corresponding value by 100k inhabitants (Y Axis)

Rules & Restrictions

The module has a set of rules and restrictions that applied to each configuration depending on the value which are listed below:

Source

There are only two supported sources which are: internal and secretariat

X Axis

The X Axis can be any of the following depending on the source selected:

  • internal
    • crime_category: String
    • loss_amount: Number
    • location (municipality, state): String
    • source_name: String
    • reporter_name: String
    • incident_date: Date
  • secretariat
    • crime_category: String
    • crime_type: String
    • modality: String
    • location (municipality, state, country): String
    • date: Date
    • value: Number
    • value_100k: Number

When a field of type date is selected, an interval configuration can be set in order to group the resulting data by the interval specified, the following values are supported:

interval String
month
trimester
year

X Segment

The X Segment can be any of the following depending on the X Axis selected (with the restriction that the selected X Axis cannot be set as X Segment):

  • internal
    • crime_category: String
    • location (municipality, state): String
    • loss_amount: String
    • source_name: String
    • reporter_name: String
  • secretariat
    • crime_category: String
    • crime_type: String
    • modality: String
    • location (municipality, state, country): String

Y Axis

The Y Axis can be any of the following depending on the following:

  • internal
    • crime_category: String
    • loss_amount: Number
    • source_name?: String
    • reporter_name?: String
  • secretariat
    • crime_category: String
    • crime_type: String
    • modality: String
    • value: Number
    • value_100k: Number

Y Operation

The Y Operation can be any of the following depending on the selected source and data type of the Y Axis:

  • internal
    • string
      • count
    • number
      • sum
      • average
  • secretariat
    • string
      • count (resolves to the sum of the value field)
      • by 100k inhabitants (resolves to the sum of the value_100k field)
      • total percentage (resolves to the sum of the value field)
      • percentage change (resolves to the sum of the value field)

Filters

The filters are a way to get only the data you want, these filters depends on the data type of the selected field used to filter, the following are the available fields:

  • internal
    • crime_category: String
    • location (municipality, state): String
    • incident_date: Date
  • secretariat
    • crime_category: String
    • crime_type: String
    • modality: String
    • location (municipality, state, country): String
    • date: Date

For string fields a terms query is used in order to apply the filter

{ 
    terms: {
        field: [values]
    }
}

For date fields a range query is used in order to apply the filter

{
    range: {
        field: {
            gte: start_date,
            lte: end_date
        }
    } 
}

Discrete Locations

In order to be able to generate charts that contain results with more than one of the different types of locations (i.e. municipalities, states, countries) is necessary to specify this information.

Discrete locations are a way to get the data of different types of locations in the same chart and also filter the data by the locations selected.

For example, you want to generate a chart that contains two municipalities from Aguascalientes, the state of Coahuila along with the National rate of homicides. In order to get this information a list of location IDs must be sent.

locs: ['ags_municipality1_id', 'ags_municipality2_id', 'coahuila_state_id', 'mexico_country_id']

Behind the scene the module will convert the list of locations into a dictionary where the keys are the different types of locations and the values are the list of IDs of those types, like the following:

{
    'country_id': ['mexico_country_id'],
    'state_id': ['coahuila_state_id'],
    'municipality_id': ['ags_municipality1_id', 'ags_municipality2_id']
}

Discrete Crimes

In order to be able to generate charts that contain results with more than one of the different types of crime levels (i.e. crime_category, crime_type, modality) is necessary to specify this information.

Discrete crimes are a way to get the data of different levels of crimes in the same chart and also filter the data by the crimes selected.

For example, you want to generate a chart that contains the categories Homicidio y Robo, the types Homicidio Doloso along with the modality Con arma de fuego. In order to get this information a list of crimes must be sent.

For each crime level a query will be built, executed and the final results will be merged resulting in the desired dataset.

DSL

Complex Cases

# Compare the anual rate of "robo de coche de 4 ruedas sin violencia"
# of 5 municipalities from Aguascalientes with the rate of the 
# Aguascalientes state and the National one.

gb = GraphicBuilderService(owner, GraphicBuilderService.SECRETARIAT)
gb.set_x_axis('location')
gb.set_y_axis('modality', 'count')
gb.set_date_filter('date', '2017-01:01 00:00:00', '2017-12-31 00:00:00')
gb.set_filter('modality', ['robo de coche de 4 ruedas sin violencia'])
gb.set_discrete_field('location', ['0924850f-abe4-407c-80d8-175416b169bb', 'c9b46887-24f2-414a-8f39-07f273a1de2a', '4aa2137a-c19a-4db8-8f4d-1409c0625b70', 'c68c4d2a-b34e-4a22-a2da-d71bceaa1944'])
data = gb.execute()
# Compare the percentage change of the first trimester in 2018
# with the same trimester of the former year of 
# "robo comun con violencia a transeuntes"
# in five municipalities of Chihuahua with the state of Chihuahua,
# three municipalities of Nuevo León and the neighbor state of Nuevo León

gb = GraphicBuilderService(owner, GraphicBuilderService.SECRETARIAT)
gb.set_x_axis_date('date', 'trimester')
gb.set_x_segment('location')
gb.set_y_axis('modality', 'percentual change')
gb.set_date_filter('date', '2017', '2018', 'year')
gb.set_filter('modality', ['robo comun con violencia a transeuntes'])
gb.set_discrete_field('location', ['0924850f-abe4-407c-80d8-175416b169bb', 'c9b46887-24f2-414a-8f39-07f273a1de2a', '4aa2137a-c19a-4db8-8f4d-1409c0625b70', 'c68c4d2a-b34e-4a22-a2da-d71bceaa1944'])
data = gb.execute()
# Compare the 100k inhabitants rate from January of 2017 to February of 2017
# of "lesiones dolosas con arma de fuego" in one municipality of Aguascalientes,
# two of Zacatecas, one of Guanajuato, one of SLP and one of Tlaxcala

gb = GraphicBuilderService(owner, GraphicBuilderService.SECRETARIAT)
gb.set_x_axis_date('date', 'month')
gb.set_x_segment('location')
gb.set_y_axis('modality', '100k')
gb.set_date_filter('date', '2016-01:01 00:00:00', '2017-01-28 00:00:00')
gb.set_filter('modality', ['lesiones dolosas con arma de fuego'])
gb.set_discrete_field('location', ['0924850f-abe4-407c-80d8-175416b169bb', 'c9b46887-24f2-414a-8f39-07f273a1de2a', '4aa2137a-c19a-4db8-8f4d-1409c0625b70', 'c68c4d2a-b34e-4a22-a2da-d71bceaa1944'])
data = gb.execute()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment