The update process in hive that joins the product catalog to f_product_performance has been causing out-of-memory failures, along with a variety of other errors that are difficult to diagnose.
The first proposal is to push aggregated performance data to a key/value store that hydra consumes during its selection process.
- Create a pyspark aggregation script at the end of the stats pipeline covering the last 60 days of product performance
- Send aggregated data to (in memory) key/value store for O(1) lookup in hydra's selection process
- Update hydra's filter to retrieve performance data from key/value store
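The aggregation step could be sketched as follows. This is a minimal plain-Python illustration of the windowing logic the pyspark job would implement; the field names (product_id, day, impressions, clicks) are assumptions, not the actual schema.

```python
from datetime import date, timedelta
from collections import defaultdict

# Hypothetical input: one row per (product_id, day) from the stats pipeline.
rows = [
    {"product_id": "p1", "day": date(2024, 1, 10), "impressions": 100, "clicks": 4},
    {"product_id": "p1", "day": date(2023, 1, 10), "impressions": 500, "clicks": 9},  # outside window
    {"product_id": "p2", "day": date(2024, 1, 11), "impressions": 50, "clicks": 1},
]

def aggregate_last_60_days(rows, today):
    """Sum per-product performance over the trailing 60-day window."""
    cutoff = today - timedelta(days=60)
    totals = defaultdict(lambda: {"impressions": 0, "clicks": 0})
    for row in rows:
        if row["day"] >= cutoff:
            t = totals[row["product_id"]]
            t["impressions"] += row["impressions"]
            t["clicks"] += row["clicks"]
    return dict(totals)

# payload maps product_id -> aggregate dict, ready to write to the key/value store
payload = aggregate_last_60_days(rows, today=date(2024, 1, 12))
```

In the real pipeline the same shape falls out of a pyspark groupBy/agg over the date-filtered DataFrame, with the result written to the store keyed by product_id.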
Pros: speeds up the hive update process and improves its reliability, keeps the design stateless, and preserves hydra's selection speed.
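The lookup side of hydra's filter could look like the sketch below. A plain dict stands in for the in-memory key/value store, and the field names and threshold are illustrative assumptions; the point is the O(1) read plus a neutral default for products missing from the store.

```python
# A dict stands in for the in-memory key/value store (e.g. Redis or similar);
# both the store choice and the field names here are assumptions.
store = {
    "p1": {"impressions": 100, "clicks": 4},
    "p2": {"impressions": 50, "clicks": 1},
}

DEFAULT_PERF = {"impressions": 0, "clicks": 0}

def passes_performance_filter(product_id, min_clicks=2):
    """O(1) lookup; products absent from the store fall back to neutral
    defaults instead of failing the selection pass."""
    perf = store.get(product_id, DEFAULT_PERF)
    return perf["clicks"] >= min_clicks

selected = [pid for pid in ["p1", "p2", "p3"] if passes_performance_filter(pid)]
```

Because the lookup is a single constant-time read, the filter adds negligible latency to the selection path, which is what keeps hydra's speed intact.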