In general, this pipeline is a set of immutable append-only logs, processors, and stores. Like any level 1 system context diagram, it is technology agnostic and could be implemented with a number of technologies, though I admit to having had Apache Kafka and Apache Spark Streaming in mind as I designed it. To understand the value of this approach, I recommend Martin Kleppmann's "Turning the database inside-out". I strongly urge you to read that article before evaluating the diagram or continuing with these comments.
The purpose of the pipeline is to end up with always-up-to-date stores of data that can be queried performantly at scale. Source files stream through the pipeline and drive streaming updates to what can be thought of as "materialized views," whose implementation and technology can be chosen to match the query characteristics. For example, an ela
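The core idea above can be sketched in a few lines. This is a minimal illustration, not the pipeline itself: the log is a plain Python list standing in for a Kafka topic, and the "materialized view" is an in-memory dict standing in for a query-optimized store; the event shape (`user`/`amount`) is an invented example.

```python
from collections import defaultdict

def apply_event(view, event):
    """Processor step: fold one log record into the materialized view."""
    view[event["user"]] += event["amount"]
    return view

def materialize(log):
    """Replay the whole append-only log to (re)build the view from scratch."""
    view = defaultdict(int)
    for event in log:
        apply_event(view, event)
    return dict(view)

# The immutable log: records are only ever appended, never mutated.
log = [
    {"user": "alice", "amount": 5},
    {"user": "bob", "amount": 3},
    {"user": "alice", "amount": 2},
]

totals = materialize(log)  # {'alice': 7, 'bob': 3}

# New source data appends to the log and streams an incremental
# update into the view, rather than forcing a full recomputation.
log.append({"user": "bob", "amount": 4})
apply_event(totals, log[-1])  # totals is now {'alice': 7, 'bob': 7}
```

Because the view is derived purely by folding over the log, it can be rebuilt at any time, and different stores can be derived from the same log to suit different query patterns.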