https://github.com/dkubb/veritas
- In Veritas, a relation is an Enumerable object that yields a set of tuples.
- Internally a relation is represented as a tree.
- The leaf nodes are base relations, which are the data sources.
- There are three types of inner nodes:
  - An operation node represents a relational algebra operation such as join, rename, or project. When iterated, it evaluates its children, performs the operation in memory, and yields each tuple of the result.
  - A materialized node has already been loaded into memory. When iterated, it yields each tuple immediately.
  - A gateway node is a decorator that wraps another node. When iterated, it passes the wrapped node to an adapter, which performs an equivalent query and yields each tuple.
- An adapter may do whatever it likes, as long as it returns results equivalent to the operations the node represents. An RDBMS adapter, for example, generates SQL from the node, executes it, and then yields each tuple to the gateway.
- Data flows through a pipeline from the leaf nodes to the root, with each node retrieving, transforming, or filtering the data along the way.
- The interface between nodes is uniform, and gateways may use different adapters. This allows data from multiple shards, databases, or datastores to be combined seamlessly.
- In the future, it should be possible to evaluate the children of binary operations, such as join, in parallel.
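The tree of nodes can be sketched in plain Ruby. This is a minimal illustration under stated assumptions, not Veritas's actual classes: `BaseRelation`, `Restrict`, `Project` and `Materialized` are hypothetical names, tuples are modelled as Hashes, and every node includes `Enumerable` and defines `each`, so a parent only ever asks its child to iterate.

```ruby
# Hypothetical, simplified node classes for illustration; NOT the Veritas API.

# Leaf node: a base relation wrapping a data source (here, an Array of Hashes).
class BaseRelation
  include Enumerable

  attr_reader :name

  def initialize(name, tuples)
    @name   = name
    @tuples = tuples
  end

  def each(&block)
    return to_enum(:each) unless block
    @tuples.each(&block)
    self
  end
end

# Operation node: iterates its child and yields only tuples matching the predicate.
class Restrict
  include Enumerable

  def initialize(operand, &predicate)
    @operand   = operand
    @predicate = predicate
  end

  def each
    return to_enum(:each) unless block_given?
    @operand.each { |tuple| yield tuple if @predicate.call(tuple) }
    self
  end
end

# Operation node: keeps only the named attributes and drops duplicates,
# so the result is still a set of tuples.
class Project
  include Enumerable

  def initialize(operand, attributes)
    @operand    = operand
    @attributes = attributes
  end

  def each
    return to_enum(:each) unless block_given?
    seen = {}
    @operand.each do |tuple|
      projected = tuple.slice(*@attributes)
      next if seen.key?(projected)
      seen[projected] = true
      yield projected
    end
    self
  end
end

# Materialized node: tuples are already held in memory, so iterating
# yields them immediately without evaluating any children.
class Materialized
  include Enumerable

  def initialize(tuples)
    @tuples = tuples.to_a.freeze
  end

  def each(&block)
    return to_enum(:each) unless block
    @tuples.each(&block)
    self
  end
end

people = BaseRelation.new(:people, [
  { id: 1, name: 'Alice', age: 31 },
  { id: 2, name: 'Bob',   age: 17 },
  { id: 3, name: 'Carol', age: 45 }
])

# Root (Project) <- Restrict <- BaseRelation (leaf)
adults = Project.new(Restrict.new(people) { |t| t[:age] >= 18 }, [:name])

adults.to_a                    # => [{ name: "Alice" }, { name: "Carol" }]
Materialized.new(adults).to_a  # same tuples, now held in memory
```

Because each node only calls `each` on its child, tuples stream from the leaf toward the root one at a time; wrapping a subtree in `Materialized` is the point where that stream is captured in memory.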
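The gateway and adapter roles can be sketched in plain Ruby too. This is a hypothetical illustration rather than the Veritas API: `Gateway`, `Join`, `MemoryAdapter` and `ToySqlAdapter` are made-up names, and because the sketch has no database, `ToySqlAdapter` only records the SQL it would issue and then answers from the wrapped node in memory. What it shows is that a `Join` can combine two gateways backed by different adapters, since both expose the same `each` interface.

```ruby
# Hypothetical sketch; these classes are illustrative stand-ins, NOT the Veritas API.

# Leaf node: a named base relation over an Array of Hashes.
class BaseRelation
  include Enumerable

  attr_reader :name

  def initialize(name, tuples)
    @name   = name
    @tuples = tuples
  end

  def each(&block)
    return to_enum(:each) unless block
    @tuples.each(&block)
    self
  end
end

# Binary operation node: natural join on the attributes both sides share.
# Here the children are evaluated one after the other (right first, then left).
class Join
  include Enumerable

  def initialize(left, right)
    @left  = left
    @right = right
  end

  def each
    return to_enum(:each) unless block_given?
    right_tuples = @right.to_a
    @left.each do |l|
      right_tuples.each do |r|
        common = l.keys & r.keys
        yield l.merge(r) if common.all? { |k| l[k] == r[k] }
      end
    end
    self
  end
end

# Decorator around another node: iteration is delegated to an adapter.
class Gateway
  include Enumerable

  def initialize(adapter, node)
    @adapter = adapter
    @node    = node
  end

  def each(&block)
    return to_enum(:each) unless block
    @adapter.read(@node, &block)
    self
  end
end

# Adapter for data already held in the Ruby process.
class MemoryAdapter
  def read(node, &block)
    node.each(&block)
  end
end

# Toy stand-in for an RDBMS adapter: records the SQL it would send,
# but with no database attached it answers from the node itself.
class ToySqlAdapter
  attr_reader :log

  def initialize
    @log = []
  end

  def read(node, &block)
    @log << "SELECT * FROM #{node.name}"
    node.each(&block)
  end
end

sql    = ToySqlAdapter.new
people = Gateway.new(sql, BaseRelation.new(:people, [{ id: 1, name: 'Alice' }, { id: 2, name: 'Bob' }]))
emails = Gateway.new(MemoryAdapter.new, BaseRelation.new(:emails, [{ id: 1, email: 'alice@example.com' }]))

Join.new(people, emails).to_a  # => [{ id: 1, name: "Alice", email: "alice@example.com" }]
sql.log                        # => ["SELECT * FROM people"]
```

`Join` never needs to know which adapter sits behind each child; swapping `MemoryAdapter` for another adapter changes where the tuples come from, not how the join consumes them.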
I'm working on making this clearer so I can include it in the Veritas README. Questions and comments are very welcome.