@dkubb
Created October 24, 2011 02:32
Veritas Explanation

https://github.com/dkubb/veritas

High level design

  • In Veritas a relation is an Enumerable object that will yield a set of tuples.
  • Internally a relation is represented as a tree.
  • The leaf nodes are base relations, which are the data sources.
  • There are 3 types of inner nodes:
    1. A node may be a relational algebra operation like join, rename, project or others. When iterated, it will evaluate its children, then perform the operation in memory and yield each tuple from the result.
    2. A node may be materialized, which means it has been loaded into memory. When iterated it will yield each tuple immediately.
    3. A node may be a gateway, which is a decorator that wraps another node. When iterated, it passes the child node to an adapter, which performs an equivalent query and yields each tuple.
  • An adapter may do whatever it likes as long as it returns results equivalent to the operations the node represents. An RDBMS adapter, for example, will generate SQL from the node, execute it, and then yield each tuple to the gateway.
  • Data flows through a pipeline from the leaf nodes to the root where each node retrieves, transforms or filters the data.
  • The interface between nodes is uniform, and gateways may use different adapters. This allows data to be combined seamlessly between multiple shards, databases or datastores.
  • In the future it should be possible to evaluate the children of binary operations in parallel.
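
The design above can be sketched in a few lines of plain Ruby. This is only an illustration, not the Veritas API: the class names (BaseRelation, Join, Materialized, Gateway, InMemoryAdapter) and the adapter's read method are hypothetical, tuples are modelled as Hashes, and the stand-in adapter simply iterates the node where a real one would run an equivalent query.

```ruby
require 'set'

# Hypothetical toy classes for illustration only; not the Veritas API.

# Leaf node: a base relation over a data source
# (here an Array of Hashes, assumed to be duplicate-free).
class BaseRelation
  include Enumerable

  def initialize(tuples)
    @tuples = tuples
  end

  def each(&block)
    return to_enum unless block
    @tuples.each(&block)
    self
  end
end

# Inner node, type 1: an operation evaluated in memory (a natural join).
class Join
  include Enumerable

  def initialize(left, right)
    @left  = left
    @right = right
  end

  def each
    return to_enum unless block_given?
    seen = Set.new
    @left.each do |l|
      @right.each do |r|
        common = l.keys & r.keys
        next unless common.all? { |key| l[key] == r[key] }
        tuple = l.merge(r)
        yield tuple if seen.add?(tuple) # a relation is a set: skip duplicates
      end
    end
    self
  end
end

# Inner node, type 2: loads its child into memory once, then yields from memory.
class Materialized
  include Enumerable

  def initialize(operand)
    @tuples = operand.to_a.freeze
  end

  def each(&block)
    return to_enum unless block
    @tuples.each(&block)
    self
  end
end

# Inner node, type 3: a decorator that hands its child to an adapter.
class Gateway
  include Enumerable

  def initialize(adapter, operand)
    @adapter = adapter
    @operand = operand
  end

  def each(&block)
    return to_enum unless block
    @adapter.read(@operand, &block)
    self
  end
end

# Stand-in adapter: a real one would translate the node into a query
# (e.g. SQL) instead of iterating it directly.
class InMemoryAdapter
  attr_reader :reads

  def initialize
    @reads = 0
  end

  def read(node, &block)
    @reads += 1
    node.each(&block)
  end
end

people = BaseRelation.new([{ id: 1, name: 'Ann' }, { id: 2, name: 'Bob' }])
orders = BaseRelation.new([{ id: 1, item: 'book' }, { id: 1, item: 'pen' }])

adapter = InMemoryAdapter.new
joined  = Join.new(Gateway.new(adapter, people), Materialized.new(orders))
result  = joined.to_a
```

Because every node only exposes each, the Join neither knows nor cares that one child is a gateway and the other is materialized, which is the uniform interface the list describes.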

dkubb commented Oct 24, 2011

I'm working on making this clearer so I can include it in the Veritas README. Questions and comments are very welcome.
