- It's frequently useful to graph a metric over a cluster of hosts, e.g. "show me the number of requests/s being handled by all of my load balancers".
- Doing this in vanilla Graphite is easy - it honors both glob expressions (`lb*`) and brace expressions (`{a,b,c,d}`).
- But how do we generate these for clusters whose hostnames don't glob well, and/or whose members change over time?
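To make the two expression styles concrete, here is a minimal Python sketch of how a single path node could be matched against a glob or brace pattern. It only mimics the behaviour described above and is not Graphite's actual matcher; `expand_braces`, `node_matches` and the host list are hypothetical names chosen for illustration.

```python
import re
from fnmatch import fnmatchcase


def expand_braces(pattern):
    """Expand {a,b,c} groups into plain patterns (no nested braces)."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded = []
    for option in match.group(1).split(","):
        # Recurse so that several brace groups in one pattern all expand.
        expanded.extend(expand_braces(head + option + tail))
    return expanded


def node_matches(pattern, name):
    """True if one path node (e.g. a hostname) matches a glob/brace pattern."""
    return any(fnmatchcase(name, p) for p in expand_braces(pattern))


# Hypothetical host list: a glob only helps when hostnames share a prefix.
hosts = ["lb1", "lb2", "app04", "s0233"]
glob_hits = [h for h in hosts if node_matches("lb*", h)]
brace_hits = [h for h in hosts if node_matches("{lb1,app04}", h)]
```

The glob picks up only the `lb`-prefixed hosts, while the brace expression can name any mix of hosts, at the cost of listing each one explicitly.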
- Generally flippin' awesome.
- Has a database of Metrics, Graphs composing 1+ Metrics, and Dashboards composing 1+ Graphs.
- Metrics are statically defined Graphite target strings plus other metadata.
- For example, `servers.s0200.jvm.memory_used`, `servers.{lb1,lb2,lb3}.varnish.requests` or `servers.lb*.varnish.requests`.
- Metric creation is straightforward: via Web or API, you create new Metric objects with new (static) target string values.
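The Metric / Graph / Dashboard relationship above can be sketched as plain data classes. This is an illustrative model only, not Descartes' real schema; the field names (`name`, `target`, `title`) are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Metric:
    name: str
    target: str  # a static Graphite target string


@dataclass
class Graph:
    title: str
    metrics: List[Metric] = field(default_factory=list)  # 1+ Metrics


@dataclass
class Dashboard:
    title: str
    graphs: List[Graph] = field(default_factory=list)  # 1+ Graphs


lb_requests = Metric("LB requests", "servers.lb*.varnish.requests")
jvm_memory = Metric("JVM memory", "servers.s0200.jvm.memory_used")
dash = Dashboard("Edge", [Graph("Load balancers", [lb_requests]),
                          Graph("App servers", [jvm_memory])])

# Every target on the dashboard is a fixed string decided at creation time.
targets = [m.target for g in dash.graphs for m in g.metrics]
```

The point to notice is that `target` is a plain string: nothing in this model knows which hosts belong to which cluster.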
- Think Chef Server, Hiera, Clusto, ZooKeeper or any other dynamic database concerned with collections of servers.
- Responsible for organizing servers into logical groups like "the production load balancers" or "HBase cluster #5".
- Frequently used to drive configuration management (e.g. generation of monitoring config files) and orchestration tools (e.g. "go Chef the Web app worker cluster").
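As a rough picture of what a truth database answers, the sketch below uses an in-memory dictionary as a stand-in. The host names, attributes and the `hosts_with` helper are all hypothetical; a real deployment would query Chef Server, Clusto or similar instead.

```python
# Hypothetical truth data: the hostname alone does not reveal every role.
TRUTH_DB = {
    "app04": {"roles": {"app", "varnish"}, "env": "production"},
    "lb1": {"roles": {"varnish"}, "env": "production"},
    "lb2": {"roles": {"varnish"}, "env": "staging"},
    "s0233": {"roles": {"varnish", "hbase"}, "env": "production"},
}


def hosts_with(role, env="production"):
    """Return the sorted hostnames that carry `role` in environment `env`."""
    return sorted(
        host
        for host, attrs in TRUTH_DB.items()
        if role in attrs["roles"] and attrs["env"] == env
    )


varnish_pool = hosts_with("varnish")
```

Here "the production Varnish pool" is a query result rather than something that can be read off the hostnames.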
- Tools like Descartes work best with "vanilla" Graphite targets as mentioned above - leveraging globbing or brace expressions to arrive at a static target string.
- If your servers' hostnames line up this way, you're good - think of a cluster-oriented metric, jot down the metric + glob-expr you need, and you're done.
- When your environment requires a truth database, hand-entered metric paths no longer work. Typical reasons: you have lots of multi-tenant servers whose hostnames (which are what the metric paths capture) can't reflect everything running on them, or you have legacy naming concerns.
- E.g. if your Varnish pool is no longer expressible as `lb*` but needs to be `{lb1,lb2,app04,app15,s0233}`, and then multiply that by all your clusters, that's a huge amount of extra work when manually generating Graphite targets.
- Cluster memberships may also change frequently - having to redo all graphs related to your LB cluster when you drop or add a node is simply untenable.
- Use a statically configured tool (e.g. GDash), leveraging your truth DB + config management to construct the right target paths.
- E.g. ERb templates that look like `servers.{<%= lbs.join(',') %>}.varnish.requests`, where `lbs` comes from a truth DB query.
- This is sub-optimal: you're not using Descartes' existing/planned features :(
- Engineers must roundtrip all dashboarding through config management, etc.
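For readers who don't use ERb, the same templating step can be sketched in Python, with `string.Template` standing in for ERb. The `render_target` helper and the host list are hypothetical; in practice `lbs` would be filled in by config management from a truth DB query.

```python
from string import Template

# Literal braces surround the placeholder; $lbs is the only substituted part.
TARGET_TEMPLATE = Template("servers.{$lbs}.varnish.requests")


def render_target(lbs):
    """Render a static Graphite target from the current cluster members."""
    return TARGET_TEMPLATE.substitute(lbs=",".join(lbs))


target = render_target(["lb1", "lb2", "app04"])
```

The rendered string is still static: it is only as fresh as the last config management run that regenerated it.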
- Run a sync script periodically which polls your truth DB, then creates or edits Descartes Metric, Graph and Dash objects via Descartes' API.
- Within the realm of possibility, but sync is generally difficult, messy and error-prone.
- Users may try editing objects in Descartes between syncs, only to see their changes overwritten at sync time.
- Background sync jobs add operational complexity, can fail without causing obvious errors (silent failure - bad!) etc.
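The overwrite problem is easy to reproduce in a toy model. In the sketch below a plain dictionary stands in for Descartes' Metric store (no real Descartes API is used), and `sync` is a hypothetical one-way sync pass.

```python
def sync(store, truth):
    """Overwrite each cluster's Metric target in `store` from `truth`.

    Returns the names of the Metrics that were created or changed.
    """
    changed = []
    for cluster, hosts in truth.items():
        name = f"{cluster}.varnish.requests"
        target = "servers.{%s}.varnish.requests" % ",".join(sorted(hosts))
        if store.get(name) != target:
            store[name] = target
            changed.append(name)
    return changed


store = {}
truth = {"lbs": ["lb1", "lb2"]}
sync(store, truth)  # the first pass creates the Metric

# A user hand-edits the Metric between sync runs...
store["lbs.varnish.requests"] = "servers.{lb1,lb2,lb9}.varnish.requests"

# ...and the next pass silently replaces the edit.
overwritten = sync(store, truth)
```

A one-way sync cannot tell a deliberate user edit from stale data, which is why the edit disappears without any error.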
- Extend Descartes so it understands the idea of splicing external query results into target paths at runtime.
- E.g. configure a truth DB query endpoint, parameterizable by cluster name or other search token.
- Allow interpolation syntax within Metric target fields, e.g. `servers.<clustername>.varnish.requests`.
- Or `servers.*.varnish.requests` with an additional per-Metric config option saying to replace the Nth asterisk with a query for `clustername`. (The other method feels cleaner/more usable, but this is similar to Graphite's own `aliasByMetric()`.)
- Result is that, on request, display of that Metric hits up the truth DB and results in a final Graphite target of `servers.{a,b,c,d}.varnish.requests`.
- Could be cached, within reason.
- Could maybe even extend further to allow dynamically generated parts of the URI tree?
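A minimal sketch of the proposed runtime splicing, assuming the text between angle brackets is a cluster name and that the truth DB is reachable through a `query(cluster)` callable. `TargetResolver`, its methods and the TTL value are hypothetical; none of this exists in Descartes. It handles plain dotted paths only, not targets wrapped in Graphite functions.

```python
import re
import time

PLACEHOLDER = re.compile(r"<([A-Za-z0-9_-]+)>")


class TargetResolver:
    """Splice truth DB results into Graphite targets, with a small TTL cache."""

    def __init__(self, query, ttl=60.0, clock=time.monotonic):
        self._query = query    # callable: cluster name -> list of hostnames
        self._ttl = ttl        # seconds a cluster's membership stays cached
        self._clock = clock
        self._cache = {}       # cluster -> (expires_at, brace expression)

    def _brace_expr(self, cluster):
        now = self._clock()
        cached = self._cache.get(cluster)
        if cached and cached[0] > now:
            return cached[1]
        expr = "{" + ",".join(self._query(cluster)) + "}"
        self._cache[cluster] = (now + self._ttl, expr)
        return expr

    def resolve(self, target):
        """Replace every <cluster> placeholder with a brace expression."""
        return PLACEHOLDER.sub(lambda m: self._brace_expr(m.group(1)), target)

    def resolve_nth_asterisk(self, target, n, cluster):
        """Replace the Nth (1-based) '*' path node with the cluster's hosts."""
        nodes = target.split(".")
        stars = [i for i, node in enumerate(nodes) if node == "*"]
        nodes[stars[n - 1]] = self._brace_expr(cluster)
        return ".".join(nodes)


calls = []


def fake_query(cluster):
    """Stand-in for a truth DB endpoint; records each lookup."""
    calls.append(cluster)
    return {"lbs": ["a", "b", "c", "d"]}[cluster]


resolver = TargetResolver(fake_query)
first = resolver.resolve("servers.<lbs>.varnish.requests")
second = resolver.resolve_nth_asterisk("servers.*.varnish.requests", 1, "lbs")
```

Both syntaxes produce the same final target, and the second call is served from the cache, so the truth DB is queried once per cluster per TTL window rather than on every graph render.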
FWIW Descartes handles "hand-entered metric paths" just fine as long as they're imported via Graphite URL. It accepts any valid Graphite URL.