The goal here was to solve our issues with Librato and Node.js on a distributed system on Heroku. Here's the list of our issues / questions:
- we loved how easy the log drain was to use
- but we sent too much data to use the log drain with Heroku, so some of it was skipped and the data was inconsistent
- we had multiple machines updating the same counters (e.g. # of concurrent jobs) and we wanted a rate (concurrent jobs / s across all machines): you can't do that using counters
- `librato-node` uses counters internally when you call `increment`
- using `librato-node` in a `dev` environment requires you to wrap it: making a request to the API every 15s would induce performance issues
- [Bonus] Configuring every single metric in the interface was painful
I've spent a good deal of time on live chat with Greg of Librato. He's been incredibly helpful in wrapping my head around how SSA works and how to make our use case work. :)
So I ended up with this small piece of code to try to fix all of those. It's obviously opinionated and shouldn't be seen as a lib, more as a reusable block that you should modify depending on your needs.
Every single gauge will have SSA (Service-Side Aggregation) enabled, allowing you to have multiple machines updating the same metric.
- When you want an average (timing, load of machines, ...), add the metric to your graph and select `average of averages`
- When you want a rate (counter / s), add the metric to your graph, select `average of sums` and add `x / p` as a formula to get the rate per second
- `.count(name, value[, unit])` which is a counter
- `.measure(name, value[, unit])` which is a normal gauge
- `period` can be smaller than the uploading period (by a ratio). For instance, we're using a period of 15s and send the data every 2 minutes.
- dev friendly: will simply log if the ENV variables aren't set
- Extra points: this format is also compatible with the Heroku log drain - if at any point you have an issue, just unset `LIBRATO_EMAIL` from your env and the logs should take over
- automatic configuration of your metrics every time the system is about to push a never-seen one (since the process start):
  - it will compute a pretty name from your metric name
  - will set units if you've used them on your calls to `.count` or `.measure`
  - will set the period of the metric to the period in the file: change it once, let it propagate everywhere
  - will always set `display_min` to 0, since it's rarely relevant to see only fluctuations
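To make the ideas above concrete, here is a rough sketch of what such a wrapper can look like. All the names (`record`, `buildGauges`, `flush`) and the exact intervals are illustrative, not the actual module:

```javascript
// Hedged sketch of the wrapper described above; every name is illustrative.
const PERIOD = 15;               // seconds: the metric period sent to Librato
const UPLOAD_EVERY = 2 * 60e3;   // ms: upload less often than the period

const buffer = {};               // name -> { sum, n, isCounter, unit }

function record(name, value, unit, isCounter) {
  const e = buffer[name] || (buffer[name] = { sum: 0, n: 0, isCounter, unit });
  e.sum += value;
  e.n += 1;
}

const count = (name, value = 1, unit) => record(name, value, unit, true);
const measure = (name, value, unit) => record(name, value, unit, false);

// Build the gauges that would be sent to Librato on the next upload.
function buildGauges() {
  return Object.keys(buffer).map((name) => {
    const { sum, n, isCounter } = buffer[name];
    delete buffer[name];
    // Counters send the local sum (so "average of sums" totals machines);
    // measures send the local average (for "average of averages").
    return { name, value: isCounter ? sum : sum / n, period: PERIOD };
  });
}

function flush() {
  const gauges = buildGauges();
  if (!process.env.LIBRATO_EMAIL) {
    // dev friendly: just log, in a format the Heroku log drain understands
    gauges.forEach((g) => console.log(`count#${g.name}=${g.value}`));
    return;
  }
  // ... POST { gauges } to Librato's measurements API here (omitted) ...
}

setInterval(flush, UPLOAD_EVERY).unref();
```

Buffering locally and flushing every two minutes is what keeps the API call volume low even with a 15s metric period.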
This works with the `sources` version of Librato, and is untested with the `tags` one. When on Heroku, you are necessarily using the `sources` version. If you're unsure, see here: https://www.librato.com/docs/kb/faq/account_questions/tags_or_sources/
The automatic setup of the metric will be launched by each process that pushes this metric for the first time. The convenience was too good for us to pass up, even though it means running N extra API calls every time the processes are restarted - that's something you should be aware of.
We're not handling `min` / `max`. This could easily be added. However, you shouldn't add `sum` and `count` for counters, or you will break the counting-across-all-machines functionality. Indeed, when you select `average of sums`, Librato's SSA will try to take the sum (across all the machines) of the sums (of each machine). This is why this works: each machine sends `sum == average`, ending up creating an `average of sums` of averages, or more simply a sum (across machines) of averages == counts (on each machine).
If your library is better than the current one, why not just open-source it with real tests? I can help you with that (~1 day). That would benefit the community and Librato users a lot, along with strengthening the module.
Not high priority, but still interesting.