Here's a basic watch script that looks through the output for serfHealth checks in the critical state. The serfHealth check is a built-in check added by Consul that keeps track of the health of a node. When this watch handler fires, it gets the JSON body of the health endpoint passed to it over stdin.
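#!/bin/sh
# Read the checks JSON from stdin and report every node whose serfHealth check is critical.
# jq prints each node name as a JSON string, so the names keep their double quotes.
for node in $(jq '.[] | select(.CheckID=="serfHealth" and .Status=="critical") | .Node' -); do
    echo "$node is dead"
done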
You can then run this on the command line via consul watch. We add filtering so the script only gets called when there are checks in the critical state, although the script would also cope without that filter:
consul watch -type=checks -state=critical ./dead_node.sh
The watch could also be registered with an agent if you don't want to run a separate command-line process on the side.
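As a rough sketch of that agent registration, the snippet below writes a watch definition into an agent configuration directory. The directory (./consul.d), the file name, and the script path (/usr/local/bin/dead_node.sh) are placeholders chosen for illustration, not values from this setup. It assumes a Consul release that accepts handler_type and args in a watch definition; older releases took a single handler string instead, so check the documentation for your version.

```shell
#!/bin/sh
# Hypothetical config directory; point this at the directory your agent loads.
CONFIG_DIR="${CONFIG_DIR:-./consul.d}"
mkdir -p "$CONFIG_DIR"

# Write a watch definition equivalent to:
#   consul watch -type=checks -state=critical ./dead_node.sh
# The script path below is a placeholder.
cat > "$CONFIG_DIR/dead-node-watch.json" <<'EOF'
{
  "watches": [
    {
      "type": "checks",
      "state": "critical",
      "handler_type": "script",
      "args": ["/usr/local/bin/dead_node.sh"]
    }
  ]
}
EOF
```

The agent needs to load that directory and be reloaded or restarted before the watch takes effect.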
The only limitation of this script is that it sees all failed nodes every time any node fails. If you are sending a summary of failed nodes as a notification, this works fine; otherwise you'll need to keep a little state somewhere so you don't re-trigger notifications (or apply some rate limit).
Here's a sample run:
workpad:consul-demo-tf james$ consul watch -type=checks -state=critical ./dead_node.sh
"consul-client-nyc3-2" is dead
# watch blocks waiting for more updates...
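One possible way to keep the "little state" mentioned above is to remember the previous list of failed nodes in a file and only report names that are new. The following is a minimal sketch, not a tested production handler: the function name report_new_failures and the state file path /tmp/dead_nodes.state are made up for this example.

```shell
#!/bin/sh
# Hypothetical state file location; adjust for your environment.
STATE_FILE="${STATE_FILE:-/tmp/dead_nodes.state}"

# Reads the currently failed node names from stdin (one per line), prints a
# message only for names that were absent from the previous run, then saves
# the current list for the next run.
report_new_failures() {
    current=$(mktemp)
    sort -u > "$current"
    touch "$STATE_FILE"
    # comm -13 prints lines that appear only in the second (current) file.
    # Both files are sorted because each was produced by sort -u.
    comm -13 "$STATE_FILE" "$current" | while IFS= read -r node; do
        echo "$node is dead"
    done
    mv "$current" "$STATE_FILE"
}
```

Inside the handler, the node names would be piped into the function, using jq -r so the names come out unquoted:

jq -r '.[] | select(.CheckID=="serfHealth" and .Status=="critical") | .Node' | report_new_failures

Because a recovered node drops out of the saved list, it will be reported again if it fails a second time.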