NATS is an excellent, clustered, full-mesh PubSub messaging system, highly performant and a cakewalk to set up. Full mesh means every node (servers and clients) knows about every other node. That is great, but it makes it tricky to keep multiple publishers on hot standby for high availability of the publishers (not of the NATS network) while avoiding duplicate publishes.
Here, --no_advertise comes in handy if we're willing to sacrifice the automatic meshing and discovery mechanism. This may be acceptable in setups where a fixed set of NATS servers runs in a cluster and their addresses (either IPs or hostnames) are known.
The gnatsd --no_advertise flag makes a NATS server not advertise itself automatically to the mesh. For other nodes to discover --no_advertise nodes, their --routes have to be specified explicitly. If there are N servers, there should be N routes.
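To make the explicit routing concrete, here is a minimal sketch that builds one route URL per known server and assembles a gnatsd command line from them. The helper names (route_urls, gnatsd_args), the ports, and the host names are illustrative assumptions, not part of NATS itself; only the flags are meant to mirror the ones discussed above.

```python
def route_urls(hosts, cluster_port=4248):
    """Build one explicit route URL per known server (host names are assumed)."""
    return [f"nats-route://{host}:{cluster_port}" for host in hosts]


def gnatsd_args(routes, client_port=4222, cluster_port=4248):
    """Assemble a gnatsd command line with advertising switched off."""
    args = [
        "gnatsd",
        "-p", str(client_port),
        "--cluster", f"nats://0.0.0.0:{cluster_port}",
        "--no_advertise",
    ]
    # With no routes the server stays on its own; nothing will discover it.
    if routes:
        args += ["--routes", ",".join(routes)]
    return args


if __name__ == "__main__":
    print(" ".join(gnatsd_args(route_urls(["server0", "server1"]))))
```

Keeping the route list as plain data makes it easy to hand the same set of upstream addresses to every server that should join the cluster.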
gnatsd -sl reload=pid makes a running NATS server reload its configuration from its config file (-c) without downtime. This can be used to take a server out of the cluster, or bring it back in, by editing its routes and reloading.
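As a rough sketch, the reload can be triggered from a script like this. It assumes gnatsd is on the PATH and that the server's PID is available from a PID file; the file location and the helper names (read_pid, reload_command, reload_server) are hypothetical.

```python
import subprocess
from pathlib import Path


def read_pid(pid_file):
    """Read the server's PID from a PID file (the path is an assumption)."""
    return int(Path(pid_file).read_text().strip())


def reload_command(pid):
    """Return the gnatsd invocation that asks server `pid` to reload."""
    return ["gnatsd", "-sl", f"reload={pid}"]


def reload_server(pid, run=subprocess.run):
    """Signal the running server to re-read its config file.

    `run` is injectable so the call can be exercised without a real server;
    with the default, a non-zero exit status raises CalledProcessError.
    """
    return run(reload_command(pid), check=True)
```

A call would look like reload_server(read_pid("/var/run/gnatsd.pid")), where the path is only an example.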
Consider the following scenario:
- A cluster of two NATS servers, server0 and server1, that have N subscribers listening on the subject test.
- A live publisher, publisher0, that is publishing on the subject test.
- A hot standby publisher, publisher1, that is also publishing on the subject test, but whose messages should only take effect in the cluster if publisher0 goes down.
The setup works as follows:
- Each publisher gets its own local NATS server (here, dummy-nats0 and dummy-nats1 for publisher0 and publisher1 respectively).
- The publishers do not publish directly to the upstream cluster, but to their local NATS servers.
- The dummy NATS server of the primary publisher publisher0, dummy-nats0, is clustered to the upstream NATS servers (via routes).
- The dummy NATS server of the backup publisher publisher1, dummy-nats1, is not clustered to the upstream NATS servers (empty routes).
- These configurations are specified in local configuration files.
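The two local configuration files could look roughly like the output of the sketch below. It renders a gnatsd config with a cluster block, no_advertise enabled, and either the upstream routes (for dummy-nats0) or an empty route list (for dummy-nats1). The function name render_config, the ports, and the addresses are assumptions for illustration; check the exact keys against the server version in use.

```python
UPSTREAM = ["server0:4248", "server1:4248"]  # assumed upstream cluster addresses


def render_config(routes, client_port=4222, cluster_port=4248):
    """Render a gnatsd config; an empty `routes` list keeps the server isolated."""
    route_lines = "".join(f"    nats-route://{r}\n" for r in routes)
    return (
        f"port: {client_port}\n"
        "cluster {\n"
        f"  listen: 0.0.0.0:{cluster_port}\n"
        "  no_advertise: true\n"
        "  routes = [\n"
        f"{route_lines}"
        "  ]\n"
        "}\n"
    )


if __name__ == "__main__":
    print(render_config(UPSTREAM))  # dummy-nats0: clustered to the upstream
    print(render_config([]))        # dummy-nats1: empty routes, stays isolated
```

Both servers share everything except the route list, so the route list is the only thing that has to change when the publishers swap roles.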
- When publisher0 goes down or there is a fault (assuming there's a healthcheck mechanism):
  - Remove the upstream NATS routes from dummy-nats0's configuration and issue a gnatsd -sl reload.
  - Add the upstream NATS routes to dummy-nats1's configuration and issue a gnatsd -sl reload.
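The two failover steps above can be sketched as a small script. This is only an illustration under stated assumptions: the config paths and PIDs are supplied by the caller, the healthcheck that decides when to fail over is not shown, and the helper names (cluster_config, set_routes, fail_over) are made up for this example.

```python
import subprocess
from pathlib import Path

# Assumed upstream cluster routes.
UPSTREAM_ROUTES = ["nats-route://server0:4248", "nats-route://server1:4248"]


def cluster_config(routes, client_port=4222, cluster_port=4248):
    """Render a dummy server's config with the given (possibly empty) routes."""
    lines = "".join(f"    {r}\n" for r in routes)
    return (
        f"port: {client_port}\n"
        "cluster {\n"
        f"  listen: 0.0.0.0:{cluster_port}\n"
        "  no_advertise: true\n"
        f"  routes = [\n{lines}  ]\n"
        "}\n"
    )


def set_routes(config_path, pid, routes, run=subprocess.run):
    """Rewrite one server's config with `routes`, then signal it to reload."""
    Path(config_path).write_text(cluster_config(routes))
    run(["gnatsd", "-sl", f"reload={pid}"], check=True)


def fail_over(primary, standby, run=subprocess.run):
    """Detach the primary's dummy server first, then attach the standby's.

    `primary` and `standby` are dicts with hypothetical keys "config" and "pid".
    """
    set_routes(primary["config"], primary["pid"], [], run=run)
    set_routes(standby["config"], standby["pid"], UPSTREAM_ROUTES, run=run)
```

Detaching dummy-nats0 before attaching dummy-nats1 follows the order of the steps above, so that both publishers are not routed to the cluster at the same time.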
The messages publisher0 had been publishing will immediately stop reaching the cluster and make way for those of publisher1. Even if publisher0 or dummy-nats0 comes back up, its messages will stay self-contained and will not be pushed to the cluster, because --no_advertise prevents automatic discovery and cluster formation, avoiding duplicate messages.
+---------------------------------------------------------------------------+
|                                                                           |
|                             N ... subscribers                             |
|                                                                           |
+---------------------------------------------------------------------------+
                              /               \
                           /                     \
                        /                           \
                     /                                 \
                  /                                       \
+--------------------------------+         +--------------------------------+
|                                |         |                                |
|  NATS server0                  |         |  NATS server1                  |
|                                |         |                                |
|  listen          :4222         |         |  listen          :4222         |
|  cluster-listen  :4248         |---------|  cluster-listen  :4248         |
|  no-advertise                  |         |  no-advertise                  |
|                                |         |                                |
|                                |         |  routes          server0:4248  |
+---------------|----------------+         +--------------------------------+
                |                            -----/
                |                      -----/
                |                -----/
                |          -----/
                |    -----/
                |---/
+---------------|----------------+         +--------------------------------+
|                                |         |                                |
|  NATS dummy-nats0              |         |  NATS dummy-nats1              |
|                                |         |                                |
|  listen          :4222         |         |  listen          :4222         |
|  cluster-listen  :4248         |         |  cluster-listen  :4248         |
|  no-advertise                  |         |  no-advertise                  |
|                                |         |                                |
|  routes          server0:4248  |         |  routes          []            |
|                  server1:4248  |         |                                |
+---------------|----------------+         +---------------|----------------+
                |                                          |
                |                                          |
                |                                          |
     +----------|----------+                    +----------|----------+
     |                     |                    |                     |
     |  publisher0         |                    |  publisher1         |
     |  subject test       |                    |  subject test       |
     |                     |                    |                     |
     |                     |                    |                     |
     +---------------------+                    +---------------------+