Node Philly 2013

Leadnomics on Monitoring

Presenter: Tom Shawver, @tomfrost
Company: Leadnomics

They are an ad server platform that was originally built on PHP/MySQL, which did not scale.

Greatest Hits (aka great fail)

"Node is single-threaded": this is false. Shows a libuv code sample:

static uv_thread_t default_threads[4];

There are actually 4 threads in the pool. Networking calls are async, handled by the kernel's network stack, and don't consume threads. File system calls do consume a thread. So some operations still block a thread and some do not.

Node 0.10.x brings garbage collection improvements.

setImmediate() vs process.nextTick()

Before 0.10.0 the event loop worked like this:

empty nextTick queue -> callbacks for queued tasks -> empty nextTick queue… You wouldn't know whether the code would execute now or on the next tick.

If you need to schedule execution immediately, within the current tick: process.nextTick()

If you need to schedule execution for the next tick: setImmediate()
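
To make the naming confusion concrete, here is a minimal sketch of the ordering on a current Node.js release. The helper name `recordOrder` is made up for illustration and is not from the talk.

```javascript
// Records the order in which synchronous code, process.nextTick() and
// setImmediate() callbacks run. recordOrder is a hypothetical helper name.
function recordOrder() {
  return new Promise((resolve) => {
    const order = [];

    // Runs on a later pass of the event loop (the "check" phase).
    setImmediate(() => {
      order.push('setImmediate');
      resolve(order);
    });

    // Runs as soon as the current operation finishes, before the loop continues.
    process.nextTick(() => order.push('nextTick'));

    // Plain synchronous code always runs first.
    order.push('sync');
  });
}

recordOrder().then((order) => console.log(order.join(' -> ')));
// prints "sync -> nextTick -> setImmediate"
```

Despite its name, process.nextTick() fires before the loop moves on, while setImmediate() is the one that waits for the next pass.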

WTF

Harsh reality: things will break.

So you need to monitor it; use Splunk or New Relic.

But you need a much smarter monitor.

Use metrics more suited to Node.js
Monitor application-specific values alongside system values; 'free-memory' and similar system values are probably useless on their own.
Know your danger zones
Auto-repair unforeseen issues when possible
Log issues at non-alert level if the monitor can auto-repair

Tick monitoring: we measure how long it takes to do a group of 1000 ticks.

{
    tick:7.23
    avgMs:5.87
    maxMS:9.00
    perSec:1234
}
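
A rough sketch of how such a tick measurement could be taken with nothing but Node's built-ins. This is not the presenter's implementation: the function names are made up, and only the `avgMs`, `maxMs` and `perSec` fields are mirrored from the sample above, with their meaning assumed.

```javascript
// Summarize per-tick durations (in milliseconds) for one group of ticks.
// Assumes a non-empty array. Field meanings are an assumption based on the notes.
function summarizeTicks(durationsMs) {
  const total = durationsMs.reduce((sum, d) => sum + d, 0);
  return {
    avgMs: total / durationsMs.length,
    maxMs: Math.max(...durationsMs),
    perSec: total > 0 ? (durationsMs.length / total) * 1000 : Infinity,
  };
}

// Time `count` consecutive setImmediate callbacks, then report the summary.
function measureTicks(count, done) {
  const durations = [];
  let last = process.hrtime.bigint();

  function tick() {
    const now = process.hrtime.bigint();
    durations.push(Number(now - last) / 1e6); // nanoseconds -> milliseconds
    last = now;
    if (durations.length < count) {
      setImmediate(tick);
    } else {
      done(summarizeTicks(durations));
    }
  }

  setImmediate(tick);
}

measureTicks(1000, (stats) => console.log(stats));
```

When something blocks the event loop, the gap between ticks grows, so `avgMs` and `maxMs` rise while `perSec` drops.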

Code demo of Express with [vital signs](https://github.com/Leadnomics/node-vitalsigns)

Attaches vital signs to the Express app and then tells it to record various metrics [cpu, mem, tick, uptime]

Shows that we can configure vitalSigns to monitor for unhealthy behavior, which makes the Express server return a 503 (Service Unavailable). Most load balancers will then remove the server from their round-robin rotation and retry it later; if the server is returning 200 again, they add it back to the rotation.
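
The 503 behavior can be sketched without Express or vitalsigns, using only Node's `http` module. The threshold, the `avgMs` field and the function names below are assumptions for illustration; this is not the vitalsigns API.

```javascript
const http = require('http');

// Hypothetical health rule: unhealthy when the average tick time is too high.
function isHealthy(stats, maxAvgMs) {
  return stats.avgMs <= maxAvgMs;
}

// Build a request handler that answers 200 when healthy and 503 otherwise.
function createHealthHandler(getStats, maxAvgMs) {
  return (req, res) => {
    const ok = isHealthy(getStats(), maxAvgMs);
    res.statusCode = ok ? 200 : 503;
    res.end(ok ? 'OK' : 'Service Unavailable');
  };
}

// getStats would normally read live metrics; here it returns a fixed sample.
const healthServer = http.createServer(
  createHealthHandler(() => ({ avgMs: 5.87 }), 10)
);
// Call healthServer.listen(3000) to expose it to a load balancer's health check.
```

The load balancer only needs the status code, so the check stays cheap even when the server is struggling.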

Controlling and Load Balancing Complex Applications with Nodejs

Presenter: Bryan Paluch, @bryanpaluch
Company: Comcast

Does Telephony / IP Communications, tech lead on the WebRTC initiative

Shows a crazy diagram of a complex system: legacy telephony systems that are stateful and hard to load balance. Most of the system requires systems-level programming rather than web development. Node is great: cross-platform and without the bulk of the JVM. Node provides incredible real-time system communication.

At Comcast they were outsourcing all their conference call communication systems, which cost them several million dollars a year. They decided to bring conference calling in house; they are a telephony company after all.

Load Balancing Conference Servers

Calls come in and get routed to a conferencing server (these servers have to be sticky)
Every user in the conference must be routed to the same server
Multiple servers are needed to support the service
Load needs to be spread equally among the cluster

Problem 1

Hot spotting occurs when load builds up unevenly on one server, for example 100 people on a conference call talking for 2 hrs vs 2 people for 2 mins. Calls can't be moved in real time. Round robin is incapable of solving this problem.

Solution 1

Comcast uses FreeSWITCH, an open source conference server. FreeSWITCH exposes an interface that node-esl can communicate with.

Node acts as a central brain to report load and server statistics (analyzing the FreeSWITCH servers).

Create a lock on the conference location in the central brain, and make sure the central brain does not go down.

So how do we do this? Redis acts as the state storage/central brain.
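
A minimal sketch of the sticky, load-aware routing idea. An in-memory Map stands in for Redis here, which is an assumption made to keep the example self-contained; the class and server names are hypothetical and none of this is Comcast's code.

```javascript
// Routes every caller of a conference to the same server, and picks the
// least-loaded server the first time a conference is seen.
class ConferenceRouter {
  constructor(servers) {
    this.load = new Map(servers.map((name) => [name, 0])); // server -> caller count
    this.locations = new Map(); // conferenceId -> server (stand-in for Redis)
  }

  // Return the server with the fewest callers; ties go to the earliest listed.
  leastLoaded() {
    let best = null;
    for (const [name, load] of this.load) {
      if (best === null || load < this.load.get(best)) best = name;
    }
    return best;
  }

  route(conferenceId) {
    // "Lock" the conference location the first time it is seen.
    if (!this.locations.has(conferenceId)) {
      this.locations.set(conferenceId, this.leastLoaded());
    }
    const server = this.locations.get(conferenceId);
    this.load.set(server, this.load.get(server) + 1);
    return server;
  }
}

const router = new ConferenceRouter(['fs1', 'fs2']);
console.log(router.route('standup'), router.route('standup'), router.route('retro'));
// prints "fs1 fs1 fs2"
```

With several Node processes the check-and-set has to be atomic, which is where a shared store such as Redis comes in.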

Problem 2

Load Balancing Transcoders

Differences between transcoding and conferencing:

State doesn't need to be distributed to all users
Only the service needs to know the stickiness (which transcoding gateway)

Solution 2

Use a peer-to-peer service architecture. A diagram shows how the various services message one another with their state information instead of using a central db-server.

Gossip is the term for how they share that state: any given server should know which server is least loaded, and servers can contact the load balancer to indicate they should receive jobs/processes.

Node libraries: GossipMonger by Tristan Slominski, Gossiper, Vines by Paolo Fragomeni (Nodejitsu)
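
The core of a gossip exchange can be sketched without any of these libraries. The state shape (`load` plus a `version` counter per node) and the function names are assumptions for illustration, not the API of any library listed.

```javascript
// Merge a peer's view of the cluster into ours, keeping whichever entry
// has the higher version number for each node.
function mergeState(local, remote) {
  const merged = { ...local };
  for (const [node, entry] of Object.entries(remote)) {
    if (!merged[node] || entry.version > merged[node].version) {
      merged[node] = entry;
    }
  }
  return merged;
}

// Pick the node with the lowest reported load (assumes a non-empty state).
function leastLoaded(state) {
  return Object.keys(state).reduce((best, node) =>
    state[node].load < state[best].load ? node : best
  );
}

const mine = { a: { load: 5, version: 1 }, b: { load: 9, version: 1 } };
const theirs = { b: { load: 1, version: 2 }, c: { load: 3, version: 1 } };
const view = mergeState(mine, theirs);
console.log(leastLoaded(view)); // prints "b"
```

Each server periodically swaps state with a few peers, so a fresh load figure eventually reaches everyone without a central database.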

Demo of his p2pdemo-arch using GossipMonger. Shows his GitHub demo, which has this p2p messaging system. Talks about the cons of this approach, mainly scale.

StrongLoop - NodeFly/Ops

Presenter: Glen Lougheed, @glougheed
Company: StrongLoop

Demo of the StrongLoop dashboard for Node server performance.

Basic architecture: initially, the api, web, and db machines all communicated with a collector service using web sockets.

They tried using Express and socket.io to communicate with their collector. Found Express to be awesome, but socket.io sucks: it is terrible for machine-to-machine communication, specifically around reconnection. Socket.io requires a dual handshake, and a load balancer can mismatch the handshake, which invalidates the authentication. They tried SockJS and axon; both failed. So they built Uhura, a small event emitter that routes communications. The goal is to keep it small so that people can swap in their own networking pieces, TCP or UDP.
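
The "small event emitter with a swappable transport" idea might look roughly like the sketch below. This is not Uhura's actual API: the class name, the JSON-line wire format and the `write(line)` transport contract are all assumptions for illustration.

```javascript
const { EventEmitter } = require('events');

// Hypothetical wrapper: events go out through any object with a write(line)
// method, so the networking piece can be swapped.
class RoutedEmitter extends EventEmitter {
  constructor(transport) {
    super();
    this.transport = transport;
  }

  // Serialize an event as one JSON line and hand it to the transport.
  send(event, payload) {
    this.transport.write(JSON.stringify({ event, payload }) + '\n');
  }

  // Feed one received line back in; it is re-emitted as a local event.
  receive(line) {
    const { event, payload } = JSON.parse(line);
    this.emit(event, payload);
  }
}

// Loopback wiring for illustration: whatever `agent` sends, `collector` receives.
const collector = new RoutedEmitter({ write() {} });
const agent = new RoutedEmitter({ write: (line) => collector.receive(line) });
collector.on('cpu', (value) => console.log('cpu sample:', value));
agent.send('cpu', 0.42); // prints "cpu sample: 0.42"
```

Because the emitter only needs `write`, a `net.Socket` could be dropped in as the TCP transport without touching the event code.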

Journey Through LevelDB

Presenter: Jarrett Cruger, @jcrugzz
Company: Nodejitsu

LevelDB is a basic key-value storage system. It uses an LSM tree (Log-Structured Merge Tree).
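
To give a feel for the LSM idea, here is a toy sketch: writes land in an in-memory table, full tables are flushed as sorted immutable segments, and reads check the newest data first. It is a simplification for illustration only (no write-ahead log, no compaction, no disk I/O), and it is not how LevelDB is actually implemented; the class name is made up.

```javascript
class TinyLsm {
  constructor(memtableLimit) {
    this.memtableLimit = memtableLimit;
    this.memtable = new Map(); // recent writes, held in memory
    this.segments = []; // flushed sorted runs of [key, value], newest last
  }

  put(key, value) {
    this.memtable.set(key, value);
    if (this.memtable.size >= this.memtableLimit) this.flush();
  }

  // Freeze the memtable into a sorted, immutable segment.
  flush() {
    const sorted = [...this.memtable.entries()].sort(([a], [b]) =>
      a < b ? -1 : a > b ? 1 : 0
    );
    this.segments.push(sorted);
    this.memtable = new Map();
  }

  // Newest data wins: check the memtable, then segments from newest to oldest.
  get(key) {
    if (this.memtable.has(key)) return this.memtable.get(key);
    for (let i = this.segments.length - 1; i >= 0; i--) {
      const hit = this.segments[i].find(([k]) => k === key);
      if (hit) return hit[1];
    }
    return undefined;
  }
}

const db = new TinyLsm(2);
db.put('a', 1);
db.put('b', 2); // memtable is full, so it is flushed to a segment
db.put('a', 3); // newer value shadows the flushed one
console.log(db.get('a'), db.get('b')); // prints "3 2"
```

Writes stay cheap because they only touch memory and append sorted runs; the cost is shifted to reads and to background merging, which this sketch leaves out.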

Demo of a Twitter-stream system that pipes to LevelDB and then pipes a readable stream from LevelDB to websocket/Primus

Websocket system called Primus

mgan59/node_philly_2013_notes.md