To write a ZooKeeper client, it is necessary to understand
the protocol used to communicate with the server. Unfortunately, this
protocol is not documented. In this example, we use tcpflow
to dump both the read and write channels of the TCP connection
that zkCli.sh uses to connect to the ZooKeeper server.
This is the result of connecting to a ZooKeeper server with zkCli.sh,
running get /foo (this znode does actually exist), and then running
close. The client sends this:
00000000: 0000 002d 0000 0000 0000 0000 0000 0000 ...-............
00000010: 0000 7530 0000 0000 0000 0000 0000 0010 ..u0............
00000020: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000030: 0000 0000 1100 0000 0100 0000 0400 0000 ................
00000040: 042f 666f 6f00 0000 0008 0000 0002 ffff ./foo...........
00000050: fff5 ..
Chris Nauroth's StackOverflow answer provides a good starting point for discerning the meaning. Each request is prefaced by its length encoded as a big-endian 32-bit word. Here are some relevant bits of the jute file describing the models:
class RequestHeader {
    int xid;
    int type;
}
class ConnectRequest {
    int protocolVersion;
    long lastZxidSeen;
    int timeOut;
    long sessionId;
    buffer passwd;
}
class GetDataRequest {
    ustring path;
    boolean watch;
}
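The length-prefix framing can be sketched in a few lines of Python. This is only a minimal sketch: `split_frames` and `CLIENT_HEX` are names made up for illustration, and the hex is copied from the client dump above.

```python
import struct

# Hex columns copied verbatim from the tcpflow dump of the client side.
CLIENT_HEX = """
0000 002d 0000 0000 0000 0000 0000 0000
0000 7530 0000 0000 0000 0000 0000 0010
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 1100 0000 0100 0000 0400 0000
042f 666f 6f00 0000 0008 0000 0002 ffff
fff5
"""
CLIENT_BYTES = bytes.fromhex("".join(CLIENT_HEX.split()))


def split_frames(stream):
    """Split a byte stream into frames, each prefixed by a big-endian int32 length."""
    frames = []
    offset = 0
    while offset < len(stream):
        (length,) = struct.unpack_from(">i", stream, offset)
        offset += 4
        frames.append(stream[offset:offset + length])
        offset += length
    return frames


frames = split_frames(CLIENT_BYTES)
print([len(frame) for frame in frames])  # [45, 17, 8]
```

The three frame lengths add up, together with the three 4-byte prefixes, to the 82 bytes in the dump.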
This dump contains a ConnectRequest followed by two standard requests.
I suspect that before a ZooKeeper session has been established, any
sequence of bytes is parsed as a ConnectRequest.
Also, there's some nonsense going on in ClientCnxn.java in the
implementation of createBB for Packet. There's an extra boolean field
named readOnly that doesn't show up in the jute file, but it gets
tacked on to the end of the request.
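To make the layout concrete, here is a rough Python sketch that builds the same connect packet. The function names `encode_buffer` and `encode_connect_request` and their default arguments are made up for this post; the field order follows the jute ConnectRequest above, with the extra readOnly byte appended as observed in the dump.

```python
import struct


def encode_buffer(data):
    """Encode a jute buffer: big-endian int32 length followed by the raw bytes."""
    return struct.pack(">i", len(data)) + data


def encode_connect_request(timeout_ms, session_id=0, passwd=b"\x00" * 16,
                           last_zxid_seen=0, protocol_version=0,
                           read_only=False):
    """Build a length-prefixed ConnectRequest (hypothetical helper)."""
    # int protocolVersion, long lastZxidSeen, int timeOut, long sessionId
    body = struct.pack(">iqiq", protocol_version, last_zxid_seen,
                       timeout_ms, session_id)
    body += encode_buffer(passwd)
    # The readOnly flag is not in the jute file but is appended on the wire.
    body += struct.pack(">?", read_only)
    return struct.pack(">i", len(body)) + body


packet = encode_connect_request(30000)
print(len(packet))        # 49
print(packet[:4].hex())   # 0000002d
```

With a 30000 ms timeout and the defaults above, the result is byte-for-byte the first 49 bytes of the client dump.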
Field by field, the three requests break down as follows:

0000 002d (the request consists of the next 45 bytes)
0000 0000 (protocol version 0, ZooKeeper never bumps this)
0000 0000 0000 0000 (lastZxidSeen is 0)
0000 7530 (timeout is 30000 milliseconds)
0000 0000 0000 0000 (session id is 0)
0000 0010 (length of buffer passwd is 16)
0000 0000 0000 0000 0000 0000 0000 0000 (password is 16 null bytes)
00 (read-only is false, i.e. this connection can issue writes)

0000 0011 (the request consists of the next 17 bytes)
0000 0001 (xid is 1)
0000 0004 (op code 4: getData)
0000 0004 (length of ustring path is 4)
2f66 6f6f (the ASCII-encoded characters /foo)
00 (the boolean watch, set to false here, so no watch is registered on the znode)

0000 0008 (the request consists of the next 8 bytes)
0000 0002 (xid is 2)
ffff fff5 (op code -11: closeSession)
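The two standard requests can be decoded with a short Python sketch. The helper `decode_request` and the `OP_NAMES` table are made up for illustration and cover only the two op codes seen in this dump; decoding the path as UTF-8 is my assumption about how jute encodes a ustring.

```python
import struct

# Request bodies (without the 4-byte length prefix), copied from the dump.
GET_DATA_FRAME = bytes.fromhex("00000001" "00000004" "00000004" "2f666f6f" "00")
CLOSE_FRAME = bytes.fromhex("00000002" "fffffff5")

# Only the op codes that appear in this dump.
OP_NAMES = {4: "getData", -11: "closeSession"}


def decode_request(frame):
    """Decode a RequestHeader and, for getData, the GetDataRequest body."""
    xid, op = struct.unpack_from(">ii", frame, 0)
    request = {"xid": xid, "op": OP_NAMES.get(op, op)}
    if op == 4:
        # ustring path: int32 length, then the bytes (assumed to be UTF-8).
        (path_len,) = struct.unpack_from(">i", frame, 8)
        path = frame[12:12 + path_len].decode("utf-8")
        # boolean watch: a single byte.
        (watch,) = struct.unpack_from(">?", frame, 12 + path_len)
        request.update(path=path, watch=watch)
    return request


print(decode_request(GET_DATA_FRAME))
# {'xid': 1, 'op': 'getData', 'path': '/foo', 'watch': False}
print(decode_request(CLOSE_FRAME))
# {'xid': 2, 'op': 'closeSession'}
```

Note that closeSession has no body at all: the 8-byte frame is just the RequestHeader.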
Alright, let's take a look at the responses:
00000000: 0000 0025 0000 0000 0000 7530 016b c918 ...%......u0.k..
00000010: a8ee 0002 0000 0010 aaf7 2e9b dd17 87a2 ................
00000020: 44e3 ed5a 8753 99c9 0000 0000 5f00 0000 D..Z.S......_...
00000030: 0100 0000 0000 0000 0800 0000 0000 0000 ................
00000040: 0765 7861 6d70 6c65 0000 0000 0000 0004 .example........
00000050: 0000 0000 0000 0004 0000 016b c926 0af4 ...........k.&..
00000060: 0000 016b c926 0af4 0000 0000 0000 0000 ...k.&..........
00000070: 0000 0000 0000 0000 0000 0000 0000 0007 ................
00000080: 0000 0000 0000 0000 0000 0004 0000 0010 ................
00000090: 0000 0002 0000 0000 0000 0009 0000 0000 ................
In the zookeeper CLI, what we see is:
[zk: localhost:2181(CONNECTED) 0] get /foo
example
cZxid = 0x4
ctime = Sat Jul 06 21:17:22 UTC 2019
mZxid = 0x4
mtime = Sat Jul 06 21:17:22 UTC 2019
pZxid = 0x4
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 7
numChildren = 0
The relevant models are:
class ConnectResponse {
    int protocolVersion;
    int timeOut;
    long sessionId;
    buffer passwd;
}
class ReplyHeader {
    int xid;
    long zxid;
    int err;
}
class Stat {
    long czxid; // created zxid
    long mzxid; // last modified zxid
    long ctime; // created
    long mtime; // last modified
    int version; // version
    int cversion; // child version
    int aversion; // acl version
    long ephemeralOwner; // owner id if ephemeral, 0 otw
    int dataLength; // length of the data in the node
    int numChildren; // number of children of this node
    long pzxid; // last modified children
}
class GetDataResponse {
    buffer data;
    org.apache.zookeeper.data.Stat stat;
}
In the ZooKeeper source code, there is a hack for tacking readOnly
on to the end of ConnectResponse. This can be found by searching for
readOnly in ZooKeeperServer.java.
Field by field, the three responses break down as follows:

0000 0025 (the response consists of the next 37 bytes)
0000 0000 (protocol version 0)
0000 7530 (timeout is 30000 milliseconds)
016b c918 a8ee 0002 (session id)
0000 0010 (length of buffer passwd is 16)
aaf7 2e9b dd17 87a2 44e3 ed5a 8753 99c9 (server-generated password)
00 (server is not in read-only mode)

0000 005f (the response consists of the next 95 bytes)
0000 0001 (xid is 1, matching the getData request)
0000 0000 0000 0008 (zxid, the transaction id, is 8)
0000 0000 (error code 0, presumably this means no error)
0000 0007 (length of buffer data is 7)
6578 616d 706c 65 (data buffer contents, ASCII encoding of "example")
0000 0000 0000 0004 (created zxid is 4)
0000 0000 0000 0004 (last modified zxid is 4)
0000 016b c926 0af4 (created on Sat Jul 06 21:17:22 UTC 2019, in milliseconds since the epoch)
0000 016b c926 0af4 (last modified on Sat Jul 06 21:17:22 UTC 2019)
0000 0000 (version is 0)
0000 0000 (cversion, the child version, is 0)
0000 0000 (acl version is 0)
0000 0000 0000 0000 (ephemeral owner is 0x0, i.e. this znode is not ephemeral)
0000 0007 (data length is 7, this seems redundant)
0000 0000 (number of children is 0)
0000 0000 0000 0004 (children last modified by zxid 4)

0000 0010 (the response consists of the next 16 bytes)
0000 0002 (xid is 2, matching the closeSession request)
0000 0000 0000 0009 (zxid, the transaction id, is 9)
0000 0000 (error code 0, presumably this means no error)
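The same breakdown can be checked mechanically. The Python sketch below decodes the server dump using the jute models above plus the trailing readOnly byte. All function and variable names are made up for illustration, it handles only the three responses in this dump, and it ignores error replies and null buffers.

```python
import struct
from datetime import datetime, timezone

# Hex columns copied verbatim from the tcpflow dump of the server side.
SERVER_HEX = """
0000 0025 0000 0000 0000 7530 016b c918
a8ee 0002 0000 0010 aaf7 2e9b dd17 87a2
44e3 ed5a 8753 99c9 0000 0000 5f00 0000
0100 0000 0000 0000 0800 0000 0000 0000
0765 7861 6d70 6c65 0000 0000 0000 0004
0000 0000 0000 0004 0000 016b c926 0af4
0000 016b c926 0af4 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0007
0000 0000 0000 0000 0000 0004 0000 0010
0000 0002 0000 0000 0000 0009 0000 0000
"""
SERVER_BYTES = bytes.fromhex("".join(SERVER_HEX.split()))

# Stat fields in jute order: 4 longs, 3 ints, long, 2 ints, long (68 bytes).
STAT = struct.Struct(">qqqqiiiqiiq")
STAT_FIELDS = ("czxid", "mzxid", "ctime", "mtime", "version", "cversion",
               "aversion", "ephemeralOwner", "dataLength", "numChildren",
               "pzxid")


def split_frames(stream):
    """Split a stream of int32-length-prefixed frames into their bodies."""
    frames, offset = [], 0
    while offset < len(stream):
        (length,) = struct.unpack_from(">i", stream, offset)
        frames.append(stream[offset + 4:offset + 4 + length])
        offset += 4 + length
    return frames


def decode_connect_response(frame):
    version, timeout, session_id, passwd_len = struct.unpack_from(">iiqi", frame, 0)
    return {
        "protocolVersion": version,
        "timeOut": timeout,
        "sessionId": session_id,
        "passwd": frame[20:20 + passwd_len],
        # Extra byte that is not in the jute file.
        "readOnly": bool(frame[20 + passwd_len]),
    }


def decode_get_data_response(frame):
    # ReplyHeader (xid, zxid, err) followed by the length of buffer data.
    xid, zxid, err, data_len = struct.unpack_from(">iqii", frame, 0)
    data = frame[20:20 + data_len]
    stat = dict(zip(STAT_FIELDS, STAT.unpack_from(frame, 20 + data_len)))
    return {"xid": xid, "zxid": zxid, "err": err, "data": data, "stat": stat}


connect_frame, get_data_frame, close_frame = split_frames(SERVER_BYTES)
connect = decode_connect_response(connect_frame)
reply = decode_get_data_response(get_data_frame)
close_reply = struct.unpack(">iqi", close_frame)  # a bare ReplyHeader

# ctime is in milliseconds since the Unix epoch.
created = datetime.fromtimestamp(reply["stat"]["ctime"] // 1000, tz=timezone.utc)

print(hex(connect["sessionId"]))  # 0x16bc918a8ee0002
print(reply["data"])              # b'example'
print(created.isoformat())        # 2019-07-06T21:17:22+00:00
print(close_reply)                # (2, 9, 0)
```

The decoded Stat values agree with what zkCli.sh printed for get /foo.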
That's all for now. Hopefully this is instructive for others.