- Schemaless: The format has no knowledge of the underlying data at all.
- Byte-oriented: The format is built upon a byte stream, probably for ease of implementation and for performance.
- Binary: The format is not intended for human consumption and is specified in terms of "bytes" (which are 8 bits long for our purposes).
- Serialization: The format is primarily meant for storage and transmission, not for in-memory representation.
- S-Expressions (1997), a "canonical" encoding (which is also used for transport)
- Bencode (2001)
- MsgPack (2008)
- BSON (2009), `element` non-terminal (which is easier to directly compare than the top-level `document`)
- Smile (2010)
- UBJSON (2012)
- CBOR (2013)
Non-contenders that are nevertheless worth mentioning:
- Schema-dependent
  - ASN.1 BER/CER/DER (1984; partial)
  - NetStrings (1997)
  - EBML (2004)
- Bit-oriented
  - XSI (2011)
- Textual
  - JSON (formally specified in 2002)
Binary notations:
- `00` through `ff` for literal bytes
- `0?`, `?0` or `??` etc. for varying or unspecified nibbles (notes follow)
- `"foo"` or `'foo'` for ASCII representation of literal bytes (no escape sequence)
- `.X` for a single literal ASCII byte
- `(...)` for omitted bytes; in particular, `(NNN)` with all digits NNN means NNN omitted bytes
- `{...}*NNN` for NNN copies of `...`
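To make the notation concrete, here is a minimal Python sketch that expands a notation string into the bytes it denotes. The name `expand` and the token grammar are assumptions made for illustration only. The sketch handles hex bytes, quoted ASCII, `.X` and `{...}*NNN`; it deliberately rejects `?` nibbles and `(...)` omissions, since those do not denote concrete bytes.

```python
import re

# One token of the notation; this grammar is an assumption made for the sketch.
TOKEN = re.compile(r"""
      \{(?P<group>[^{}]*)\}\*(?P<count>\d+)   # {...}*NNN: NNN copies of ...
    | "(?P<dq>[^"]*)"                         # "foo": literal ASCII bytes
    | '(?P<sq>[^']*)'                         # 'foo': literal ASCII bytes
    | \.(?P<char>\S)                          # .X: a single ASCII byte
    | (?P<hex>[0-9a-f]{2})                    # 00 through ff: a literal byte
""", re.VERBOSE)


def expand(notation: str) -> bytes:
    """Expand a notation string into the concrete bytes it stands for."""
    out = bytearray()
    pos = 0
    while pos < len(notation):
        if notation[pos].isspace():
            pos += 1
            continue
        m = TOKEN.match(notation, pos)
        if m is None:
            raise ValueError(f"unsupported notation at offset {pos}: {notation[pos:]!r}")
        if m["group"] is not None:
            # Repetition: expand the inner part once, then copy it NNN times.
            out += expand(m["group"]) * int(m["count"])
        elif m["dq"] is not None:
            out += m["dq"].encode("ascii")
        elif m["sq"] is not None:
            out += m["sq"].encode("ascii")
        elif m["char"] is not None:
            out += m["char"].encode("ascii")
        else:
            out += bytes.fromhex(m["hex"])
        pos = m.end()
    return bytes(out)
```

For example, `expand('"i0e"')` gives the three ASCII bytes `i0e`, and `expand("58 18 {55}*24")` gives the bytes `58 18` followed by 24 copies of `55`.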
Data | SExp. | Bencode | MsgPack | BSON | Smile | UBJSON | CBOR |
---|---|---|---|---|---|---|---|
Self Identification | N/A | N/A | N/A | N/A | .: .) 0a 0? [1] | N/A | d9 d9 f7 |
Null/nil | N/A | N/A | c0 | 0a (name) | 21 | .Z | f6 |
Undefined/undef | N/A | N/A | N/A | 06 (name) [2] | N/A | N/A | f7 |
False | N/A | N/A | c2 | 08 (name) 00 | 22 | .F | f4 |
True | N/A | N/A | c3 | 08 (name) 01 | 23 | .T | f5 |
0 | N/A | "i0e" | 00 † | 10 (name) 00 00 00 00 † | c0 † [3] | .i 00 † | 00 † |
1 | N/A | "i1e" | 01 † | 10 (name) 01 00 00 00 † | c2 † | .i 01 † | 01 † |
2 | N/A | "i2e" | 02 † | 10 (name) 02 00 00 00 † | c4 † | .i 02 † | 02 † |
3 | N/A | "i3e" | 03 † | 10 (name) 03 00 00 00 † | c6 † | .i 03 † | 03 † |
10 | N/A | "i10e" | 0a † | 10 (name) 0a 00 00 00 † | d4 † | .i 0a † | 0a † |
15 | N/A | "i15e" | 0f † | 10 (name) 0f 00 00 00 † | de † | .i 0f † | 0f † |
16 | N/A | "i16e" | 10 † | 10 (name) 10 00 00 00 † | 20 a0 † | .i 10 † | 10 † |
23 | N/A | "i23e" | 17 † | 10 (name) 17 00 00 00 † | 20 ae † | .i 17 † | 17 † |
24 | N/A | "i24e" | 18 † | 10 (name) 18 00 00 00 † | 20 b0 † | .i 18 † | 18 18 † |
31 | N/A | "i31e" | 1f † | 10 (name) 1f 00 00 00 † | 20 be † | .i 1f † | 18 1f † |
32 | N/A | "i32e" | 20 † | 10 (name) 20 00 00 00 † | 20 01 80 † [4] | .i 20 † | 18 20 † |
99 | N/A | "i99e" | 63 † | 10 (name) 63 00 00 00 † | 20 03 86 † | .i 63 † | 18 63 † |
100 | N/A | "i100e" | 64 † | 10 (name) 64 00 00 00 † | 20 03 88 † | .i 64 † | 18 64 † |
127 | N/A | "i127e" | 7f † | 10 (name) 7f 00 00 00 † | 20 03 be † | .i 7f † | 18 7f † |
128 | N/A | "i128e" | cc 80 † | 10 (name) 80 00 00 00 † | 20 04 80 † | .U 80 † | 18 80 † |
255 | N/A | "i255e" | cc ff † | 10 (name) ff 00 00 00 † | 20 07 be † | .U ff † | 18 ff † |
256 | N/A | "i256e" | cd 01 00 † | 10 (name) 00 01 00 00 † | 20 08 80 † | .I 01 00 † | 19 01 00 † |
999 | N/A | "i999e" | cd 03 e7 † | 10 (name) e7 03 00 00 † | 20 1f 8e † | .I 03 e7 † | 19 03 e7 † |
1000 | N/A | "i1000e" | cd 03 e8 † | 10 (name) e8 03 00 00 † | 20 1f 90 † | .I 03 e8 † | 19 03 e8 † |
4095 | N/A | "i4095e" | cd 0f ff † | 10 (name) ff 0f 00 00 † | 20 7f be † | .I 0f ff † | 19 0f ff † |
4096 | N/A | "i4096e" | cd 10 00 † | 10 (name) 00 10 00 00 † | 20 01 00 80 † | .I 10 00 † | 19 10 00 † |
32767 | N/A | "i32767e" | cd 7f ff † | 10 (name) ff 7f 00 00 † | 20 07 7f be † | .I 7f ff † | 19 7f ff † |
32768 | N/A | "i32768e" | cd 80 00 † | 10 (name) 00 80 00 00 † | 20 08 00 80 † | .l 00 00 80 00 † | 19 80 00 † |
65535 | N/A | "i65535e" | cd ff ff † | 10 (name) ff ff 00 00 † | 20 0f 7f be † | .l 00 00 ff ff † | 19 ff ff † |
65536 | N/A | "i65536e" | ce 00 01 00 00 † | 10 (name) 00 00 01 00 † | 20 10 00 80 † | .l 00 01 00 00 † | 1a 00 01 00 00 † |
524287 | N/A | "i524287e" | ce 00 07 ff ff † | 10 (name) ff ff 07 00 † | 20 7f 7f be † | .l 00 07 ff ff † | 1a 00 07 ff ff † |
524288 | N/A | "i524288e" | ce 00 08 00 00 † | 10 (name) 00 00 08 00 † | 20 01 00 00 80 † | .l 00 08 00 00 † | 1a 00 08 00 00 † |
2^31-1 | N/A | "i2147483647e" | ce 7f ff ff ff † | 10 (name) ff ff ff 7f † | 20 1f 7f 7f 7f be † | .l 7f ff ff ff † | 1a 7f ff ff ff † |
2^31 | N/A | "i2147483648e" | ce 80 00 00 00 † | 12 (name) 00 00 00 80 00 00 00 00 | 20 20 00 00 00 80 † | .L 00 00 00 00 80 00 00 00 † | 1a 80 00 00 00 † |
2^32-1 | N/A | "i4294967295e" | ce ff ff ff ff † | 12 (name) ff ff ff ff 00 00 00 00 | 20 3f 7f 7f 7f be † | .L 00 00 00 00 ff ff ff ff † | 1a ff ff ff ff † |
2^32 | N/A | "i4294967296e" | cf 00 00 00 01 00 00 00 00 † | 12 (name) 00 00 00 00 01 00 00 00 | 21 40 00 00 00 80 † | .L 00 00 00 01 00 00 00 00 † | 1b 00 00 00 01 00 00 00 00 |
2^63-1 | N/A | "i9223372036854775807e" | cf 7f ff ff ff ff ff ff ff † | 12 (name) ff ff ff ff ff ff ff 7f | 21 03 7f 7f 7f 7f 7f 7f 7f 7f be † | .L 7f ff ff ff ff ff ff ff † | 1b 7f ff ff ff ff ff ff ff |
2^63 | N/A | "i9223372036854775808e" | cf 80 00 00 00 00 00 00 00 | N/A | 22 89 00 20 00 00 00 00 00 00 00 00 00 [5] | .H .i 13 "9223372036854775808" | 1b 80 00 00 00 00 00 00 00 |
2^64-1 | N/A | "i18446744073709551615e" | cf ff ff ff ff ff ff ff ff | N/A | 22 89 00 3f 7f 7f 7f 7f 7f 7f 7f 7f 03 | .H .i 14 "18446744073709551615" | 1b ff ff ff ff ff ff ff ff |
2^64 | N/A | "i18446744073709551616e" | N/A | N/A | 22 89 00 40 00 00 00 00 00 00 00 00 00 | .H .i 14 "18446744073709551616" | N/A |
3^50 | N/A | "i717897987691852588770249e" | N/A | N/A | 22 8b 00 26 00 55 1f 43 36 2f 68 27 3c 3c 09 | .H .i 18 "717897987691852588770249" | N/A |
3^500 | N/A | "i36360" (229) "10001e" | N/A | N/A | 22 01 a4 00 59 2b (109) 6e 44 01 | .H .U ef "36360" (229) "10001" | N/A |
3^5000 | N/A | "i40389" (2376) "00001e" | N/A | N/A | 22 0f 9f 0e 06 34 (1127) 0d 7a 01 | .H .I 09 52 "40389" (2376) "00001" | N/A |
-1 | N/A | "i-1e" | ff † | 10 (name) ff ff ff ff † | c1 † | .i ff † | 20 † |
-2 | N/A | "i-2e" | fe † | 10 (name) fe ff ff ff † | c3 † | .i fe † | 21 † |
-3 | N/A | "i-3e" | fd † | 10 (name) fd ff ff ff † | c5 † | .i fd † | 22 † |
-16 | N/A | "i-16e" | f0 † | 10 (name) f0 ff ff ff † | df † | .i f0 † | 2f † |
-17 | N/A | "i-17e" | ef † | 10 (name) ef ff ff ff † | 20 a1 † | .i ef † | 30 † |
-24 | N/A | "i-24e" | e8 † | 10 (name) e8 ff ff ff † | 20 af † | .i e8 † | 37 † |
-25 | N/A | "i-25e" | e7 † | 10 (name) e7 ff ff ff † | 20 b1 † | .i e7 † | 38 18 † |
-32 | N/A | "i-32e" | e0 † | 10 (name) e0 ff ff ff † | 20 bf † | .i e0 † | 38 1f † |
-33 | N/A | "i-33e" | d0 df † | 10 (name) df ff ff ff † | 20 01 81 † | .i df † | 38 20 † |
-100 | N/A | "i-100e" | d0 9c † | 10 (name) 9c ff ff ff † | 20 03 87 † | .i 9c † | 38 63 † |
-128 | N/A | "i-128e" | d0 80 † | 10 (name) 80 ff ff ff † | 20 03 bf † | .i 80 † | 38 7f † |
-129 | N/A | "i-129e" | d1 ff 7f † | 10 (name) 7f ff ff ff † | 20 04 81 † | .I ff 7f † | 38 80 † |
-256 | N/A | "i-256e" | d1 ff 00 † | 10 (name) 00 ff ff ff † | 20 07 bf † | .I ff 00 † | 38 ff † |
-257 | N/A | "i-257e" | d1 fe ff † | 10 (name) ff fe ff ff † | 20 08 81 † | .I fe ff † | 39 01 00 † |
-32768 | N/A | "i-32768e" | d1 80 00 † | 10 (name) 00 80 ff ff † | 20 07 7f bf † | .I 80 00 † | 39 7f ff † |
-32769 | N/A | "i-32769e" | d2 ff ff 7f ff † | 10 (name) ff 7f ff ff † | 20 08 00 81 † | .l ff ff 7f ff † | 39 80 00 † |
-65536 | N/A | "i-65536e" | d2 ff ff 00 00 † | 10 (name) 00 00 ff ff † | 20 0f 7f bf † | .l ff ff 00 00 † | 39 ff ff † |
-65537 | N/A | "i-65537e" | d2 ff fe ff ff † | 10 (name) ff ff fe ff † | 20 10 00 81 † | .l ff fe ff ff † | 3a 00 01 00 00 † |
-2^31 | N/A | "i-2147483648e" | d2 80 00 00 00 † | 10 (name) 00 00 00 80 † | 20 1f 7f 7f 7f bf † | .l 80 00 00 00 † | 3a 7f ff ff ff † |
-2^31-1 | N/A | "i-2147483649e" | d3 ff ff ff ff 7f ff ff ff | 12 (name) ff ff ff 7f ff ff ff ff | 21 20 00 00 00 81 † | .L ff ff ff ff 7f ff ff ff † | 3a 80 00 00 00 † |
-2^32 | N/A | "i-4294967296e" | d3 ff ff ff ff 00 00 00 00 | 12 (name) 00 00 00 00 ff ff ff ff | 21 3f 7f 7f 7f bf † | .L ff ff ff ff 00 00 00 00 † | 3a ff ff ff ff † |
-2^32-1 | N/A | "i-4294967297e" | d3 ff ff ff fe ff ff ff ff | 12 (name) ff ff ff ff fe ff ff ff | 21 40 00 00 00 81 † | .L ff ff ff fe ff ff ff ff † | 3b 00 00 00 01 00 00 00 00 |
-2^63 | N/A | "i-9223372036854775808e" | d3 80 00 00 00 00 00 00 00 | 12 (name) 00 00 00 00 00 00 00 80 | 21 03 7f 7f 7f 7f 7f 7f 7f 7f bf † | .L 80 00 00 00 00 00 00 00 † | 3b 7f ff ff ff ff ff ff ff |
-2^63-1 | N/A | "i-9223372036854775809e" | N/A | N/A | 22 89 7f 5f 7f 7f 7f 7f 7f 7f 7f 03 | .H .i 14 "-9223372036854775809" † | 3b 80 00 00 00 00 00 00 00 |
-2^64 | N/A | "i-18446744073709551616e" | N/A | N/A | 22 89 7f 40 00 00 00 00 00 00 00 00 | .H .i 15 "-18446744073709551616" † | 3b ff ff ff ff ff ff ff ff |
-2^64-1 | N/A | "i-18446744073709551617e" | N/A | N/A | 22 89 7f 3f 7f 7f 7f 7f 7f 7f 7f 03 | .H .i 15 "-18446744073709551617" † | N/A |
-3^50 | N/A | "i-717897987691852588770249e" | N/A | N/A | 22 8b 7f 59 7f 2a 60 3c 49 50 17 58 43 43 07 | .H .i 19 "-717897987691852588770249" † | N/A |
-3^500 | N/A | "i-36360" (229) "10001e" | N/A | N/A | 22 01 a4 7f 26 54 (109) 11 3b 03 | .H .U f0 "-36360" (229) "10001" † | N/A |
-3^5000 | N/A | "i-40389" (2376) "00001e" | N/A | N/A | 22 0f 9f 71 79 4b (1127) 72 05 0f | .H .I 09 53 "-40389" (2376) "00001" † | N/A |
0.0 | N/A | N/A | ca 00 00 00 00 † | 01 (name) 00 00 00 00 00 00 00 00 | 28 00 00 00 00 † | .d 00 00 00 00 † | f9 00 00 † |
-0.0 | N/A | N/A | ca 80 00 00 00 † | 01 (name) 00 00 00 00 00 00 00 80 | 28 80 00 00 00 † | .d 80 00 00 00 † | f9 80 00 † |
1.0 | N/A | N/A | ca 3f 80 00 00 † | 01 (name) 00 00 00 00 00 00 f0 3f | 28 3f 80 00 00 † | .d 3f 80 00 00 † | f9 3c 00 † |
1.5 | N/A | N/A | ca 3f c0 00 00 † | 01 (name) 00 00 00 00 00 00 f8 3f | 28 3f c0 00 00 † | .d 3f c0 00 00 † | f9 3e 00 † |
65504.0 | N/A | N/A | ca 47 7f e0 00 † | 01 (name) 00 00 00 00 00 fc ef 40 | 28 47 7f e0 00 † | .d 47 7f e0 00 † | f9 7b ff † |
100000.0 | N/A | N/A | ca 47 c3 50 00 † | 01 (name) 00 00 00 00 00 6a f8 40 | 28 47 c3 50 00 † | .d 47 c3 50 00 † | fa 47 c3 50 00 † |
3.4028235e+38 | N/A | N/A | ca 7f 7f ff ff † | 01 (name) f8 af 4d e5 ff ff ef 47 | 28 7f 7f ff ff † | .d 7f 7f ff ff † | fa 7f 7f ff ff † |
-1.1 (approx.) | N/A | N/A | cb bf f1 99 99 99 99 99 9a † | 01 (name) 9a 99 99 99 99 99 f1 bf | 29 bf f1 99 99 99 99 99 9a † | .D bf f1 99 99 99 99 99 9a † | fb bf f1 99 99 99 99 99 9a † |
1.0e+300 (approx.) | N/A | N/A | cb 7e 37 e4 3c 88 00 75 9c | 01 (name) 9c 75 00 88 3c e4 37 7e | 29 7e 37 e4 3c 88 00 75 9c † | .D 7e 37 e4 3c 88 00 75 9c † | fb 7e 37 e4 3c 88 00 75 9c |
-1.1 (exact) | N/A | N/A | N/A | N/A | 22 81 82 05 01 † [6] | .H .i 04 "-1.1" † | N/A |
1.5^5000 (exact) | N/A | N/A | N/A | N/A | 22 4e 88 26 8a 57 4d 5c (2785) 5d 0e 01 † | .H .I 16 f9 "28595" (5872) "90625" † | N/A |
Infinity | N/A | N/A | ca 7f 80 00 00 † | 01 (name) 00 00 00 00 00 00 f0 7f | 28 7f 80 00 00 † | N/A [8] | f9 7c 00 † |
-Infinity | N/A | N/A | ca ff 80 00 00 † | 01 (name) 00 00 00 00 00 00 f0 ff | 28 ff 80 00 00 † | N/A [8] | f9 fc 00 † |
NaN | N/A | N/A | ca ff c0 00 00 † | 01 (name) 00 00 00 00 00 00 f8 ff | 28 ff c0 00 00 † | N/A [8] | f9 fe 00 † |
Empty bytes [9] | "0:" [10] | "0:" | c4 00 † | 05 (name) 00 00 00 00 ?? [11] | e8 00 | .[ .$ .U .# .i 00 † | 40 |
Bytes 00 | "1:" 00 | "1:" 00 | c4 01 00 † | 05 (name) 01 00 00 00 ?? 00 | e8 01 00 00 [7] | .[ .$ .U .# .i 01 00 † | 41 00 |
Bytes ff | "1:" ff | "1:" ff | c4 01 ff † | 05 (name) 01 00 00 00 ?? ff | e8 01 7f 01 | .[ .$ .U .# .i 01 ff † | 41 ff |
Bytes 01 02 03 | "3:" 01 02 03 | "3:" 01 02 03 | c4 03 01 02 03 † | 05 (name) 03 00 00 00 ?? 01 02 03 | e8 03 00 40 40 03 | .[ .$ .U .# .i 03 01 02 03 † | 43 01 02 03 |
Bytes {55}*23 | "23:" {55}*23 | "23:" {55}*23 | c4 17 {55}*23 † | 05 (name) 17 00 00 00 ?? {55}*23 | e8 17 {2a 55}*13 01 | .[ .$ .U .# .i 17 {55}*23 † | 57 {55}*23 |
Bytes {55}*24 | "24:" {55}*24 | "24:" {55}*24 | c4 18 {55}*24 † | 05 (name) 18 00 00 00 ?? {55}*24 | e8 18 {2a 55}*13 2a 05 | .[ .$ .U .# .i 18 {55}*24 † | 58 18 {55}*24 |
Bytes {55}*63 | "63:" {55}*63 | "63:" {55}*63 | c4 3f {55}*63 † | 05 (name) 3f 00 00 00 ?? {55}*63 | e8 3f {2a 55}*36 | .[ .$ .U .# .i 3f {55}*63 † | 58 3f {55}*63 |
Bytes {55}*64 | "64:" {55}*64 | "64:" {55}*64 | c4 40 {55}*64 † | 05 (name) 40 00 00 00 ?? {55}*64 | e8 01 80 {2a 55}*36 2a 01 | .[ .$ .U .# .i 40 {55}*64 † | 58 40 {55}*64 |
Bytes {55}*127 | "127:" {55}*127 | "127:" {55}*127 | c4 7f {55}*127 † | 05 (name) 7f 00 00 00 ?? {55}*127 | e8 01 bf {2a 55}*72 2a 01 | .[ .$ .U .# .i 7f {55}*127 † | 58 7f {55}*127 |
Bytes {55}*128 | "128:" {55}*128 | "128:" {55}*128 | c4 80 {55}*128 † | 05 (name) 80 00 00 00 ?? {55}*128 | e8 02 80 {2a 55}*72 2a 55 01 | .[ .$ .U .# .U 80 {55}*128 † | 58 80 {55}*128 |
Bytes {55}*255 | "255:" {55}*255 | "255:" {55}*255 | c4 ff {55}*255 † | 05 (name) ff 00 00 00 ?? {55}*255 | e8 03 bf {2a 55}*145 2a 05 | .[ .$ .U .# .U ff {55}*255 † | 58 ff {55}*255 |
Bytes {55}*256 | "256:" {55}*256 | "256:" {55}*256 | c5 01 00 {55}*256 † | 05 (name) 00 01 00 00 ?? {55}*256 | e8 04 80 {2a 55}*145 2a 55 05 | .[ .$ .U .# .I 01 00 {55}*256 † | 59 01 00 {55}*256 |
† Not a unique representation but representative.
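The CBOR integer entries in the table all follow one rule: the top 3 bits of the first byte carry the major type (0 for non-negative, 1 for negative integers, where a negative n is stored as -1-n), and the low 5 bits either hold the value itself (0 to 23) or select a 1-, 2-, 4- or 8-byte big-endian argument (24 to 27). A minimal Python sketch of the shortest encoding follows; the function name `cbor_int` is hypothetical and not taken from any library.

```python
def cbor_int(n: int) -> bytes:
    """Shortest CBOR encoding of an integer in the range -2**64 .. 2**64-1."""
    major = 0 if n >= 0 else 1          # major type 0: unsigned, 1: negative
    arg = n if n >= 0 else -1 - n       # negative n is stored as -1-n
    if arg < 24:
        # Small arguments live directly in the low 5 bits of the first byte.
        return bytes([major << 5 | arg])
    # Additional info 24..27 announces a 1/2/4/8-byte big-endian argument.
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < 1 << (8 * size):
            return bytes([major << 5 | info]) + arg.to_bytes(size, "big")
    raise OverflowError("needs a bignum tag, which this sketch does not cover")


print(cbor_int(24).hex(" "))     # 18 18
print(cbor_int(1000).hex(" "))   # 19 03 e8
print(cbor_int(-25).hex(" "))    # 38 18
print(cbor_int(-257).hex(" "))   # 39 01 00
```

Byte strings use the same head layout with major type 2, which is why a length of 23 fits in the single byte `57` while a length of 24 needs `58 18`.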
Notes:
1. The 4th byte of Smile's self-identification is an OR of feature bits, 1=shared property name optimization in use (a subset of shared string optimization), 2=shared string optimization in use, 4=raw binary in use.
2. `undefined` in BSON is deprecated.
3. "Zigzag" encoding for signed integers: the encoded number 0, 1, 2, 3, 4, 5, ... maps to the actual number 0, -1, 1, -2, 2, -3, ....
4. Variable-length (VInt) encoding for any integers: bit patterns of `(0xxxxxxx)* 10yyyyyy` decode into `(xxxxxxx)* yyyyyy`. The shortest representation is not enforced, so an arbitrary number of zero bytes can be prepended.
5. VInt-encoded length followed by the 8-bit-free encoding of `BigInteger#toByteArray`. [7]
6. VInt-encoded scale (as in `BigDecimal#scale`) followed by the unscaled `BigInteger` value from `BigDecimal#unscaledValue`. The `BigInteger` is encoded as in [5].
7. The 8-bit-free encoding of binary bytes. Basically 7 raw bytes encode into 8 encoded bytes, so all but the last bytes `aaaaaaaa bbbbbbbb cccccccc ...` are encoded as `0aaaaaaa 0abbbbbb 0bbccccc ...`. When the last byte needs to be padded, it is left-padded: `aaaaaaaa bbbbbbbb` encodes into `0aaaaaaa 0abbbbbb 000000bb` (as opposed to `0bb00000`).
8. JSON does not allow Infinity or NaN, and neither does UBJSON.
9. If there is no explicit distinction between binary bytes and Unicode strings, all string-like representations are assumed to be binary bytes.
10. All bytes can be prepended with `.[ (bytes) .]` to give a hint on the type of binary.
11. The subtype byte may be used to determine the type of binary. The default is `00`.
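The three Smile building blocks described in the notes (zigzag, VInt and the 8-bit-free encoding) can be sketched in a few lines of Python. This is an illustration written from the note text, not code from any Smile library: the names `zigzag`, `vint` and `seven_bit` are hypothetical, and only the payload bytes are produced, without the leading type byte shown in the table.

```python
def zigzag(n: int) -> int:
    """Map 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ..."""
    return 2 * n if n >= 0 else -2 * n - 1


def vint(u: int) -> bytes:
    """Shortest VInt: (0xxxxxxx)* 10yyyyyy, most significant bits first."""
    out = [0x80 | (u & 0x3F)]        # the last byte carries the low 6 bits
    u >>= 6
    while u:
        out.append(u & 0x7F)         # earlier bytes carry 7 bits each
        u >>= 7
    return bytes(reversed(out))


def seven_bit(data: bytes) -> bytes:
    """8-bit-free encoding: 7 payload bits per byte, the leftover bits
    right-aligned (left-padded with zeros) in the final byte."""
    nbits = 8 * len(data)
    value = int.from_bytes(data, "big")
    out = []
    rem = nbits % 7
    if rem:
        out.append(value & ((1 << rem) - 1))   # trailing partial group
        value >>= rem
    for _ in range(nbits // 7):
        out.append(value & 0x7F)
        value >>= 7
    return bytes(reversed(out))


print(vint(zigzag(32)).hex(" "))             # 01 80
print(vint(zigzag(-33)).hex(" "))            # 01 81
print(seven_bit(b"\x01\x02\x03").hex(" "))   # 00 40 40 03
```

These outputs match the bytes that follow the type byte in the Smile column for 32, -33 and `Bytes 01 02 03`. For binary data the length prefix is a plain VInt without zigzag, e.g. `vint(64)` gives `01 80` as in the `Bytes {55}*64` row.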