@quant61
Last active February 3, 2018 16:01
msgpack extensions

some proposals of msgpack extensions

some general notes

a msgpack ext value has the following structure:

+--------+--------+--------+========+
|  extX  |  SIZE  |  type  |  data  |
+--------+--------+--------+========+

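where:
* extX is the format byte (0xc7 + log2(size_bytes)): 0xc7 for ext 8, 0xc8 for ext 16, 0xc9 for ext 32
* SIZE is an unsigned integer of 8 * 2^sizingType bits holding the size of data in bytes
* type is the typeId
  • the fixext formats (0xd4-0xd8, for 1/2/4/8/16 bytes of data) have no SIZE field
  • in the proposals listed below the msgpack ext header is omitted (it takes 3/4/6 bytes depending on size)
  • most of the proposals have their own additional headers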
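
The header layout above can be sketched in Python. This is only an illustration: `ext_header` is a hypothetical helper, not part of any msgpack library, and it also uses the 2-byte fixext forms (0xd4-0xd8) when the data is exactly 1/2/4/8/16 bytes long.

```python
import struct

# fixext formats: data length -> format byte (these carry no SIZE field)
FIXEXT = {1: 0xd4, 2: 0xd5, 4: 0xd6, 8: 0xd7, 16: 0xd8}

def ext_header(type_id: int, size: int) -> bytes:
    """Return the msgpack ext header for `size` bytes of data (hypothetical helper)."""
    # the type id is a signed byte; struct raises if it is out of range
    type_byte = struct.pack("b", type_id)
    if size in FIXEXT:
        return bytes([FIXEXT[size]]) + type_byte
    if size <= 0xff:          # ext 8: 1-byte SIZE, 3-byte header
        return b"\xc7" + struct.pack(">B", size) + type_byte
    if size <= 0xffff:        # ext 16: 2-byte SIZE, 4-byte header
        return b"\xc8" + struct.pack(">H", size) + type_byte
    if size <= 0xffffffff:    # ext 32: 4-byte SIZE, 6-byte header
        return b"\xc9" + struct.pack(">I", size) + type_byte
    raise ValueError("msgpack ext cannot hold more than 2^32-1 bytes")
```

For example, `ext_header(5, 300)` gives `c8 01 2c 05`, where the type id 5 is just a placeholder.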

typed array extension:

+--------+-------+-------+
|header  | size  | data  |
+--------+-------+-------+
  • header takes 1 byte
  • size takes 2^sizingType bytes and represents N
  • data takes N * elementSize bytes

header structure:

bit   7  - reserved
bit   6  - endian
bits 2-5 - type
bits 0-1 - sizing type

sizing types:

00 - small , size takes 1 byte , up to 255 elements
01 - medium, size takes 2 bytes, up to 65535 elements
10 - big   , size takes 4 bytes, up to 2^32-1 elements
11 - huge  , size takes 8 bytes, up to 2^64-1 elements
the huge sizing type only makes sense for bit arrays, because msgpack ext doesn't support values longer than 2^32-1 bytes

value types:

integers:
(u)int8  - 000Z
(u)int16 - 001Z
(u)int32 - 010Z
(u)int64 - 011Z
if Z is 1 the values are signed, otherwise unsigned

float32  - 1000
float64  - 1001
bits     - 1010
...5 possible types remaining
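
A minimal sketch of the typed array payload (header byte, size, data), without the surrounding msgpack ext header. Several things here are assumptions the proposal does not fix: the endian bit is taken to mean little-endian when set, the size field is written big-endian like other msgpack integers, and the function names are made up for the example. The bits type is left out.

```python
import struct

# 4-bit type codes from the list above -> struct format character
TYPE_CODES = {
    0b0000: "B", 0b0001: "b",   # uint8 / int8
    0b0010: "H", 0b0011: "h",   # uint16 / int16
    0b0100: "I", 0b0101: "i",   # uint32 / int32
    0b0110: "Q", 0b0111: "q",   # uint64 / int64
    0b1000: "f", 0b1001: "d",   # float32 / float64
}
SIZE_FORMATS = [">B", ">H", ">I", ">Q"]  # sizing types 00, 01, 10, 11

def encode_typed_array(type_code, values, little_endian=False):
    count = len(values)
    # pick the smallest sizing type that can hold the element count
    sizing = 0 if count <= 0xff else 1 if count <= 0xffff else 2 if count <= 0xffffffff else 3
    # bit 6 = endian (assumed: 1 means little-endian), bits 2-5 = type, bits 0-1 = sizing
    header = (int(little_endian) << 6) | (type_code << 2) | sizing
    order = "<" if little_endian else ">"
    data = struct.pack(f"{order}{count}{TYPE_CODES[type_code]}", *values)
    return bytes([header]) + struct.pack(SIZE_FORMATS[sizing], count) + data

def decode_typed_array(payload):
    header = payload[0]
    order = "<" if header & 0x40 else ">"
    type_code = (header >> 2) & 0x0f
    size_len = 1 << (header & 0x03)          # 1, 2, 4 or 8 bytes
    count = int.from_bytes(payload[1:1 + size_len], "big")
    values = struct.unpack_from(f"{order}{count}{TYPE_CODES[type_code]}", payload, 1 + size_len)
    return type_code, list(values)
```

Three int16 values `[1, -2, 300]` then take 8 bytes: `0c 03 00 01 ff fe 01 2c`.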

table extension

if msgpack is like json (actually, it's closer to PHP serialize), then a msgpack table is like csv

structure

+--------+-------+-------+-------+
|header  | rows  | cols  | cells |
+--------+-------+-------+-------+

header

bits 7-4: version (0000) - reserved for future extensions
bit 3: has header - if set, the table has an additional row holding the column names
bit 2: has primary key - if set, the first column is treated as an identifier
bits 0-1: sizing type
  • in future versions the structure can differ

rows and cols

tiny/small table (00) - rows and cols share one byte:
  the 4 high bits hold the number of rows, the 4 low bits hold the number of cols (up to 15 rows and cols)
small/standard table (01) - rows and cols take 1 byte each (up to 255 rows and cols)
standard/big table (10) - rows and cols take 2 bytes each (up to 65535 rows and cols)
big/huge table (11) - rows and cols take 4 bytes each (up to 2^32-1 rows and cols)

then follow (ROWS + HAS_HEADER) * COLS cells

every cell is a msgpack value, as in msgpack arrays and maps

if the has_header bit is set, an additional row with the column names is inserted
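
The table prefix (header byte plus rows and cols) could be built as in the sketch below. It assumes big-endian rows/cols fields, which the proposal does not state, and the function names are hypothetical. The cells themselves would be ordinary msgpack values appended after this prefix.

```python
import struct

def encode_table_prefix(rows, cols, has_header=False, has_primary_key=False):
    """Return the table header byte followed by the rows/cols fields."""
    largest = max(rows, cols)
    if largest <= 0x0f:            # 00: rows and cols share one byte
        sizing, dims = 0, bytes([(rows << 4) | cols])
    elif largest <= 0xff:          # 01: 1 byte each
        sizing, dims = 1, struct.pack(">BB", rows, cols)
    elif largest <= 0xffff:        # 10: 2 bytes each
        sizing, dims = 2, struct.pack(">HH", rows, cols)
    elif largest <= 0xffffffff:    # 11: 4 bytes each
        sizing, dims = 3, struct.pack(">II", rows, cols)
    else:
        raise ValueError("too many rows or cols")
    # bits 7-4 stay 0000 (version), bit 3 = has header, bit 2 = has primary key
    header = (int(has_header) << 3) | (int(has_primary_key) << 2) | sizing
    return bytes([header]) + dims

def cell_count(rows, cols, has_header=False):
    """Number of msgpack values that follow the prefix."""
    return (rows + int(has_header)) * cols
```

A 3x2 table with a header row gets the prefix `08 32` and is followed by 8 cells.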

comparison

  • a msgpack table is a good fit for an array of objects that all have the same keys

overhead:

array of objects:

  • header: 1 byte (up to 15 objects), 3 bytes (up to 65535), 5 bytes (up to 2^32-1); msgpack has no 2-byte array header
  • on every object: +1 byte for up to 15 fields, +3 bytes for up to 65535 fields, +5 bytes beyond that

table:

  • +(3/4/6) bytes of msgpack ext header, depending on data size
  • +1 byte table header
  • +(1/2/4/8) bytes for rows and cols, depending on max(rows, cols) - in practice the number of rows decides

so the table header is bigger: +4/5 bytes for a small table, +10 bytes for a big table

but you don't need to repeat the property names in every object
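
To make the comparison concrete, here is a rough overhead calculator. It is only a sketch: it counts structural bytes and key names, not the cell values (which cost the same in both layouts), it uses the container header sizes from the msgpack spec (1, 3 or 5 bytes), it ignores fixext, and `key_bytes` (the encoded size of one object's key names) is something the caller has to supply.

```python
def container_header_size(n):
    """Size of a msgpack array/map header holding n entries."""
    return 1 if n <= 15 else 3 if n <= 0xffff else 5

def array_of_maps_overhead(rows, cols, key_bytes):
    # one array header, then per object: a map header plus all key names again
    return container_header_size(rows) + rows * (container_header_size(cols) + key_bytes)

def table_overhead(rows, cols, key_bytes, data_size):
    # msgpack ext header: ext 8 / ext 16 / ext 32
    ext = 3 if data_size <= 0xff else 4 if data_size <= 0xffff else 6
    largest = max(rows, cols)
    dims = 1 if largest <= 15 else 2 if largest <= 0xff else 4 if largest <= 0xffff else 8
    # 1 byte table header, rows/cols, and the key names once in the header row
    return ext + 1 + dims + key_bytes
```

With 100 objects that each have the keys `id`, `name` and `age` (10 bytes as fixstr values), the array of maps spends 1103 bytes on structure and keys, while a table with roughly 1000 bytes of data spends 17.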

numeric extensions:

Big integer

+-------------+-------+-------+
| ext header  | size  | data  |
+-------------+-------+-------+

data is decoded as an n-bit integer

Complex numbers

some languages like Python or Go support complex numbers out of the box. Why not do the same in msgpack?

complex numbers are split into 2 types with different type ids:

static complex

the real and imaginary parts each take half of the data and are decoded in the same way:

  • 1 byte (tiny) - the high 4 bits hold the real part, the low 4 bits hold the imaginary part
  • 2 bytes - as 2 int8 values
  • 4 bytes - as 2 int16
  • 8 bytes - as 2 float32
  • 16 bytes - as 2 float64

example: d5 XX 2a 0d means 42+13i
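
A decoding sketch for the static complex payload (the bytes after the ext header). The byte order is assumed to be big-endian, and the tiny form is read as two unsigned 4-bit values; the proposal specifies neither, and `decode_static_complex` is a hypothetical name.

```python
import struct

# payload length -> struct format for the (real, imaginary) pair
COMPLEX_FORMATS = {2: ">bb", 4: ">hh", 8: ">ff", 16: ">dd"}

def decode_static_complex(data: bytes) -> complex:
    if len(data) == 1:
        # tiny: high nibble is the real part, low nibble the imaginary part
        # (treated as unsigned here, which is an assumption)
        return complex(data[0] >> 4, data[0] & 0x0f)
    fmt = COMPLEX_FORMATS.get(len(data))
    if fmt is None:
        raise ValueError("static complex data must be 1, 2, 4, 8 or 16 bytes")
    real, imag = struct.unpack(fmt, data)
    return complex(real, imag)
```

Applied to the example above, `decode_static_complex(bytes([0x2a, 0x0d]))` gives `42+13j`.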

dynamic complex

data is interpreted as msgpack values

note: the msgpack values could themselves be extensions (this adds support for bigints, for example)

Fractions:

there are 3 types with different type ids:

  • positive static fraction - first half of data is numerator, second half is denominator

  • negative static fraction - the same as the previous one, but the sign is negative

  • dynamic fraction - 2 msgpack values - first one is numerator, second is denominator

  • note: some fractions take only 3 bytes:

3/2 could be written as

+--------+--------+--------+
|  0xd4  |  type  |  0x32  |
+--------+--------+--------+
  • note2: in a dynamic fraction one or both values could be msgpack extensions
  • note3: fractions can also represent some special values (3 bytes are enough):
  • ∞ Infinity: the denominator is zero; positive and negative fractions represent positive and negative Infinity
+--------+--------+--------+
|  0xd4  |  type  |  0xA0  | - A could be replaced with anything but zero
+--------+--------+--------+
  • NaN: both numerator and denominator are zero:
+--------+--------+--------+
|  0xd4  |  type  |  0x00  |
+--------+--------+--------+
  • -0: negative static fraction where the numerator is zero but the denominator isn't
+--------+--------+--------+
|  0xd4  |  type  |  0x0A  |
+--------+--------+--------+
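
The static fraction rules, including the special values, can be sketched as follows. The two halves are read as unsigned big-endian integers, which is an assumption, and the `negative` flag stands in for the choice between the positive and negative type ids. `decode_static_fraction` is a hypothetical name.

```python
import math
from fractions import Fraction

def decode_static_fraction(data: bytes, negative: bool = False):
    if len(data) == 1:
        # 1 byte: high nibble is the numerator, low nibble the denominator
        num, den = data[0] >> 4, data[0] & 0x0f
    else:
        half = len(data) // 2
        num = int.from_bytes(data[:half], "big")
        den = int.from_bytes(data[half:], "big")
    sign = -1 if negative else 1
    if den == 0:
        # 0/0 is NaN, anything else over zero is a signed Infinity
        return math.nan if num == 0 else sign * math.inf
    if num == 0 and negative:
        return -0.0  # negative zero
    return Fraction(sign * num, den)
```

So the payload `0x32` decodes to `Fraction(3, 2)`, `0xA0` to Infinity, `0x00` to NaN, and `0x0A` with the negative type id to -0.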