
@ugorji
Last active December 14, 2015 11:09
msgpack-new-spec-ideas

My thinking is summarized as:

There is old data already stored with the Raw ambiguity. Rather than requiring all old data to be rewritten, the solution is to keep that ambiguity and add new explicit types.

Any old code that comes into contact with newly serialized data will break, and its maintainers will have to update their libraries.

However, the old serialized data will not need to change or have its interpretation change, regardless of what assumption was made previously about how to handle Raw.

Beyond that, we add new native types for Timestamp and PrivateExtension.

My suggestion is loosely defined as:

Raw: RawRepresentation (including new Raw8 Type)

String: StringType RawRepresentation

Binary: BinaryType RawRepresentation

Timestamp: TimestampType seconds_since_epoch_as_int_or_float

Timestamp: TimestampType [ seconds_since_epoch_as_int, nanoseconds_as_int ]

Timestamp: TimestampType [ seconds_since_epoch_as_int, nanoseconds_as_int, timezone ]

PrivateExtension: PrivateExtensionType Tag(Byte) ValueAsRegularMsgpackEncodedValue

([ ... ] means array)

We know how arrays, ints, floats, and Raw are currently represented in msgpack. All the new "types" just piggy-back on those, i.e.

ExplicitString is one byte (e.g. 0xd4) + representation of Raw

ExplicitBinary is one byte (e.g. 0xd5) + Representation of Raw

Timestamp is one byte (e.g. 0xd6) + representation of an integer, float or array containing 2 or 3 elements

PrivateExtension is one byte (e.g. 0xd7) + one tag byte (e.g. 0x01 representing Point) + representation of the value (e.g. a Point represented as an array of 2 integers in regular msgpack encoding)
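To make the piggy-backing concrete, here is a minimal sketch in Python. It hand-rolls a small subset of the existing msgpack encoding (ints, Raw, arrays) and then builds the four new types by prefixing one byte. The type bytes 0xd4 to 0xd7 are only the example values used above, not final assignments. The function names, the use of UTF-8 for strings, and passing the timezone as a plain integer are all my assumptions for illustration; the new Raw8 form is not covered here.

```python
import struct

# Example type bytes from the text above ("e.g. 0xd4"); not final assignments.
STRING_TYPE, BINARY_TYPE, TIMESTAMP_TYPE, PRIVATE_EXT_TYPE = 0xD4, 0xD5, 0xD6, 0xD7


def pack_int(n):
    """Existing msgpack integer encoding (fixints, uint8..64, int8..64)."""
    if 0 <= n <= 0x7F:
        return bytes([n])                      # positive fixint
    if -32 <= n < 0:
        return struct.pack("b", n)             # negative fixint (0xe0..0xff)
    if n > 0:
        table = ((0xCC, ">B", 0xFF), (0xCD, ">H", 0xFFFF),
                 (0xCE, ">I", 0xFFFFFFFF), (0xCF, ">Q", 0xFFFFFFFFFFFFFFFF))
        for code, fmt, limit in table:
            if n <= limit:
                return bytes([code]) + struct.pack(fmt, n)
    else:
        table = ((0xD0, ">b", -0x80), (0xD1, ">h", -0x8000),
                 (0xD2, ">i", -0x80000000), (0xD3, ">q", -0x8000000000000000))
        for code, fmt, limit in table:
            if n >= limit:
                return bytes([code]) + struct.pack(fmt, n)
    raise OverflowError("integer out of msgpack range")


def pack_float(x):
    """Existing msgpack double encoding."""
    return b"\xcb" + struct.pack(">d", x)


def pack_raw(data):
    """Existing msgpack Raw encoding (fixraw, raw16, raw32)."""
    n = len(data)
    if n <= 31:
        head = bytes([0xA0 | n])
    elif n <= 0xFFFF:
        head = b"\xda" + struct.pack(">H", n)
    else:
        head = b"\xdb" + struct.pack(">I", n)
    return head + data


def pack_array(encoded_items):
    """Existing msgpack array encoding; items must already be encoded."""
    n = len(encoded_items)
    if n <= 15:
        head = bytes([0x90 | n])
    elif n <= 0xFFFF:
        head = b"\xdc" + struct.pack(">H", n)
    else:
        head = b"\xdd" + struct.pack(">I", n)
    return head + b"".join(encoded_items)


def pack_string(text):
    # ExplicitString: one type byte + Raw (UTF-8 is an assumption).
    return bytes([STRING_TYPE]) + pack_raw(text.encode("utf-8"))


def pack_binary(data):
    # ExplicitBinary: one type byte + Raw.
    return bytes([BINARY_TYPE]) + pack_raw(data)


def pack_timestamp(seconds, nanos=None, timezone=None):
    # Timestamp: one type byte + int/float, or an array of 2 or 3 ints.
    # How timezone is represented is not defined above; an int is assumed.
    if nanos is None:
        body = pack_float(seconds) if isinstance(seconds, float) else pack_int(seconds)
    else:
        items = [pack_int(seconds), pack_int(nanos)]
        if timezone is not None:
            items.append(pack_int(timezone))
        body = pack_array(items)
    return bytes([TIMESTAMP_TYPE]) + body


def pack_private_extension(tag, encoded_value):
    # PrivateExtension: one type byte + one tag byte + regular msgpack value.
    return bytes([PRIVATE_EXT_TYPE, tag]) + encoded_value
```

For example, a Point(3, 4) with tag 0x01 would be `pack_private_extension(0x01, pack_array([pack_int(3), pack_int(4)]))`, which is five bytes: the type byte, the tag byte, and a 2-element array of small ints.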

Updated serializers use boolean options that configure how they operate. The default can be legacy mode; this might be decided based on how long a library has been in use. If all of these options are false, the serializer is in legacy mode, and serializers (even updated ones) keep encoding msgpack as before.

UseStringType

UseBinaryType

UseTimestampType

UsePrivateExtensionType
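A rough sketch of how these options might look in a library, in Python. The class and function names are hypothetical, the 0xd4 byte is the example value from above, and only short (fixraw) payloads are handled to keep it small.

```python
from dataclasses import dataclass


@dataclass
class EncodeOptions:
    # All options default to False, which means legacy mode.
    use_string_type: bool = False
    use_binary_type: bool = False
    use_timestamp_type: bool = False
    use_private_extension_type: bool = False

    def is_legacy(self):
        return not (self.use_string_type or self.use_binary_type
                    or self.use_timestamp_type
                    or self.use_private_extension_type)


def fixraw(data):
    """Existing msgpack fixraw encoding (payloads up to 31 bytes only)."""
    if len(data) > 31:
        raise ValueError("sketch only handles payloads up to 31 bytes")
    return bytes([0xA0 | len(data)]) + data


def encode_text(text, opts):
    """Encode text as plain Raw, or as ExplicitString when opted in."""
    raw = fixraw(text.encode("utf-8"))  # UTF-8 is an assumption
    if opts.use_string_type:
        return b"\xd4" + raw            # example type byte, not final
    return raw
```

With `EncodeOptions()` the output of `encode_text` is byte-for-byte what an old serializer produces; only a sender that sets `use_string_type=True` emits the new prefix.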

It is summarized in sentences as:

In msgpack, the Raw type is currently ambiguous and used to represent both strings and binary. The new spec introduces support for explicit string and binary types to resolve that ambiguity. In addition, the new spec introduces a timestamp type to allow transmission of timestamps in an interoperable manner. To accommodate private extensions, an extension type is also introduced.

With these changes, backward compatibility is preserved, with the caveat that deserializers/receivers must update their libraries to decode the new types. Serializers/senders do not have to update their libraries, since the legacy msgpack format is still valid.
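On the receiving side, the idea might be sketched like this in Python: an updated decoder recognizes the new explicit types, while a bare Raw is still handled according to whatever assumption the application made before. The function names and the `raw_as_text` flag are hypothetical, the type bytes are the example values from above, and only fixraw payloads are handled.

```python
def decode_fixraw(buf, pos):
    """Read an existing-format fixraw at buf[pos]; return (payload, next_pos)."""
    head = buf[pos]
    if head & 0xE0 != 0xA0:
        raise ValueError("sketch only handles fixraw")
    end = pos + 1 + (head & 0x1F)
    if end > len(buf):
        raise ValueError("truncated input")
    return buf[pos + 1:end], end


def decode(buf, raw_as_text=False):
    """Decode one value: ExplicitString, ExplicitBinary, or legacy Raw."""
    head = buf[0]
    if head == 0xD4:                       # example ExplicitString byte
        payload, _ = decode_fixraw(buf, 1)
        return payload.decode("utf-8")     # UTF-8 is an assumption
    if head == 0xD5:                       # example ExplicitBinary byte
        payload, _ = decode_fixraw(buf, 1)
        return payload
    # Legacy Raw: interpretation stays whatever the caller assumed before.
    payload, _ = decode_fixraw(buf, 0)
    return payload.decode("utf-8") if raw_as_text else payload
```

Old data such as `b"\xa2hi"` decodes exactly as it did before, as bytes or as text depending on `raw_as_text`, while `b"\xd4\xa2hi"` is always text and `b"\xd5\xa2hi"` is always bytes.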

Updated libraries are listed below: .....
