For reference, I gave a talk about this: http://www.slideshare.net/jwinandy/data-encoding-and-metadata-for-streams/17
A few points:
- A reference to a schema is 64 bits (with hashing), or 32 bits if you use a coordination store (as Kafka + Camus does).
- It's not really a waste of space, because the same reference can be reused across many payloads.
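The reference-plus-registry idea can be sketched in a few lines. This is a toy stand-in, not Avro's actual mechanism: Avro defines a CRC-64-AVRO fingerprint for schemas, whereas here I just take 8 bytes of SHA-256, and the `registry` dict stands in for a real coordination store.

```python
import hashlib
import json

def fingerprint(schema_json: str) -> bytes:
    """64-bit schema reference: first 8 bytes of SHA-256 of the schema text.
    (Avro itself defines a CRC-64-AVRO fingerprint; this is only a stand-in.)"""
    return hashlib.sha256(schema_json.encode("utf-8")).digest()[:8]

SCHEMA = json.dumps({"type": "record", "name": "Click",
                     "fields": [{"name": "url", "type": "string"}]})

# The registry maps the 8-byte reference back to the full schema text,
# so each message only needs to carry the reference.
registry = {fingerprint(SCHEMA): SCHEMA}

def encode(payload: bytes, schema_json: str) -> bytes:
    # Prefix the encoded payload with its 8-byte schema reference.
    return fingerprint(schema_json) + payload

def decode(message: bytes):
    # Split off the reference, look the schema up, return both.
    ref, payload = message[:8], message[8:]
    return registry[ref], payload
```

The 8-byte prefix is amortized over every message that uses the same schema, which is why it is not a real overhead in practice.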
- Field renaming is well supported. In Avro you read your data with not one but two schemas:
- the one the data was encoded with (easy, it travels alongside the data as metadata),
- and the one you want to use to read your data.
So you can have a single common read schema (thanks to unions and renaming) for several write schemas.
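The two-schema resolution rule can be sketched like this. It is a toy illustration of the matching logic (name-or-alias matching for renames, defaults for reader-only fields), not Avro's actual implementation; the field dicts just mimic Avro schema JSON.

```python
def resolve(record: dict, writer_fields: list, reader_fields: list) -> dict:
    """Toy sketch of Avro schema resolution: match each reader field to a
    writer field by name or alias, drop writer-only fields, and fill
    reader-only fields from their defaults."""
    out = {}
    writer_names = {f["name"] for f in writer_fields}
    for f in reader_fields:
        # A reader field matches the writer field of the same name,
        # or any of its declared aliases (this is how renaming works).
        candidates = [f["name"]] + f.get("aliases", [])
        match = next((n for n in candidates if n in writer_names), None)
        if match is not None:
            out[f["name"]] = record[match]
        else:
            out[f["name"]] = f["default"]  # reader-only field: use its default
    return out

# The write schema used "fullname"; the read schema renamed it to "name"
# via an alias, and added a defaulted "country" field.
writer_fields = [{"name": "fullname", "type": "string"}]
reader_fields = [
    {"name": "name", "type": "string", "aliases": ["fullname"]},
    {"name": "country", "type": "string", "default": "FR"},
]
```

The same read schema can resolve against several different write schemas, which is exactly what makes a common read schema possible.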
- One of the great features of Avro is its genericity. You don't have to generate code to parse a message, so you can build smart intermediaries, like generic Hadoop jobs: https://github.com/viadeo/viadeo-avro-utils
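To illustrate the genericity point, here is a hypothetical handler (not from the linked project) that is driven purely by a schema shipped as data, so the same code can process records of any type without any generated classes:

```python
import json

def describe(schema_json: str, record: dict) -> str:
    """Generic record handler: walks the fields listed in the schema that
    accompanies the data, so one function works for every record type."""
    schema = json.loads(schema_json)
    parts = [f"{f['name']}={record[f['name']]!r}" for f in schema["fields"]]
    return schema["name"] + "(" + ", ".join(parts) + ")"

user_schema = json.dumps({"type": "record", "name": "User",
                          "fields": [{"name": "id", "type": "long"}]})
```

This is the property that makes schema-agnostic intermediaries (like generic Hadoop jobs) possible.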