Since we moved today’s KS - low attendance, plus the fact that it will take 2-3 separate KS meetings to fully cover the topic and a long weekend between them would suck (there’s some context to it) - I’ve compiled my slides and other notes to give you something to check out if you’re extra bored/interested in the subject.
These are just bullet points, all of them (and more) will be expanded during the actual KS sessions after the weekend.
- the notion of performance-critical code should come up as early as the first code review, but let’s not optimize prematurely
- I spent a lot of time rewriting parts of Rye, first in `C++`, then dropped the experiment for a few weeks because of TrustedMail work - I did a `golang` implementation as well (actually, I initially did it out of boredom on a weekend and wasn’t even expecting to finish it - but it turned out so pleasant that I did). `go` has nice built-in benchmarking support and allows you to turn a regular test into a benchmark, which is NICE.
- I won’t go into discussing languages here and now… that’s going to be a long discussion.
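As a taste of that benchmarking support: the sketch below drives a toy function (the function and its name are invented for illustration) through `testing.Benchmark`, the programmatic twin of a `func BenchmarkXxx(b *testing.B)` living in a `_test.go` file and run via `go test -bench=.`.

```go
package main

import (
	"fmt"
	"testing"
)

// sumSquares is a stand-in for any CPU-bound routine we want to measure.
func sumSquares(n int) int {
	total := 0
	for i := 0; i < n; i++ {
		total += i * i
	}
	return total
}

func main() {
	// testing.Benchmark runs the closure with increasing b.N until the
	// timing is statistically stable - the same loop shape you'd write
	// in a regular benchmark function.
	result := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sumSquares(1000)
		}
	})
	fmt.Println(result) // prints iterations and ns/op
}
```

The nice part is that the loop body is exactly the code a unit test would exercise, so promoting a test to a benchmark is mostly a rename.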
- This week and the next I’m working on a final benchmark of Python vs Cython vs Rye++ vs Ryego.
- Whenever we cross the language boundary, we need to create an interface - and since we have Python calling C++/Go, the interface is either using `ctypes` directly or wrapping the C++/Go in a custom Python module. Both have their pros and cons.
- Some of our data structures are tricky to pass to a different language!
- One example is a `correspondences` dictionary in Rye, where the keys are 2-element arrays and the values are 3-element arrays. This is of course achievable in other languages, but it’s something that feels REALLY, REALLY wrong.
- So looking at other languages could actually help us clear up our way of thinking about existing code.
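For what it’s worth, the 2-element-key / 3-element-value shape maps to Go more directly than you might expect: fixed-size arrays (unlike slices) are comparable, so they can be map keys. The element types below are assumptions, not Rye’s actual types.

```go
package main

import "fmt"

func main() {
	// Keys are 2-element int arrays, values are 3-element float64 arrays.
	// In Python the keys would have to be tuples (lists aren't hashable);
	// in Go, [2]int is comparable and works as a key out of the box.
	correspondences := map[[2]int][3]float64{
		{0, 1}: {0.1, 0.2, 0.3},
		{2, 5}: {1.0, 2.0, 3.0},
	}

	v := correspondences[[2]int{0, 1}]
	fmt.Println(v[2]) // third element of the value for key (0, 1)
}
```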
- Another example is from `wheat`, where the template descriptions are complex nested dicts. That’s really unwieldy inside a statically typed language. And here’s a fun fact if you thought about using a JSON library to serialize/deserialize and access those: JSON parsing is pretty fast relative to anything else we do in Python, but in a piece of `go` code, parsing a moderately complex JSON structure can be a bottleneck.
- The moral is: once we go to a lower level, we’ll discover stuff that we never thought was slow in the first place.
- Since there’s a lot of work in implementing the cross-language interface/boundary, I’ve decided to look into a few RPC frameworks.
- RPC frameworks give us a standardized protocol for serialization when sending data between services (`protobuf` for `grpc`, something else for `Thrift`).
- I’ve worked with Apache Thrift (originally created at Facebook) before, but Python 3 support still isn’t official. I decided to jump into the most marketed alternative, Google’s gRPC (grpc.io).
- There’s also Cap’n Proto, which is worth checking out - especially since they promise good serialization performance.
- The serialization/deserialization process is the biggest performance point, barring the network, which I’m ignoring right now since I’m assuming all RPC “resource groups” will run on one machine.
- Example experiment workflow for Rye:
- Take a CPU-intensive part out of Rye
- Re-write it in a different language
- Wrap the rewrite in a thin RPC layer, turning it into a service
- Replace the original calls with RPC calls
- This gives us TWO services: one is the original Python service, the other is “internal” and at first would run alongside the original service, for minimal network latency between them.
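A hedged sketch of the “wrap the rewrite in a thin RPC layer” step, using stdlib `net/rpc` as a stand-in for gRPC/Thrift (service and method names are invented; the real client would be the original Python service talking over localhost):

```go
package main

import (
	"fmt"
	"net"
	"net/rpc"
)

type Args struct{ N int }

// Rye is a placeholder service wrapping a routine extracted from Rye.
type Rye struct{}

// SumSquares stands in for the rewritten CPU-intensive part; net/rpc
// requires the (args, *reply) error signature on exported methods.
func (Rye) SumSquares(args Args, reply *int) error {
	total := 0
	for i := 0; i < args.N; i++ {
		total += i * i
	}
	*reply = total
	return nil
}

func main() {
	srv := rpc.NewServer()
	srv.Register(Rye{})

	// In-process pipe instead of a real socket, so the sketch runs
	// self-contained; the wiring is identical over TCP.
	cliConn, srvConn := net.Pipe()
	go srv.ServeConn(srvConn)

	client := rpc.NewClient(cliConn)
	defer client.Close()

	var result int
	if err := client.Call("Rye.SumSquares", Args{N: 4}, &result); err != nil {
		panic(err)
	}
	fmt.Println(result) // 0 + 1 + 4 + 9 = 14
}
```

The point of the thin layer is that the original Python call site only changes from a function call to an RPC call - the rewritten code itself stays oblivious to how it’s hosted.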
- Fun links
- These notes are a raging mess!