Observations and thoughts about some talks I saw at Strange Loop 2019

Strange Loop

Held in St. Louis, Strange Loop is a conference with an eclectic lineup of speakers covering various subjects related to software development. I felt like I learned a lot, and it was a fun trip.

Teaching Programming

The opening keynote was about teaching programming. It was pointed out that programming lacks pedagogical diversity. Language instruction has various schools of thought around teaching kids how to read and write, and many studies have been done on the subject. The same goes for math. Teaching programming, however, lacks research, discussion and debate.

The speaker presented Stack Overflow survey data about how people learned to program. The vast majority were self taught from a young age. As such, many people lack insight and experience into effective methods for turning other people into effective programmers. The prevailing point of view is to "figure it out" with little guidance.

The speaker presented some ideas, backed by studies, about how we could do this better. Some of them were surprising to me.

When teaching kids to read, a popular school of thought is phonics, which emphasizes the phonetic sounds of words to help kids learn to read. The speaker did an experiment where she had kids read code aloud and found that it improved their ability to recognize syntax.

While many coders might like to think that programming is more than syntax, syntax is a crucial part. Our symbolic representation of ideas plays a huge role in our ability to reason about the concepts they represent.

As a demonstration she bet $1,000 that no one in the audience would be able to reproduce some words she would show for a second. It got my attention, and I wondered how she could be so confident making that bet with over a thousand software developers watching.

She then flashed a string of Greek-looking symbols very briefly. No one took her up on the challenge.

She then made another bet, this time for $100, if someone could reproduce what was shown on the screen. This time it was random letters. The symbols were comprehensible, but they didn't form any meaningful words.

Finally she repeated the exercise, but bet no money since everyone was sure to get it. She showed "Cat loves cake". It was easy to remember despite seeing it only for a brief moment.

The idea behind this exercise was to show the limitations of human short-term memory. It's only when we can reduce complicated information down to simple concepts that we can retain it and have spare intellectual capacity to think about other things.

She then discussed studies that apply this idea in a pedagogical context. The conventional wisdom for getting better at software is to write a lot of software and to read a lot of software.

One experiment had three groups. In the first, kids would write a solution to a problem and then be presented with an expert solution. In the second, kids would write their solution and then be presented with an expert solution, along with an explanation of why the expert did what they did. The third group was only presented with an expert solution and an explanation; they didn't actually write code themselves.

When the groups were tested, the second and third groups outperformed the first, suggesting that understanding a good solution to the problem was the key to performance, not the actual act of writing code.

I found this result quite surprising, but the speaker explained further. It is our ability to chunk information that frees up our mental capacity to reason about our code as we write it. The explanation provided new ways to think about code, and the act of practice is relatively meaningless if it doesn't lead to better thought patterns.

I thought this was very insightful, and it confirms some intuitions I had about what makes good code. A talk I went to later, on the second day, further emphasized this theme.

As writers we must know our audience, and it's not machines. It's people with an extremely limited working memory; at best we can keep track of 4-8 pieces of information, depending on which source you reference. If our code asks its reader to keep track of more things than that, they will need much longer to reason about it. They will have to spend a long time digesting it, and trying to chunk it themselves, before they are able to modify it.

A good story is as much about the information it leaves out as the information it includes. The speaker of "Stories we tell with code" demonstrated this concept by answering the question "how did you get into tech?". In interviews she would give a stripped-down answer to present a professional image. While nothing in that answer was untrue, it didn't paint the full picture. The speaker gave a couple of versions of the same answer, depending on who she would be talking to.

When writing code we want to shield our reader from all the nitty-gritty details that actually have to be handled to accomplish a computing task.

The insight I took away from these two talks was that when writing code I need to be mindful that a human who wants to get something done will be reading it. People don't read code for fun, after all.

As such, it's important to structure our code so no superfluous information is introduced. If we are writing business rules, our code should only provide information about those rules. If we are writing a data access layer, we should only provide the details relevant to that, and so on.

We don't want our readers feeling like they have to remember a bunch of gibberish while they are changing our code. Try keeping "azpdbasd mzcvnq gwrtasv" memorized while carrying on a detailed conversation with someone. Your brain will hurt.

Ideally we'd write code that will be good forever, but the problem space of any non-trivial application is so vast that our code will only ever capture a tiny fraction of it. Successful apps will need to be modified, since it's impossible to capture all the nuance of a domain in a single pass. We need to be considerate of the future developers who will be extending this code.

Good code is considerate of the limited working memory of humans and provides affordances to help chunk pieces of information. The primary way we accomplish this is through good software design, but we also need to provide context around why code is written, and be mindful not to include unnecessary details around it.

Furthermore, when we are teaching people how to code, the most helpful thing we can do is to provide techniques and understanding for grouping large blocks of code at a time. The key to becoming proficient is to be able to see a large block of code and reduce it to a single concept in our mind without losing any necessary detail.

It's my opinion that code presenting only what is necessary for the job at hand is good code. Code that presents all manner of details right next to one another, with no way for its readers to delineate concerns, is poor code. We need to be sympathetic to our limited ability to reason about abstract information.
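As a contrived sketch of what I mean by chunking (my own example, not one from the talks), compare a function that inlines every rule with one that gives each rule a name the reader can hold as a single concept:

// A contrived sketch (my own, not from the talks) of chunking via naming.
// The first version makes the reader juggle discounts, tax and formatting
// all at once; the second reduces each concern to one named idea.
package main

import "fmt"

type Order struct {
	Subtotal float64
	Loyal    bool
}

// Before: every detail in one place; the reader holds it all in working memory.
func invoiceLineBefore(o Order) string {
	total := o.Subtotal
	if o.Loyal {
		total = total * 0.95
	}
	total = total * 1.13
	return fmt.Sprintf("Amount due: $%.2f", total)
}

// After: each rule is a chunk with a name, so the top level reads as a story.
func applyLoyaltyDiscount(o Order) float64 {
	if o.Loyal {
		return o.Subtotal * 0.95
	}
	return o.Subtotal
}

func addSalesTax(amount float64) float64 { return amount * 1.13 }

func invoiceLine(o Order) string {
	return fmt.Sprintf("Amount due: $%.2f", addSalesTax(applyLoyaltyDiscount(o)))
}

func main() {
	o := Order{Subtotal: 100, Loyal: true}
	fmt.Println(invoiceLineBefore(o))
	fmt.Println(invoiceLine(o))
}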

Zero Knowledge Proofs

I went to a couple of talks covering proofs. I really enjoyed the overview of zero knowledge proofs. Previously I was only familiar with them from Zcash, and this talk gave a historical overview of the papers that have led to the mathematical magic we have today.

The concept of a zero knowledge proof is that a third party can verify we are telling the truth while we reveal zero information about the underlying data.

In more concrete terms, consider cryptocurrency transactions. Recently a Bitcoin transaction for over a billion USD made the front page of Hacker News. Bitcoin was originally marketed as an anonymous currency, but this isn't accurate. Every address's (account's) balance is public, and every transaction is also public. This is necessary because the validators (aka miners) need to know whether transactions are legitimate before including them in the history.

Intuitively, the person confirming whether an account has enough balance to make a transaction needs to know that account's current balance as well as the amount of the transaction. In traditional financial systems your bank knows your balance, but it remains private to anyone outside the bank.

In a decentralized model, where anyone can take on the role of the bank, anyone also needs to know this information.

At best Bitcoin can be described as pseudo-anonymous: the addresses are opaque digests and don't inherently leak personally identifiable information. However, a great deal of work has been done studying exactly this problem.

Zero knowledge proofs provide a mechanism to get around this fundamental limitation on the anonymity of cryptocurrency. The premise is that, with these mathematical techniques, which are well beyond my field of expertise, validators can confirm that each transaction is legal without needing to know its amount.
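To give a flavour of the math, here is a toy, Schnorr-style interactive proof sketched in Go. This is my own illustration, not what the speaker covered or what Zcash actually uses (zk-SNARKs are far more sophisticated); it shows a prover convincing a verifier that it knows a secret x with y = g^x mod p without ever revealing x:

// A toy, Schnorr-style sigma protocol: the prover convinces the verifier it
// knows a secret exponent x with y = g^x mod p, without revealing x. This is
// only a sketch of the *idea* of zero knowledge; parameters are illustrative.
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

var (
	p = new(big.Int).Sub(new(big.Int).Lsh(big.NewInt(1), 127), big.NewInt(1)) // 2^127-1, a Mersenne prime
	g = big.NewInt(3)
)

func main() {
	// Prover's secret x and the public value y = g^x mod p.
	x, _ := rand.Int(rand.Reader, p)
	y := new(big.Int).Exp(g, x, p)

	// Commitment: prover picks a random r and sends t = g^r mod p.
	r, _ := rand.Int(rand.Reader, p)
	t := new(big.Int).Exp(g, r, p)

	// Challenge: verifier picks a random c.
	c, _ := rand.Int(rand.Reader, p)

	// Response: prover sends s = r + c*x (a real implementation reduces this mod the group order).
	s := new(big.Int).Add(r, new(big.Int).Mul(c, x))

	// Verification: g^s == t * y^c (mod p). The verifier learns nothing about x itself.
	lhs := new(big.Int).Exp(g, s, p)
	rhs := new(big.Int).Mod(new(big.Int).Mul(t, new(big.Int).Exp(y, c, p)), p)
	fmt.Println("proof accepted:", lhs.Cmp(rhs) == 0)
}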

Applications of zero knowledge proofs go beyond cryptocurrencies, however. Data isn't the new gold, it's the new uranium. You can make money with it, but it has military uses, it is becoming increasingly regulated, and you don't want too much of it in any one place.

In our current state of affairs we have companies constantly leaking our authentication credentials, like our social insurance numbers, dates of birth, and our mothers' maiden names. Furthermore, regulations are being passed that will punish companies for failing to be good stewards of this data, making it a liability.

There is a potential application of zero knowledge proofs to limit a company's exposure by asking for exactly what it needs and nothing more. The example in the talk was proving you aren't on a no-fly list without having to reveal who you are. Another example would be complying with KYC regulations without handing over information that, if leaked, could be used to impersonate you.

This talk went into a lot of the history and introduced the important figures; the techniques used have been in development since the 80s. I also learnt that a new protocol(?) called Halo was released last week, which addresses shortfalls of current implementations, like the trusted setup.

https://electriccoin.co/blog/halo-recursive-proof-composition-without-a-trusted-setup/

The speaker had a good sense of humour and kept things high level enough for a topic with quite dry details.

Formal Methods

I also went to another talk, by Martin Kleppmann, about using Isabelle to make proofs about distributed algorithms. The example he gave was as simple a distributed consensus algorithm as you can get, and he still ran out of time, which goes to show the difficulty of this subject. I still found it interesting to see a demonstration of a tool that makes this as easy as possible for its user.

The algorithm consisted of two proposer nodes and an acceptor node, where the proposers need to reach consensus about where to get lunch. Each proposer sends a proposal to the acceptor (Paxos terminology). The acceptor sets its state to the first proposal it receives, and then responds to all subsequent proposals with that value. Each proposer, upon receiving a response, sets its state to that value.

The proof is to show that the system will reach consensus, that is to say, agree on a place to eat.
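Here is a rough Go simulation of the lunch algorithm as I understood it (my own sketch, not the Isabelle specification from the talk); both proposers end up holding whichever proposal reached the acceptor first:

// A rough simulation of the toy algorithm: two proposers each propose a lunch
// spot; the acceptor locks in the first proposal it sees and answers every
// request with that value, so both proposers end up agreeing.
package main

import (
	"fmt"
	"sync"
)

type request struct {
	value string      // proposed lunch spot
	reply chan string // where the acceptor sends its decision
}

// acceptor adopts the first value it receives and echoes it to all later proposals.
func acceptor(requests <-chan request) {
	var accepted string
	for req := range requests {
		if accepted == "" {
			accepted = req.value
		}
		req.reply <- accepted
	}
}

// proposer sends its proposal and then sets its own state to whatever comes back.
func proposer(name, proposal string, requests chan<- request, wg *sync.WaitGroup, state map[string]string, mu *sync.Mutex) {
	defer wg.Done()
	reply := make(chan string, 1)
	requests <- request{value: proposal, reply: reply}
	decided := <-reply
	mu.Lock()
	state[name] = decided
	mu.Unlock()
}

func main() {
	requests := make(chan request)
	go acceptor(requests)

	state := map[string]string{}
	var mu sync.Mutex
	var wg sync.WaitGroup
	wg.Add(2)
	go proposer("proposer-1", "tacos", requests, &wg, state, &mu)
	go proposer("proposer-2", "ramen", requests, &wg, state, &mu)
	wg.Wait()
	close(requests)

	// Consensus: both proposers hold the same value (whichever arrived first).
	fmt.Println(state)
}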

While the algorithm seems intuitively correct, proving it in a mathematical sense is another matter. It's not easy and takes a large amount of time. Furthermore, this only verifies that the spec will behave as it says it will; verifying an implementation of it is another matter.

Isabelle has some useful features like "sorry", which allows a theorem or lemma to be assumed true while you are constructing the rest of the proof, without needing to prove it yet. It also provides an aptly named utility, "sledgehammer", which tries its best to come up with the next line of the proof by brute force, leaving it to the prover to copy and paste it.

Applying formal methods to most applications would be uneconomical, to say the least. However, certain algorithms underpin large amounts of value and could have disastrous consequences if they were found to be wrong after deployment.

The slide presented a continuum of verification steps, with traditional unit testing near the bottom of the hierarchy. Unit testing can catch bugs, but only for the cases you think of. To get around this limitation people use "fuzzing" or property-based testing with tools like QuickCheck or its derivatives to cover a much larger range of inputs. The next step is to use model checking tools like TLA+ (I had no idea what it was until I saw a sign for it). Formal proofs are the farthest along this continuum, and provide mathematical certainty about our algorithms.
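As a small illustration of the property-based-testing rung on that ladder (my own example, using Go's standard library testing/quick rather than QuickCheck itself), instead of hand-picking test cases we state a property and let the tool generate many random inputs:

// Property-based testing sketch: state a property, let the tool throw
// randomly generated inputs at it, and report any counterexample.
package main

import (
	"fmt"
	"sort"
	"testing/quick"
)

// reverse returns a new slice with the elements of xs in reverse order.
func reverse(xs []int) []int {
	out := make([]int, len(xs))
	for i, x := range xs {
		out[len(xs)-1-i] = x
	}
	return out
}

func main() {
	// Property: reversing a slice twice gives back the original slice.
	doubleReverseIsIdentity := func(xs []int) bool {
		twice := reverse(reverse(xs))
		for i := range xs {
			if twice[i] != xs[i] {
				return false
			}
		}
		return true
	}

	// Property: sorting is idempotent.
	sortIsIdempotent := func(xs []int) bool {
		once := append([]int(nil), xs...)
		sort.Ints(once)
		twice := append([]int(nil), once...)
		sort.Ints(twice)
		for i := range once {
			if once[i] != twice[i] {
				return false
			}
		}
		return true
	}

	// quick.Check returns nil when no counterexample is found.
	fmt.Println("double reverse:", quick.Check(doubleReverseIsIdentity, nil))
	fmt.Println("sort idempotent:", quick.Check(sortIsIdempotent, nil))
}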

While it's not a tool I expect to use soon, I feel like a better software developer for being exposed to methods that introduce engineering discipline into the craft.

As an aside, the sign for TLA+ had an example about an intern finding a bug in the memory that was to be used for the Xbox 360 and, despite their lack of industry experience, getting the bug the proper attention, which ended up saving the Christmas release date for Microsoft. If it had been left until production, the units would only have been able to run four hours at a time. I thought it was a cool story about these academic tools having tangible business impact.

Of course there is the Münchhausen trilemma that prevents us from truly knowing, but that's a question left for epistemology. Formal methods seem to provide the best we can do.

https://en.wikipedia.org/wiki/M%C3%BCnchhausen_trilemma

Databases

I went to a couple of talks centered around databases. Whether data is gold or uranium is up for debate, but either way it's essential for making money with software. Databases are where data goes, and choosing the right one for our use case can have a big effect on how much time it takes to develop solutions.

The trend I picked up on at Strange Loop was that ACID is the standard transactional databases strive for; however, the distributed databases from earlier in the decade seem to have compromised these qualities in one manner or another. Cassandra, MongoDB, Redis, DynamoDB, etc. are not ACID compliant, which makes them inappropriate for some use cases.

The two I was introduced to at Strange Loop were FaunaDB and Datomic.

Datomic seems like a great alternative to more traditional options. Datomic Cloud is a managed solution provided by Cognitect that is AWS-only; there is also an on-prem option.

Datomic represents each point of data as a five-element tuple called a fact. The elements are entity, attribute, value, and two "internal" values: tx and op (is the fact current?).

Entity is a reference to what could be thought of as a model in code; it's a collection of facts. Attributes would be fields in the model, e.g. name or email. Value is the piece of data itself.

The tx value represents what transaction committed that value, which provides us with useful features.

The op value is a boolean that is true if the fact is still known to be true for the entity-attribute pair. It will be false if a more recent fact has been added or if it has been retracted.

The tx and op values allow Datomic to be an accumulate-only DB, where we never have to overwrite or lose information. They also allow us to query what the DB looked like at a previous point in time.
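Here is a rough Go sketch of that model as I understood it (my own toy, not Datomic's actual storage format or API): facts only ever accumulate, and an "as of" read simply ignores facts from later transactions:

// Toy model of an accumulate-only fact log with "as of" reads.
package main

import "fmt"

type Fact struct {
	Entity    int
	Attribute string
	Value     string
	Tx        int  // transaction that asserted this fact
	Op        bool // true = asserted, false = retracted (simplified here)
}

// valueAsOf returns the value of (entity, attribute) as of a given tx,
// i.e. the latest assertion at or before that transaction.
func valueAsOf(log []Fact, entity int, attr string, asOfTx int) (string, bool) {
	value, found, bestTx := "", false, -1
	for _, f := range log {
		if f.Entity == entity && f.Attribute == attr && f.Op && f.Tx <= asOfTx && f.Tx > bestTx {
			value, found, bestTx = f.Value, true, f.Tx
		}
	}
	return value, found
}

func main() {
	log := []Fact{
		{Entity: 1, Attribute: "email", Value: "old@example.com", Tx: 100, Op: true},
		{Entity: 1, Attribute: "email", Value: "new@example.com", Tx: 200, Op: true},
	}
	fmt.Println(valueAsOf(log, 1, "email", 150)) // old@example.com true
	fmt.Println(valueAsOf(log, 1, "email", 250)) // new@example.com true
}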

To me, this is where it gives a clear benefit over traditional SQL: the ability to produce consistent reports. In SQL, a SELECT isn't deterministic in the sense that the data it retrieves can change underneath it, but in Datomic a query against a certain tx value will always give the same result.

Another interesting consequence of having a tx value is that a transaction itself is another fact. This allows us to attach annotations to transactions. If a correction needs to be made, the corrector can supply information like why a certain value needed to be changed, which gives better auditability.

There was a talk where Datomic was used as the underlying database for a cancer immunotherapy library. Other databases could have been chosen, but Datomic seems like an exceptional fit for this use case.

Each tuple is covered by a variety of indexes used for querying, depending on whether we want information about an entity or aggregate information about a particular attribute. Datomic manages this behind the scenes so developers don't have to worry about adding indexes.

This flexibility allows readers of the database to get the data in the "shape" they want. The design of readers doesn't have to be influenced by the design of the writers, which can be challenging with traditional databases for some use cases. It's nice to get the advantages of a normalized view as a writer, but all the benefits denormalization has to offer as a reader.

The query language it uses is Datalog, which I was familiar with because of a tutorial that was on Hacker News a while back: http://www.learndatalogtoday.org/.

After going through some of the tutorial I thought it was a nice query language, but then thought nothing more of it. It seemed like there wasn't reliable support for it.

It was this tutorial that made me pay attention to Datomic when I saw it had a presence at Strange Loop.

The other DB talk I went to was about FaunaDB. It seems to offer many benefits similar to Datomic, like full ACID compliance, as well as GraphQL for flexible queries. It's founded on the Calvin protocol: http://cs.yale.edu/homes/thomson/publications/calvin-sigmod12.pdf

I'd love to be able to write more about the technical details, but I feel like I'd have some studying to do to really understand what's going on. If I had never gone to the talk I'd feel intimidated by this stuff, but the speaker made it seem approachable.

The biggest highlight for me was the use of a proportional-integral-derivative (PID) controller. It has a fancy name, but cruise control is an everyday example of one. FaunaDB uses it to decide when to send a second read to another node in case the first node doesn't respond in time. Always sending them would double the network load, but never sending a second read attempt would hurt performance.

Using well-known math, it's possible to dynamically tune how long to wait based on network conditions. If all reads are returning successfully you can increase the time to wait before issuing a second read, since you can be confident the first will return. If reads aren't returning in time, the algorithm adjusts accordingly.
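Here is my own simplified sketch of that idea in Go (not FaunaDB's actual implementation): a PID controller nudging the wait-before-second-read delay toward a target rate of hedged reads.

// Sketch of a PID controller tuning how long to wait before sending a
// speculative second read. If too many first reads miss the deadline, the
// delay backs off; if almost none do, it shortens to improve tail latency.
package main

import "fmt"

type PID struct {
	Kp, Ki, Kd float64
	integral   float64
	prevErr    float64
}

// Update returns an adjustment given the current error (observed - target).
func (c *PID) Update(err float64) float64 {
	c.integral += err
	derivative := err - c.prevErr
	c.prevErr = err
	return c.Kp*err + c.Ki*c.integral + c.Kd*derivative
}

func main() {
	const targetHedgeRate = 0.05 // aim for ~5% of reads needing a second attempt
	delayMs := 20.0              // initial wait before issuing the speculative read
	ctrl := &PID{Kp: 40, Ki: 5, Kd: 10}

	// Pretend we measure, once per interval, what fraction of first reads
	// exceeded the current delay (a made-up series of observations).
	observed := []float64{0.20, 0.15, 0.10, 0.07, 0.05, 0.04, 0.05}
	for _, rate := range observed {
		delayMs += ctrl.Update(rate - targetHedgeRate)
		if delayMs < 1 {
			delayMs = 1
		}
		fmt.Printf("observed hedge rate %.2f -> wait %.1fms before second read\n", rate, delayMs)
	}
}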

The speaker's slides also mentioned Kalman filters, which, from a look at Wikipedia, sound really interesting to me. Even though there wasn't enough time to hear about them, I at least know they exist, so I can learn more at my leisure.

Bayesian Inference

I went to a couple of talks dealing with probability. The first used randomness to generate solutions to various problems. The first example was: given a projection of a mug, generate a three-dimensional object that produces the right projection. The program would generate random candidates, and if one gave the right projection, it would be retained.

If we know that mugs are more likely to be upright, we can express that knowledge by weighting the random orientation a little.

The other example was a path-finding problem, where the program was given a couple of data points from a person's path through a yard and had to predict the intended destination. It would generate random data points, run a path-finding algorithm over them, and then compare the random data points with the given data points. If there was enough overlap, the generated path would be kept as a potential solution.
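The underlying trick in both examples, as I understood it, is rejection sampling: draw candidates from a prior, keep only the ones consistent with the observations. Here is a tiny Go sketch of that idea on a much simpler problem (my own example, inferring a coin's bias):

// Rejection sampling sketch: sample from a prior, simulate the data,
// and keep only the samples that reproduce what we actually observed.
package main

import (
	"fmt"
	"math/rand"
)

func main() {
	rng := rand.New(rand.NewSource(1))
	const (
		flips    = 10
		observed = 8 // we saw 8 heads out of 10
		draws    = 200000
	)

	var accepted []float64
	for i := 0; i < draws; i++ {
		bias := rng.Float64() // prior: coin bias is uniform on [0, 1]
		heads := 0
		for f := 0; f < flips; f++ {
			if rng.Float64() < bias {
				heads++
			}
		}
		if heads == observed { // keep only samples consistent with the observation
			accepted = append(accepted, bias)
		}
	}

	sum := 0.0
	for _, b := range accepted {
		sum += b
	}
	// The posterior mean should land near (observed+1)/(flips+2) = 0.75.
	fmt.Printf("accepted %d samples, estimated bias ~ %.2f\n", len(accepted), sum/float64(len(accepted)))
}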

Parser Parser Combinators for Program Transformation

This was one of my favourite talks. It explained what the tool comby (https://github.com/comby-tools/comby) does.

I remember going through this article a while back (https://bodil.lol/parser-combinators/), so I had some familiarity with what a parser combinator was. This tool allows large-scale refactoring of a code base with more precision than regex find-and-replace.

The author was able to open 50 PRs across various languages and get 40 of them accepted, which to me is a good demonstration of the tool's potential. I'd like to try some refactoring with it myself; I know I've come across the errors.New(fmt.Sprintf(…)) -> fmt.Errorf(…) pattern he gave as an example.
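For context, that rewrite turns the first line below into the second (my own illustration of the pattern); both build the same error, the second just skips the intermediate Sprintf call:

package main

import (
	"errors"
	"fmt"
)

func main() {
	name := "config.yaml"                                       // hypothetical file name for illustration
	err1 := errors.New(fmt.Sprintf("could not open %s", name)) // before
	err2 := fmt.Errorf("could not open %s", name)               // after
	fmt.Println(err1, err2)
}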

A more interesting example would be finding the unneeded nil check around a range loop, turning

if slice != nil { for _, elem := range slice {…} }

into

for _, elem := range slice {…}

InferenceQL

The goal of InferenceQL is to provide a SQL-like interface for easily applying statistical techniques to a data set. The example given was based on Stack Overflow surveys. The tool adds some keywords to an otherwise SQL-looking expression to do things like generate missing data points for a row that would look statistically normal.

It was also able to query for the probability of something being true given all other data points, making it easy to spot outliers in data sets.

It seemed like a powerful tool that would be easy to learn if you already know SQL; the speaker said it was still in beta. It's something I would like to try out myself.

Closing Thoughts

I really enjoyed the trip. The conference was well run, had a lot of interesting talks, and there was also a lot of good food to be had at night. Definitely something I would like to attend again.
