@kubukoz
Created October 5, 2021 17:36
Semiauto vs auto derivation

You mention that you would use circe auto derivation in only a few limited cases. May I ask why? It looks like you are using semiauto derivation instead. Why is this better or more stable?

The difference is where the derivation happens. In semiauto, there is only one place where the codec for a given type is derived: the definition site, which is usually the companion object (it's the companion object in our case, but it could really be anywhere; the point still stands that it's easier to track). In auto derivation, codecs are derived wherever the usage needs them, e.g. in the endpoint definitions or in a controller (depending on the tech stack you use). The problem is that the codecs for all the fields, and for all of their fields (and so on), then have to be available at that use site.
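A minimal sketch of semiauto derivation at the definition site, assuming scala-cli (for the `//> using` directives), Scala 2.13 and circe-generic 0.14.6; `Address` and `User` are hypothetical types for illustration. Each codec is derived exactly once, in its companion object, and the use site at the bottom needs no `io.circe.generic` import because the instances are found in the companions' implicit scope.

```scala
//> using scala 2.13
//> using dep io.circe::circe-generic:0.14.6

import io.circe.Codec
import io.circe.generic.semiauto.deriveCodec
import io.circe.syntax._

// Hypothetical DTOs, for illustration only.
final case class Address(street: String, city: String)
object Address {
  // The one and only place where a Codec[Address] is derived.
  implicit val codec: Codec[Address] = deriveCodec[Address]
}

final case class User(name: String, address: Address)
object User {
  // deriveCodec needs an Encoder/Decoder for Address: found in Address's companion.
  implicit val codec: Codec[User] = deriveCodec[User]
}

// Use site (e.g. a controller): no derivation happens here,
// only a lookup of the instance defined in User's companion.
val json: String = User("Ada", Address("Main St", "London")).asJson.noSpaces
println(json)
```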

Centralizing the codecs in semiauto mode lets us switch to a different format at any time while keeping the representation consistent across all of its usages. It also lets us test the codec logic without involving the controllers or endpoints (I want to test just the circe codec, so don't force me to test my tapir endpoints too; you get the idea). That way we test precisely what we want to test, without extra dependencies like the routing logic.
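A sketch of what such a codec-only test could look like, under the same assumptions (scala-cli, Scala 2.13, circe-generic 0.14.6) and with a hypothetical `Greeting` type. Plain `assert` stands in for a real test framework; nothing about routing or HTTP is involved, only the encoded JSON and the decoded value.

```scala
//> using scala 2.13
//> using dep io.circe::circe-generic:0.14.6

import io.circe.{Codec, Json}
import io.circe.generic.semiauto.deriveCodec
import io.circe.syntax._

// Hypothetical DTO, for illustration only.
final case class Greeting(message: String, count: Int)
object Greeting {
  implicit val codec: Codec[Greeting] = deriveCodec[Greeting]
}

// The wire format we expect, written out explicitly.
val expected: Json = Json.obj(
  "message" -> Json.fromString("hi"),
  "count"   -> Json.fromInt(2)
)

// Encoding produces the expected JSON...
assert(Greeting("hi", 2).asJson == expected)
// ...and decoding that JSON gives back the original value.
assert(expected.as[Greeting] == Right(Greeting("hi", 2)))
println("codec checks passed")
```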

Another point is that auto derivation is more susceptible to added or removed imports. Implicit scope can be affected by them, and it becomes really easy to accidentally remove an import that's part of an implicit chain, especially in editors that don't rely on output from the compiler. Then again, even the Scala compiler can emit spurious "unused implicit" warnings for implicits that are only used inside macros (I've seen this countless times with the Chimney library), which leads even well-written Scalafix rules to remove vital imports and thus break the format unexpectedly.
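To make the fragility concrete, here is a sketch of full auto derivation at a use site, assuming scala-cli, Scala 2.13 and circe-generic 0.14.6; `Order` and `Controller` are hypothetical. The single `import io.circe.generic.auto._` is the only thing that makes an `Encoder[Order]` available inside `Controller`, so a tool that wrongly flags it as unused and deletes it turns `render` into a compile error.

```scala
//> using scala 2.13
//> using dep io.circe::circe-generic:0.14.6

import io.circe.syntax._

// Hypothetical DTO with no codec in its companion.
final case class Order(id: Long, items: List[String])

object Controller {
  // The encoder for Order (and anything nested in it) is derived right here,
  // at the use site, only because of this import. Remove it and the
  // `asJson` call below no longer compiles.
  import io.circe.generic.auto._

  def render(order: Order): String = order.asJson.noSpaces
}

println(Controller.render(Order(1L, List("book"))))
```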

Even things like golden testing (https://github.com/circe/circe-golden) are harder when you go full auto, because you can't rely on the same set of imports being used at the use site and at the test site. This can be partially mitigated by another pattern you could follow, a mix I'd call "semi-full auto": you use auto derivation, but in a limited scope, and expose the result as a standalone implicit (no extra implicit imports required at the use site):

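import io.circe.Codec
import io.circe.generic.semiauto.deriveCodec

implicit val thisIsSemiauto: Codec[MyClass] = {
  import io.circe.generic.auto._ // auto derivation, limited to this block
  deriveCodec[MyClass] // the codecs for the fields will be generated
}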

This works for some things (e.g. I used it in pureconfig for a while, until I wanted to customize the reader for some nested field) and lets you scrap some boilerplate while keeping some degree of safety and sanity: you can limit auto derivation to only the top-level types that you expose in your APIs, and skip the implicit vals for their fields. I wouldn't recommend it, though. Just write the semiauto instances and never look at them again; that's still not as manual as Codec.forProduct3 or hand-written codecs that read all the fields one by one.
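For comparison, a sketch of the two styles side by side, assuming scala-cli, Scala 2.13 and circe-generic 0.14.6, with a hypothetical `Point` type. The semiauto instance is one line; the `Codec.forProduct3` version spells out every field name plus how to build and take apart the value. For this type both produce the same JSON.

```scala
//> using scala 2.13
//> using dep io.circe::circe-generic:0.14.6

import io.circe.Codec
import io.circe.generic.semiauto.deriveCodec
import io.circe.syntax._

// Hypothetical DTO, for illustration only.
final case class Point(x: Int, y: Int, label: String)

// Semiauto: field names are taken from the case class definition.
val derived: Codec[Point] = deriveCodec[Point]

// forProduct3: field names, construction and deconstruction written by hand.
val manual: Codec[Point] =
  Codec.forProduct3[Point, Int, Int, String]("x", "y", "label")(Point.apply)(p => (p.x, p.y, p.label))

val p = Point(1, 2, "a")

// Instances passed explicitly, since neither is implicit here.
assert(p.asJson(derived) == p.asJson(manual))
println(p.asJson(derived).noSpaces)
```

The hand-written version is only worth it when the JSON shape has to diverge from the case class (renamed fields, flattened structure); otherwise it's just more code to keep in sync.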

Next there's compilation speed, which suffers greatly with auto derivation of nested structures that share sub-trees: whenever a type is used in more than one codec, its instance has to be generated again for each of them, so a type used in N codecs is derived N times! You avoid this completely with semiauto derivation, and you also confine the macro expansions, usually to a single file with the data types rather than, say, your controller code. That means you can change your controller and get compiler and test feedback quickly, or at least more quickly than if the derivation were happening in there.
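A sketch of that layout, assuming scala-cli, Scala 2.13 and circe-generic 0.14.6; `Dtos`, `Money`, `Invoice`, `Refund` and `Controller` are hypothetical names. `Money` is a shared sub-tree: its codec is derived once and both parent codecs reuse that value. In a real project `Dtos` and `Controller` would be separate files, so that (with incremental compilation) editing the controller doesn't re-run the derivation macros; the code below only shows the structure and measures nothing.

```scala
//> using scala 2.13
//> using dep io.circe::circe-generic:0.14.6

import io.circe.Codec
import io.circe.generic.semiauto.deriveCodec
import io.circe.syntax._

// Hypothetical data-types "file": the only place where derivation macros expand.
object Dtos {
  // Shared sub-tree: Money appears in both Invoice and Refund.
  final case class Money(cents: Long, currency: String)
  object Money {
    // Derived exactly once; the codecs below reuse this value.
    implicit val codec: Codec[Money] = deriveCodec[Money]
  }

  final case class Invoice(id: String, total: Money)
  object Invoice {
    implicit val codec: Codec[Invoice] = deriveCodec[Invoice] // uses Money.codec
  }

  final case class Refund(invoiceId: String, amount: Money)
  object Refund {
    implicit val codec: Codec[Refund] = deriveCodec[Refund] // uses Money.codec again
  }
}

// Hypothetical controller: only looks up existing instances, no derivation.
object Controller {
  import Dtos._
  def invoiceJson(i: Invoice): String = i.asJson.noSpaces
  def refundJson(r: Refund): String = r.asJson.noSpaces
}

println(Controller.invoiceJson(Dtos.Invoice("inv-1", Dtos.Money(1000L, "EUR"))))
println(Controller.refundJson(Dtos.Refund("inv-1", Dtos.Money(250L, "EUR"))))
```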

Finally, in Scala 3 semiauto is so cheap (a derives clause, e.g. derives Codec.AsObject) that it has pretty much no downsides compared to auto. Of course this is only possible if you keep the types and the codecs together, but in a "hexagonal architecture" codebase you'd separate the DTOs from the domain models anyway, so having the DTOs and their codecs in one file is not a problem.
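A sketch of the Scala 3 version, assuming scala-cli, Scala 3 and circe 0.14.6 (pulled in via circe-generic); `Address` and `User` are hypothetical. The `derives` clause puts a `Codec.AsObject` instance in each type's companion, which is semiauto at the definition site with no separate companion code to write.

```scala
//> using scala 3
//> using dep io.circe::circe-generic:0.14.6

import io.circe.Codec
import io.circe.syntax.*

// Hypothetical DTOs: each `derives` clause generates the codec in the companion.
final case class Address(street: String, city: String) derives Codec.AsObject
final case class User(name: String, address: Address) derives Codec.AsObject

// Use site: no extra imports, the instances come from the companions.
val json: String = User("Ada", Address("Main St", "London")).asJson.noSpaces
println(json)
```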
