Decode irregular JSON with Circe

The popular CI server Jenkins provides a rich API to access information about builds.
This API speaks JSON but the JSON it returns has a rather strange shape. I needed to extract the Git revision built by a job, but Jenkins hides this information in a specific object in a “catch-all” actions array which contains JSON objects of different shapes, many of which may or may not be present. With Circe and some Shapeless magic I managed to decode this irregular JSON in a type-safe and fail-safe way (ignoring unknown JSON objects).

The JSON

The actions array in the JSON model of a build looks as follows. I converted the JSON to YAML to remove the syntactic boilerplate of JSON and make the snippet easier to read, and I also removed irrelevant parts:

actions:
  - parameters:
    - name: SERVICE_BUILD
      value: '2840'
    # […]
    - name: GIT_COMMIT
      value: 922cc937eb9c9142ebf0d8672a2b13f5fd28ae3e
  - causes: # …
  - {}
  - buildsByBranchName:
      refs/remotes/origin/master:
        # …
    lastBuiltRevision:
      SHA1: 922cc937eb9c9142ebf0d8672a2b13f5fd28ae3e
      branch:
      - SHA1: 922cc937eb9c9142ebf0d8672a2b13f5fd28ae3e
        name: refs/remotes/origin/master
    scmName: ''
  - {}

As you can see this JSON array contains objects of vastly different shapes (compare parameters to the Git object with buildsByBranchName and lastBuiltRevision), and even empty objects—I have no clue why these exist. I need the GIT_COMMIT parameter and the SHA1 value in lastBuiltRevision, and I’d like to ignore all these empty objects and objects I don’t need (like causes).

Decode actions into a Scala ADT

To decode this mess I went straight to Circe, the JSON library of my choice. Circe uses Decoder type-classes to describe how JSON decodes into case classes. A straight-forward decoder for the inner lastBuiltRevision object looks as follows:

import io.circe._

final case class LastBuiltRevision(sha1: String)
object LastBuiltRevision {
  implicit val lastBuiltRevisionDecoder
    : Decoder[LastBuiltRevision] =
    Decoder.forProduct1("SHA1")(LastBuiltRevision(_))
}
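
As a quick check of this decoder, the following sketch decodes a trimmed-down JSON snippet. It assumes circe-core and circe-parser on the classpath; the wrapper object LastBuiltRevisionExample and the input strings are made up for illustration.

```scala
import io.circe.Decoder
import io.circe.parser.decode

object LastBuiltRevisionExample {
  final case class LastBuiltRevision(sha1: String)
  object LastBuiltRevision {
    implicit val lastBuiltRevisionDecoder: Decoder[LastBuiltRevision] =
      Decoder.forProduct1("SHA1")(LastBuiltRevision(_))
  }

  // Hypothetical input: only "SHA1" is read, other fields are ignored.
  val json: String =
    """{"SHA1": "922cc937eb9c9142ebf0d8672a2b13f5fd28ae3e", "branch": []}"""

  val decoded: Either[io.circe.Error, LastBuiltRevision] =
    decode[LastBuiltRevision](json)

  // Without a "SHA1" field the decoder fails and decode returns a Left.
  val missing: Either[io.circe.Error, LastBuiltRevision] =
    decode[LastBuiltRevision]("""{"branch": []}""")
}
```

The decoder only looks at the field it was told about, so the rest of the lastBuiltRevision object does not need to be modelled.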

For the actions array I define an ADT that describes all known actions:

sealed trait Action

object Action {
  final case class Git(lastBuiltRevision: LastBuiltRevision)
      extends Action
  object Git {
    implicit val gitDecoder: Decoder[Git] =
      Decoder.forProduct1("lastBuiltRevision")(Git(_))
  }

  final case class Parameter(name: String, value: String)
  object Parameter {
    implicit val parameterDecoder: Decoder[Parameter] =
      Decoder.forProduct2("name", "value")(Parameter(_, _))
  }

  final case class Parameters(parameters: List[Parameter])
      extends Action
  object Parameters {
    implicit val parametersDecoder: Decoder[Parameters] =
      Decoder.forProduct1("parameters")(Parameters(_))
  }

  implicit val actionDecoder: Decoder[Action] = {
    import cats.syntax.functor._
    Decoder[Parameters].widen
      .or(Decoder[Git].widen)
  }
}

The Git and Parameters case classes with their straight-forward Decoder instances describe the corresponding objects. My Decoder[Action] then tries both decoders, either failing if both decoders fail, or returning whatever the first decoder decodes. I need to widen the decoders to Decoder[Action] explicitly because Decoder is invariant.
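
To see the fallback behaviour in isolation, here is a self-contained sketch that repeats the definitions above in compact form and feeds the decoder a few inputs. The wrapper object ActionDecoderExample, the shortened hash and the input strings are illustrative assumptions; the type arguments on widen are spelled out only to keep the sketch independent of type inference.

```scala
import cats.syntax.functor._
import io.circe.Decoder
import io.circe.parser.decode

object ActionDecoderExample {
  final case class LastBuiltRevision(sha1: String)
  object LastBuiltRevision {
    implicit val lastBuiltRevisionDecoder: Decoder[LastBuiltRevision] =
      Decoder.forProduct1("SHA1")(LastBuiltRevision(_))
  }

  sealed trait Action
  object Action {
    final case class Git(lastBuiltRevision: LastBuiltRevision) extends Action
    object Git {
      implicit val gitDecoder: Decoder[Git] =
        Decoder.forProduct1("lastBuiltRevision")(Git(_))
    }

    final case class Parameter(name: String, value: String)
    object Parameter {
      implicit val parameterDecoder: Decoder[Parameter] =
        Decoder.forProduct2("name", "value")(Parameter(_, _))
    }

    final case class Parameters(parameters: List[Parameter]) extends Action
    object Parameters {
      implicit val parametersDecoder: Decoder[Parameters] =
        Decoder.forProduct1("parameters")(Parameters(_))
    }

    // Try Parameters first, then Git; fail if neither shape matches.
    implicit val actionDecoder: Decoder[Action] =
      Decoder[Parameters].widen[Action].or(Decoder[Git].widen[Action])
  }

  // Hypothetical inputs with a shortened hash.
  val params = decode[Action](
    """{"parameters": [{"name": "GIT_COMMIT", "value": "922cc93"}]}""")
  val git = decode[Action]("""{"lastBuiltRevision": {"SHA1": "922cc93"}}""")
  // Neither decoder knows these shapes, so both results are Lefts.
  val causes = decode[Action]("""{"causes": []}""")
  val empty = decode[Action]("{}")
}
```

Note that at this point an unknown object still makes decoding fail, which would fail the whole actions array.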

Skip over unknown actions

To skip over unknown actions like causes I introduce another ADT to represent either a known and decoded action, or an unknown action with its JSON:

sealed trait MaybeAction
object MaybeAction {
  final case class Known(action: Action) extends MaybeAction
  final case class Unknown(contents: Json)
      extends MaybeAction

  implicit val maybeActionDecoder: Decoder[MaybeAction] =
    Decoder[Action]
      .map(Known)
      .or(Decoder[Json].map(Unknown))
}

The Decoder instance of MaybeAction tries to decode an Action and maps the result to a Known action. If this fails it wraps the JSON into an Unknown action instead.
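
The following sketch shows the effect on a small array. To stay short it replaces the Action ADT with a simplified stand-in that has a single Git variant; that stand-in, the wrapper object MaybeActionExample and the input are assumptions for illustration only.

```scala
import io.circe.{Decoder, Json}
import io.circe.parser.decode

object MaybeActionExample {
  // Simplified stand-in for the Action ADT: a single variant.
  sealed trait Action
  final case class Git(sha1: String) extends Action
  object Action {
    implicit val actionDecoder: Decoder[Action] =
      Decoder.instance { cursor =>
        cursor
          .downField("lastBuiltRevision")
          .downField("SHA1")
          .as[String]
          .map(Git(_))
      }
  }

  sealed trait MaybeAction
  object MaybeAction {
    final case class Known(action: Action) extends MaybeAction
    final case class Unknown(contents: Json) extends MaybeAction

    // Decoder[Json] accepts any JSON value, so this decoder never fails.
    implicit val maybeActionDecoder: Decoder[MaybeAction] =
      Decoder[Action]
        .map[MaybeAction](Known(_))
        .or(Decoder[Json].map[MaybeAction](Unknown(_)))
  }

  // Hypothetical input: two unknown objects and one Git object.
  val actions: Either[io.circe.Error, List[MaybeAction]] =
    decode[List[MaybeAction]](
      """[{"causes": []}, {}, {"lastBuiltRevision": {"SHA1": "922cc93"}}]""")

  // Keep only the actions that decoded into the ADT.
  val known: Either[io.circe.Error, List[Action]] =
    actions.map(_.collect { case MaybeAction.Known(action) => action })
}
```

All three elements decode successfully, but only the Git object ends up as a Known action.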

Decode a build

Now I can decode the entire build into a Build case class that holds a list of MaybeAction values:

final case class Build(actions: List[MaybeAction]) {
  def knownActions: List[Action] = actions.collect {
    case MaybeAction.Known(action) => action
  }
}
object Build {
  implicit val buildDecoder: Decoder[Build] =
    Decoder.forProduct1("actions")(Build(_))
}

The Decoder decodes the actions field as List[MaybeAction] and wraps it into a Build value. The Build case class offers a knownActions function to collect all known actions. To get the build revision I can now decode a Build from a response String and use collectFirst to extract the Git action which contains the SHA1 I’m looking for:

import io.circe.parser.decode
import scala.io.Source

def main(args: Array[String]): Unit = {
  val sha1 = for {
    // Parse and decode the whole response into a Build
    build <- decode[Build](
      Source
        .fromResource("jenkins-response.json")
        .mkString)
    // Pick the first Git action, or fail if there is none
    revision <- build.knownActions
      .collectFirst { case Action.Git(rev) => rev }
      .toRight(
        DecodingFailure(
          s"No Git information found in $build",
          List.empty))
  } yield revision.sha1

  println(s"Revision SHA1: $sha1")
}

I take advantage of right-biased Either (Scala 2.12 and later) here; after the for-comprehension sha1 holds either a Left with a circe Error (a ParsingFailure if the response is not valid JSON, or a DecodingFailure) or a Right[String] with the desired hash.
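
The short-circuiting behaviour of such a for-comprehension can be sketched with the standard library alone. The example below uses plain String errors and a simplified Action ADT as stand-ins for the circe error types and the real model; all names in it are illustrative.

```scala
object RightBiasedEitherExample {
  // Simplified stand-ins for the decoded model.
  sealed trait Action
  final case class Git(sha1: String) extends Action
  final case class Parameters(names: List[String]) extends Action

  // collectFirst returns an Option; toRight turns None into a Left.
  def firstGit(actions: List[Action]): Either[String, Git] =
    actions
      .collectFirst { case git: Git => git }
      .toRight("No Git information found")

  // The first Left stops the comprehension; later steps do not run.
  def sha1(decoded: Either[String, List[Action]]): Either[String, String] =
    for {
      actions <- decoded
      git <- firstGit(actions)
    } yield git.sha1

  val found = sha1(Right(List(Parameters(Nil), Git("922cc93"))))
  val noGit = sha1(Right(List(Parameters(Nil))))
  val failed = sha1(Left("parse error"))
}
```

Here found is a Right with the hash, noGit carries the error from toRight, and failed passes the original error through unchanged.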

Generically decode actions

Let’s recap the Action decoder:

implicit val actionDecoder: Decoder[Action] = {
  import cats.syntax.functor._
  Decoder[Parameters].widen
    .or(Decoder[Git].widen)
}

It explicitly lists all variants; whenever I add a new action I have to extend the Decoder as well. Because a sealed trait is a coproduct, I can instead write a generic Decoder with Shapeless that automatically decodes all variants of the Action trait. This is why the trait needs to be sealed: it makes sure that the compiler knows all variants at compile time.

The shapeless.Coproduct corresponding to my Action trait has the following type:

Action.Git :+: Action.Parameters :+: shapeless.CNil

This type reads as “either an Action.Git or an Action.Parameters”. The CNil tail serves as the recursion anchor when iterating over coproducts on the type level; a coproduct can never have this value at runtime. A value of this type, e.g. an Action.Parameters value, looks as follows:

Inr(Inl(Parameters(List(…))))
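
A minimal sketch of how such values come about, assuming Shapeless 2 on the classpath: Coproduct[C](value) injects a value at the position of its type, wrapping it in one Inr per skipped position and a final Inl. The simplified Git and Parameters classes, the ActionRepr alias and the wrapper object are illustrative stand-ins.

```scala
import shapeless.{:+:, CNil, Coproduct, Inl, Inr}

object CoproductExample {
  // Simplified stand-ins for the two Action variants.
  final case class Git(sha1: String)
  final case class Parameters(names: List[String])

  type ActionRepr = Git :+: Parameters :+: CNil

  // Git sits at the first position, Parameters at the second.
  val git: ActionRepr = Coproduct[ActionRepr](Git("922cc93"))
  val params: ActionRepr =
    Coproduct[ActionRepr](Parameters(List("GIT_COMMIT")))

  // Pattern matching peels the wrappers off again.
  def describe(repr: ActionRepr): String = repr match {
    case Inl(Git(sha1))              => s"git $sha1"
    case Inr(Inl(Parameters(names))) => s"parameters ${names.mkString(",")}"
    case Inr(Inr(cnil))              => cnil.impossible // no CNil value exists
  }
}
```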

Inr reads as “skip this position”, and Inl means “this position has a value”. I can now inductively define a Decoder instance for the coproduct representation of Action:

import shapeless.{:+:, CNil, Coproduct, Generic, Inl, Inr}

private implicit val cnilDecoder: Decoder[CNil] =
  Decoder.failed(DecodingFailure("CNil", List.empty))

private implicit def cconsActionDecoder[H <: Action, T <: Coproduct](
    implicit decodeH: Decoder[H],
    decodeT: Decoder[T]
): Decoder[H :+: T] =
  decodeH.map(Inl[H, T]).or(decodeT.map(Inr[H, T]))

As said, CNil never occurs at runtime and only serves as the base case for inductive definitions like this, so the CNil case above is unreachable, but I need to define it anyway to make recursive implicit resolution terminate.

Let’s look at the more interesting ccons or :+: case: It summons two Decoder instances, one for any H that is a subtype of Action, and another for any T that is a Coproduct. Here I recursively move through the coproduct until implicit resolution ends up at the CNil case. With these two instances I can define a Decoder for the H :+: T case, i.e. the T coproduct with an H in front of it. The Decoder does what the explicit decoder did as well: It tries to decode the H action, and falls back to decoding the other actions through the recursive Decoder instance for T. In either case I need to lift the result to a Coproduct. If I can decode an H I lift it with Inl to say that I have a value for this position in the coproduct, whereas if I fall back to T I put an Inr around it to say that I need to skip this position and move on to the tail.

I define both implicits as private to the Action companion object so that they do not leak to other code, where they might wreak havoc on implicit resolution; after all, these instances are quite generic. I can then use these new instances to define the actual Decoder[Action] instance:

private def genericActionDecoder[Repr <: Coproduct](
    implicit genericAction: Generic.Aux[Action, Repr],
    decodeRepr: Decoder[Repr]
): Decoder[Action] = decodeRepr.map(genericAction.from)

implicit val actionDecoder: Decoder[Action] =
  genericActionDecoder

The genericActionDecoder takes the Generic instance for Action. Thanks to the Generic.Aux type alias, which exposes the path-dependent Repr type of Generic as a type parameter, I do not need to explicitly specify the Coproduct shape of Action; instead I introduce a generic type parameter Repr and use it to refer to the coproduct representation of Action. Implicit resolution then does the rest and figures out what concrete type applies here. I also need a Decoder instance for the representation, which comes from the coproduct implicits I defined before. This helper again is private; I use it only once to summon the Decoder[Action]. This avoids expensive derivation of Action whenever I need a Decoder[Action] instance, and avoids leaking generic implicits into other scopes.

With this generic definition I can now extend Action with further variants whenever I need to decode more of the actions array, and I just need to write a Decoder for the new variant. Shapeless then does the rest and gives me a complete Decoder for all action variants.
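
Putting the pieces together, the sketch below adds a third variant without touching the Decoder[Action] definition. It assumes circe-core, circe-parser and Shapeless 2 on the classpath. The Causes variant is hypothetical and keeps its elements as raw Json because the real shape is not shown here; Git is flattened to a single field to keep the example short.

```scala
import io.circe.{Decoder, DecodingFailure, Json}
import io.circe.parser.decode
import shapeless.{:+:, CNil, Coproduct, Generic, Inl, Inr}

object GenericActionExample {
  sealed trait Action
  object Action {
    // Flattened variant of the Git action from the article.
    final case class Git(sha1: String) extends Action
    object Git {
      implicit val gitDecoder: Decoder[Git] = Decoder.instance(
        _.downField("lastBuiltRevision").downField("SHA1").as[String].map(Git(_)))
    }

    final case class Parameter(name: String, value: String)
    object Parameter {
      implicit val parameterDecoder: Decoder[Parameter] =
        Decoder.forProduct2("name", "value")(Parameter(_, _))
    }

    final case class Parameters(parameters: List[Parameter]) extends Action
    object Parameters {
      implicit val parametersDecoder: Decoder[Parameters] =
        Decoder.forProduct1("parameters")(Parameters(_))
    }

    // Hypothetical new variant: only a case class and its decoder are added.
    final case class Causes(causes: List[Json]) extends Action
    object Causes {
      implicit val causesDecoder: Decoder[Causes] =
        Decoder.forProduct1("causes")(Causes(_))
    }

    // Base case: never succeeds, only terminates implicit resolution.
    private implicit val cnilDecoder: Decoder[CNil] =
      Decoder.failed(DecodingFailure("CNil", List.empty))

    // Inductive case: try the head variant, then fall back to the tail.
    private implicit def cconsActionDecoder[H <: Action, T <: Coproduct](
        implicit decodeH: Decoder[H],
        decodeT: Decoder[T]
    ): Decoder[H :+: T] =
      decodeH.map[H :+: T](Inl(_)).or(decodeT.map[H :+: T](Inr(_)))

    private def genericActionDecoder[Repr <: Coproduct](
        implicit genericAction: Generic.Aux[Action, Repr],
        decodeRepr: Decoder[Repr]
    ): Decoder[Action] = decodeRepr.map(genericAction.from)

    implicit val actionDecoder: Decoder[Action] = genericActionDecoder
  }

  // Hypothetical inputs.
  val causes = decode[Action]("""{"causes": []}""")
  val git = decode[Action]("""{"lastBuiltRevision": {"SHA1": "922cc93"}}""")
  val unknown = decode[Action]("{}")
}
```

The variants are tried in the order of the derived coproduct, which may differ from the order in which they are declared, so this approach works best when the variants have clearly distinct shapes.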
