Skip to content

Instantly share code, notes, and snippets.

@j-maas
Last active November 21, 2023 07:51
Show Gist options
  • Save j-maas/ed3d2811d808d0fa1386478575df928d to your computer and use it in GitHub Desktop.
Save j-maas/ed3d2811d808d0fa1386478575df928d to your computer and use it in GitHub Desktop.
Roc: Open and closed tag unions

Roc: Open and closed tag unions

People's intuition

As evidenced in the discussions on Zulip (1, 2, 3) there are some cases where open and closed tags behave in a way that goes against people's intuition.

The main problem seems to be that people expect produced values to compose, but the "default" syntax (without *) prohibits this. People understand the openness to mean whether all possible values are listed, not whether more values can be added "later". For example: Roc infers myValue = if True then One else Two to be an open tag union ([One, Two]*). This goes against the mental model that we can see that myValue can only be One or Two and thus people intuitively expect this to a closed union.

Furthermore, the "default" syntax defines a closed tag union (e.g. [One, Two]), so people tend to write closed tag unions most of the time. However, closed tag unions only make sense for when (and consequently function inputs). Therefore the "default" syntax tends to lead to confusing issues.

One solution to consider is to redefine the default syntax to be open (as in we omit the type variable from [One, Two]a since in our case a is empty). Then we need to handle manual type signatures for functions (when and if cannot have manual type signatures), which we can do by aligning the default syntax for function inputs to the intuition. That is, function inputs in default syntax are treated as closed unions (e.g. myClosedFunc : [One] -> Bool cannot take myClosedFunc Two, but myOpenFunc : [One]a -> Bool accepts myOpenFunc Two).

Goals of tag unions

There are two important features that tag unions afford and that we want to keep.

StoplightColor : [Red, Yellow, Green]
   
whatToDo : StoplightColor -> Str
whatToDo = \color ->
    when color is
        Red -> "Stop!"
        Green -> "Go"

In this case we forgot to handle the case where the color is Yellow and Roc will inform us about that.

Tag composition

getSettings = \filename ->
    rawData <- Task.await (File.readUtf8 filename)
    settings <-
        settingsYaml
            |> Yaml.decode settingsDecoder
            |> Task.fromResult
            |> Task.mapFail InvalidFormat

Assuming File.readUft8 can return the errors [FileNotFound, Corrupted], the inferred type is:

getSettings : Str -> [FileNotFound, Corrupted, InvalidFormat]*

Note that Roc automatically infers that in addition to the File.readUtf8 errors we could return an InvalidFormat.

Problems with closed tag unions

Restricting composition in branches

It is possible to "break" when clauses by specifying closed unions (example by Richard):

stoplightColor =
    when something is
        Positive ->
            closedRed : [ Red ]
            closedRed = Red

            closedRed

        Negative -> Yellow
        Equal -> Green

Intuitively, it seems ok to compose each branch of this where clause so that the result is stoplightColor : [Red, Yellow, Green].

But the specification of closedRed : [Red] means that the [Red] tag is not allowed to compose, and this makes the code fail to compile due to a type mismatch. This is because under the hood, each branch is treated as an open union (e.g. [Yellow] for the second branch) which can combine into the desired tag union. Overriding one branch to be a closed union prevents that branch from being combined with others.

Proposal

Since most of the time people expect tag unions to compose, and the only exception is for exhaustiveness checking, I propose to change the meaning of the "default" syntax as follows.

All produced tag unions are open by default

We do not want to go against the intuition that tag unions can be composed together. There seems to be no real use case for producing closed tag unions. To ensure tag composition, all tag unions (except for function inputs) should be open.

I propose that we achieve this by redefining the meaning of the "default" syntax [One, Two] to always mean an open tag union (except for function inputs).

one = One
# one : [One] (in current notation: [One]*)

either = if condition then One else Two
# either : [One, Two] (in current notation: [One, Two]*)

This means that users will no longer be able to annotate values as closed tag unions and the unintuitive tag restriction example will behave intuitively:

stoplightColor =
    when something is
        Positive ->
            openRed : [ Red ]  # There is actually no way to declare this to be a closed tag union, so this cannot cause a type mismatch.
            openRed = Red

            openRed

        Negative -> Yellow
        Equal -> Green

Keep if and where clauses as they are

Since if and when clauses cannot be annotated, we can simply infer their openness without needing a way to communicate that in the type. It is enough to show an error message, when the conditional is incompatible with the clause's type and point out which tags the clause does not handle.

if Invalid then One else Two

This should continue to fail with an error like:

You are using a value of type [Invalid] in this if, but if clauses only accept Bools which can be either True or False.

when Invalid is
	One -> "One"
	Two -> "Two"

This should continue to fail with an error like:

You are using a value of type [Invalid] in this where clause, but it can only handle values of type [One, Two]. Consider adding a branch to handle the Invalid tag.

if and where clauses therefore do not need to be changed.

Function inputs are closed unions by default

Functions, however, need to be aware of whether their inputs are open or closed, so that the if and where clauses they use can benefit from exhaustiveness checking.

In the following example, the function needs to ensure that open does not contain additional tags, because the caller should not need to look inside the function.

foo = \open ->
	when open is
		Foo -> "Foo"
		Bar -> "Bar"

foo Extra  # This should fail with a type mismatch, because the `when` does not handle this case.

To achieve this, functions inputs need to be either closed or open. Since most of the time the intuition is that function inputs should be closed, the default syntax of foo : [One, Two] -> Str should mean a closed union.

foo : [One, Two] -> Str
foo = \input ->
	when input is
		One -> "One"
		Two -> "Two"

foo Three  # This should fail because `[One, Two]` is treated as a closed union.

Open unions are marked with a type variable like foo : [One, Two]a -> Str.

foo : [One, Two]a -> Str
foo = \input ->
	when input is
		One -> "One"
		Two -> "Two"
		_ -> "Something else"

foo Three  # This should work.

The following should fail, since input is used in a closed when clause that cannot handle additional tags.

foo : [One, Two]a -> Str
foo = \input ->
	when input is
		One -> "One"
		Two -> "Two"

Note that inside the function, the inputs should be allowed to compose, thus be treated as open tag unions:

foo : Bool -> [One, Two] -> Str
foo = \condition input ->
	bar = if condition then input else Three
	when bar is
		One -> "One"
		Two -> "Two"
		Three -> "Three"

This way we allow closed unions only at the places where they make sense to be used, which is function inputs, if and where clauses, and can keep exhaustiveness checking.

Potential implementation problems

  1. Can the type system differentiate between function inputs (and if and where clauses) and other sites? My inutition says this should be possible, but I do not understand the unification code's usage enough to judge this.
  2. Can the type inference correctly pass along the openness from a where clause to a function's signature? It seems possible, if an open where clause marks it with a type variable, so that if a function's input is inferred to be [One, Two] (without type variable) we can treat it as a closed tag union and if it is [One, Two]a it is an open tag union. (This is a misuse of notation, but I hope the point comes across). Again, I do not have the understanding to judge whether this is problematic.

Interesting edge cases

Type variable passing

Using the variable also allows correctly communicating the type of Richard's other example in response to my initial, half-baked suggestion of hiding the *:

blah = \fooOrBar ->
    when fooOrBar is
        Foo "" -> Bar
        Bar -> Bar
        other -> other
		
# blah : [Foo Str, Bar]a -> [Bar]a

We know about certain tags, but there are additional tags which are simply passed through, as indicated by a.

Note: Does Roc actually infer that the output can also contain the tag Foo Str or does it not detect that? (I currently can't run the repl due to a compilation error.)

Typos (accidental new tags)

x =
    if foo then
        Treu
    else
        Str.isEmpty str

If we made Bool a closed union [True, False], the typo would be noticed.

However, while in this case it is convenient, limiting composition of tags seems to be unintuitive. Furthermore, the wrong tag would be discovered as soon as x is used somehwhere that expects a Bool.

It is also possible to add a special lint for Bool where a tag union that contains either True or False cannot contain any other tag. So this case could be handled without closed tags.

Surprising incompatibility (maybe)

I'm not sure if this is just an issue with the implementation or a conceptual issue with closed tag unions, but there are some cases where types are surprisingly incompatible.

foo : [ Foo Str ]*
foo = Foo "Foo"

bar : [ Bar Str ]
bar = Bar "Bar"

baz = \x ->
    when x is
        Foo y -> y
        Bar y -> y

baz bar

── TYPE MISMATCH ───────────────────────────────────────────────────────────────

The 1st argument to baz is not what I expect:

10baz bar
             ^^^

This bar value is a:

    [ Bar Str ]

But baz needs the 1st argument to be:

    [ Bar a, Foo a ]

Tip: Looks like a closed tag union does not have the Foo tag.

Tip: Closed tag unions can't grow, because that might change the size
in memory. Can you use an open tag union?
@atennapel
Copy link

Section 3.2 of https://www.microsoft.com/en-us/research/wp-content/uploads/2016/08/algeff-tr-2016-v3.pdf mentions a technique for opening/closing row-polymorphic types. I don't know if it applies to Roc, but it might be interesting for you to look in to.
When (let-)generalizing a type containing open unions you try to "close" the open unions (only the ones in positive positions I think). Then when instantiating (using) a variable you "open" the unions (in positive positions). That way the generalized types stay simple but when you use a variable you get open unions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment