Skip to content

Instantly share code, notes, and snippets.

@ayazhafiz
Forked from j-maas/README.md
Last active April 4, 2022 17:03
Show Gist options
  • Save ayazhafiz/0aa058ba63d349f9a76479829f81caf1 to your computer and use it in GitHub Desktop.
Save ayazhafiz/0aa058ba63d349f9a76479829f81caf1 to your computer and use it in GitHub Desktop.
Roc: Open and closed tag unions

Roc: Open and closed tag unions

People's intuition

As evidenced in the discussions on Zulip (1, 2, 3) there are some cases where open and closed tags behave in a way that goes against people's intuition.

The main problem seems to be that people expect produced values to compose, but the "default" syntax (without *) prohibits this. People understand the openness to mean whether all possible values are listed, not whether more values can be added "later". For example: Roc infers myValue = if True then One else Two to be an open tag union ([One, Two]*). This goes against the mental model that we can see that myValue can only be One or Two and thus people intuitively expect this to a closed union.

Furthermore, the "default" syntax defines a closed tag union (e.g. [One, Two]), so people tend to write closed tag unions most of the time. However, closed tag unions only make sense for when (and consequently function inputs). Therefore the "default" syntax tends to lead to confusing issues.

(Ayaz) I'm not sure if this makes sense to add here, but: the goal of open tag unions is that they can be used in contexts that have types larger than the closed union counterpart. This was an insight Jan had recently, and I think they explained it very well:

Result something [ OutOfBounds ]* doesn't mean that any error (including OutOfBounds) is possible there - instead, it means that the only possible error there is OutOfBounds, and that that error can be passed on to functions that handle more errors than just OutOfBounds. The * means that these possibilities are allowed to be used in the context of broader possibility spaces. So, the goal is that an open tag union allows your type to be instantiated in the context of a larger type, but the actual variants in the open tag union type (i.e. A, B in [A, B]*) are in fact the only variants that can be in a value that has that type. My value of [A, B]* can be used in a position that wants [A, B, C] or something else, but I know that value can only have exactly A or B.

Since it seems that it's unpleasant for folks to understand the behavior of tag unions in this way we should definitely do whatever we can to improve that. For a few technical and expressiveness reasons this is the "best model" we have for right now.

One solution to consider is to redefine the default syntax to be open (as in we omit the type variable from [One, Two]a since in our case a is empty). Then we need to handle manual type signatures for functions (when and if cannot have manual type signatures), which we can do by aligning the default syntax for function inputs to the intuition. That is, function inputs in default syntax are treated as closed unions (e.g. myClosedFunc : [One] -> Bool cannot take myClosedFunc Two, but myOpenFunc : [One]a -> Bool accepts myOpenFunc Two).

Goals of tag unions

There are two important features that tag unions afford and that we want to keep.

getSettings = \filename ->
    rawData <- Task.await (File.readUtf8 filename)
    settings <-
        settingsYaml
            |> Yaml.decode settingsDecoder
            |> Task.fromResult
            |> Task.mapFail InvalidFormat

Assuming File.readUft8 can return the errors [FileNotFound, Corrupted], the inferred type is:

getSettings : Str -> [FileNotFound, Corrupted, InvalidFormat]*

Note that Roc automatically infers that in addition to the File.readUtf8 errors we could return an InvalidFormat.

(Ayaz) could you elaborate on how exhaustiveness checking plays a role here? Or did you mean to flip this heading with the "Tag composition" one below? :)

Tag composition

StoplightColor : [Red, Yellow, Green]
   
whatToDo : StoplightColor -> Str
whatToDo = \color ->
    when color is
        Red -> "Stop!"
        Green -> "Go"

In this case we forgot to handle the case where the color is Yellow and Roc will inform us about that.

(Ayaz) importantly, this works independently of open/closed tag unions

Problems with closed tag unions

Restricting composition in branches

It is possible to "break" when clauses by specifying closed unions (example by Richard):

stoplightColor =
    when something is
        Positive ->
            closedRed : [ Red ]
            closedRed = Red

            closedRed

        Negative -> Yellow
        Equal -> Green

Intuitively, it seems ok to compose each branch of this where clause so that the result is stoplightColor : [Red, Yellow, Green].

But the specification of closedRed : [Red] means that the [Red] tag is not allowed to compose, and this makes the code fail to compile due to a type mismatch. This is because under the hood, each branch is treated as an open union (e.g. [Yellow] for the second branch) which can combine into the desired tag union. Overriding one branch to be a closed union prevents that branch from being combined with others.

Proposal

Since most of the time people expect tag unions to compose, and the only exception is for exhaustiveness checking, I propose to change the meaning of the "default" syntax as follows.

All produced tag unions are open by default

We do not want to go against the intuition that tag unions can be composed together. There seems to be no real use case for producing closed tag unions. To ensure tag composition, all tag unions (except for function inputs) should be open.

I propose that we achieve this by redefining the meaning of the "default" syntax [One, Two] to always mean an open tag union (except for function inputs).

one = One
# one : [One] (in current notation: [One]*)

either = if condition then One else Two
# either : [One, Two] (in current notation: [One, Two]*)

This means that users will no longer be able to annotate values as closed tag unions and the unintuitive tag restriction example will behave intuitively:

stoplightColor =
    when something is
        Positive ->
            openRed : [ Red ]  # There is actually no way to declare this to be a closed tag union, so this cannot cause a type mismatch.
            openRed = Red

            openRed

        Negative -> Yellow
        Equal -> Green

Keep if and where clauses as they are

Since if and when clauses cannot be annotated, we can simply infer their openness without needing a way to communicate that in the type. It is enough to show an error message, when the conditional is incompatible with the clause's type and point out which tags the clause does not handle.

if Invalid then One else Two

This should continue to fail with an error like:

You are using a value of type [Invalid] in this if, but if clauses only accept Bools which can be either True or False.

when Invalid is
	One -> "One"
	Two -> "Two"

This should continue to fail with an error like:

You are using a value of type [Invalid] in this where clause, but it can only handle values of type [One, Two]. Consider adding a branch to handle the Invalid tag.

if and where clauses therefore do not need to be changed.

Function inputs are closed unions by default

Functions, however, need to be aware of whether their inputs are open or closed, so that the if and where clauses they use can benefit from exhaustiveness checking.

In the following example, the function needs to ensure that open does not contain additional tags, because the caller should not need to look inside the function.

foo = \open ->
	when open is
		Foo -> "Foo"
		Bar -> "Bar"

foo Extra  # This should fail with a type mismatch, because the `when` does not handle this case.

To achieve this, functions inputs need to be either closed or open. Since most of the time the intuition is that function inputs should be closed, the default syntax of foo : [One, Two] -> Str should mean a closed union.

foo : [One, Two] -> Str
foo = \input ->
	when input is
		One -> "One"
		Two -> "Two"

foo Three  # This should fail because `[One, Two]` is treated as a closed union.

Open unions are marked with a type variable like foo : [One, Two]a -> Str.

foo : [One, Two]a -> Str
foo = \input ->
	when input is
		One -> "One"
		Two -> "Two"
		_ -> "Something else"

foo Three  # This should work.

The following should fail, since input is used in a closed when clause that cannot handle additional tags.

foo : [One, Two]a -> Str
foo = \input ->
	when input is
		One -> "One"
		Two -> "Two"

Note that inside the function, the inputs should be allowed to compose, thus be treated as open tag unions:

foo : Bool -> [One, Two] -> Str
foo = \condition input ->
	bar = if condition then input else Three
	when bar is
		One -> "One"
		Two -> "Two"
		Three -> "Three"

This way we allow closed unions only at the places where they make sense to be used, which is function inputs, if and where clauses, and can keep exhaustiveness checking.

(Ayaz) If I understand correctly, everything in this section is the way Roc works today - is that correct? It seems the biggest implication here is that [A, B] means a closed tag union in a function parameter position, but means an open union (e.g. [A, B]a) anywhere a value is produced. I think is a bit syntactically confusing. I don't think it's a good idea to overload the open/closed syntax depending on the position a value is in (that is, whether it is in an input position or value-output position)

Potential implementation problems

  1. Can the type system differentiate between function inputs (and if and when clauses) and other sites? My inutition says this should be possible, but I do not understand the unification code's usage enough to judge this.

(Ayaz) Yes, this can be done (it already is actually, it's the reason why in a when clause closed tag unions are inferred)

  1. Can the type inference correctly pass along the openness from a where clause to a function's signature? It seems possible, if an open where clause marks it with a type variable, so that if a function's input is inferred to be [One, Two] (without type variable) we can treat it as a closed tag union and if it is [One, Two]a it is an open tag union. (This is a misuse of notation, but I hope the point comes across). Again, I do not have the understanding to judge whether this is problematic.

(Ayaz) Yes, but here is where having open tag unions be marked as [A, B] (without the extension variable) would be confusing from a readability perspective.

Interesting edge cases

Type variable passing

Using the variable also allows correctly communicating the type of Richard's other example in response to my initial, half-baked suggestion of hiding the *:

blah = \fooOrBar ->
    when fooOrBar is
        Foo "" -> Bar
        Bar -> Bar
        other -> other
		
# blah : [Foo Str, Bar]a -> [Bar]a

We know about certain tags, but there are additional tags which are simply passed through, as indicated by a.

Note: Does Roc actually infer that the output can also contain the tag Foo Str or does it not detect that? (I currently can't run the repl due to a compilation error.)

(Ayaz): Yes, Roc would currently infer that the output also contains Foo Str

Typos (accidental new tags)

x =
    if foo then
        Treu
    else
        Str.isEmpty str

If we made Bool a closed union [True, False], the typo would be noticed.

However, while in this case it is convenient, limiting composition of tags seems to be unintuitive. Furthermore, the wrong tag would be discovered as soon as x is used somehwhere that expects a Bool.

It is also possible to add a special lint for Bool where a tag union that contains either True or False cannot contain any other tag. So this case could be handled without closed tags.

(Ayaz): Actually, this would already be caught today. The first branch produces [Treu]a, and the second produces Bool : [True, False]. [Treu]a and [True, False] don't unify, so you get a type error.

Surprising incompatibility (maybe)

I'm not sure if this is just an issue with the implementation or a conceptual issue with closed tag unions, but there are some cases where types are surprisingly incompatible.

foo : [ Foo Str ]*
foo = Foo "Foo"

bar : [ Bar Str ]
bar = Bar "Bar"

baz = \x ->
    when x is
        Foo y -> y
        Bar y -> y

baz bar

── TYPE MISMATCH ───────────────────────────────────────────────────────────────

The 1st argument to baz is not what I expect:

10baz bar
             ^^^

This bar value is a:

    [ Bar Str ]

But baz needs the 1st argument to be:

    [ Bar a, Foo a ]

Tip: Looks like a closed tag union does not have the Foo tag.

Tip: Closed tag unions can't grow, because that might change the size
in memory. Can you use an open tag union?

(Ayaz) yeah, so the problem here is that because bar has been declared closed, it can only be used with the specific memory layout/type corresponding to [ Bar Str ]. By making it an open tag union, bar would become polymorphic, and if you needed to use it in a [ Bar Str, Foo Str ] position (as you do for the call to baz) you could because we would specialize it to "match the type" of [ Bar Str, Foo Str ] at that usage site. This goes back to the first note I added in this document.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment