As evidenced in the discussions on Zulip (1, 2, 3) there are some cases where open and closed tags behave in a way that goes against people's intuition.
The main problem seems to be that people expect produced values to compose, but the "default" syntax (without *
) prohibits this. People understand the openness to mean whether all possible values are listed, not whether more values can be added "later". For example: Roc infers myValue = if True then One else Two
to be an open tag union ([One, Two]*
). This goes against the mental model that we can see that myValue
can only be One
or Two
and thus people intuitively expect this to a closed union.
Furthermore, the "default" syntax defines a closed tag union (e.g. [One, Two]
), so people tend to write closed tag unions most of the time. However, closed tag unions only make sense for when
(and consequently function inputs). Therefore the "default" syntax tends to lead to confusing issues.
One solution to consider is to redefine the default syntax to be open (as in we omit the type variable from [One, Two]a
since in our case a
is empty). Then we need to handle manual type signatures for functions (when
and if
cannot have manual type signatures), which we can do by aligning the default syntax for function inputs to the intuition. That is, function inputs in default syntax are treated as closed unions (e.g. myClosedFunc : [One] -> Bool
cannot take myClosedFunc Two
, but myOpenFunc : [One]a -> Bool
accepts myOpenFunc Two
).
There are two important features that tag unions afford and that we want to keep.
StoplightColor : [Red, Yellow, Green]
whatToDo : StoplightColor -> Str
whatToDo = \color ->
when color is
Red -> "Stop!"
Green -> "Go"
In this case we forgot to handle the case where the color is Yellow
and Roc will inform us about that.
getSettings = \filename ->
rawData <- Task.await (File.readUtf8 filename)
settings <-
settingsYaml
|> Yaml.decode settingsDecoder
|> Task.fromResult
|> Task.mapFail InvalidFormat
Assuming File.readUft8
can return the errors [FileNotFound, Corrupted]
, the inferred type is:
getSettings : Str -> [FileNotFound, Corrupted, InvalidFormat]*
Note that Roc automatically infers that in addition to the File.readUtf8
errors
we could return an InvalidFormat
.
It is possible to "break" when
clauses by specifying closed unions (example by Richard):
stoplightColor =
when something is
Positive ->
closedRed : [ Red ]
closedRed = Red
closedRed
Negative -> Yellow
Equal -> Green
Intuitively, it seems ok to compose each branch of this where
clause so that the result is stoplightColor : [Red, Yellow, Green]
.
But the specification of closedRed : [Red]
means that the [Red]
tag is not allowed to compose, and this makes the code fail to compile due to a type mismatch. This is because under the hood, each branch is treated as an open union (e.g. [Yellow]
for the second branch) which can combine into the desired tag union. Overriding one branch to be a closed union prevents that branch from being combined with others.
Since most of the time people expect tag unions to compose, and the only exception is for exhaustiveness checking, I propose to change the meaning of the "default" syntax as follows.
We do not want to go against the intuition that tag unions can be composed together. There seems to be no real use case for producing closed tag unions. To ensure tag composition, all tag unions (except for function inputs) should be open.
I propose that we achieve this by redefining the meaning of the "default" syntax [One, Two]
to always mean an open tag union (except for function inputs).
one = One
# one : [One] (in current notation: [One]*)
either = if condition then One else Two
# either : [One, Two] (in current notation: [One, Two]*)
This means that users will no longer be able to annotate values as closed tag unions and the unintuitive tag restriction example will behave intuitively:
stoplightColor =
when something is
Positive ->
openRed : [ Red ] # There is actually no way to declare this to be a closed tag union, so this cannot cause a type mismatch.
openRed = Red
openRed
Negative -> Yellow
Equal -> Green
Since if
and when
clauses cannot be annotated, we can simply infer their openness without needing a way to communicate that in the type. It is enough to show an error message, when the conditional is incompatible with the clause's type and point out which tags the clause does not handle.
if Invalid then One else Two
This should continue to fail with an error like:
You are using a value of type
[Invalid]
in this if, but if clauses only acceptBool
s which can be eitherTrue
orFalse
.
when Invalid is
One -> "One"
Two -> "Two"
This should continue to fail with an error like:
You are using a value of type
[Invalid]
in this where clause, but it can only handle values of type[One, Two]
. Consider adding a branch to handle theInvalid
tag.
if
and where
clauses therefore do not need to be changed.
Functions, however, need to be aware of whether their inputs are open or closed, so that the if
and where
clauses they use can benefit from exhaustiveness checking.
In the following example, the function needs to ensure that open
does not contain additional tags, because the caller should not need to look inside the function.
foo = \open ->
when open is
Foo -> "Foo"
Bar -> "Bar"
foo Extra # This should fail with a type mismatch, because the `when` does not handle this case.
To achieve this, functions inputs need to be either closed or open. Since most of the time the intuition is that function inputs should be closed, the default syntax of foo : [One, Two] -> Str
should mean a closed union.
foo : [One, Two] -> Str
foo = \input ->
when input is
One -> "One"
Two -> "Two"
foo Three # This should fail because `[One, Two]` is treated as a closed union.
Open unions are marked with a type variable like foo : [One, Two]a -> Str
.
foo : [One, Two]a -> Str
foo = \input ->
when input is
One -> "One"
Two -> "Two"
_ -> "Something else"
foo Three # This should work.
The following should fail, since input
is used in a closed when
clause that cannot handle additional tags.
foo : [One, Two]a -> Str
foo = \input ->
when input is
One -> "One"
Two -> "Two"
Note that inside the function, the inputs should be allowed to compose, thus be treated as open tag unions:
foo : Bool -> [One, Two] -> Str
foo = \condition input ->
bar = if condition then input else Three
when bar is
One -> "One"
Two -> "Two"
Three -> "Three"
This way we allow closed unions only at the places where they make sense to be used, which is function inputs, if
and where
clauses, and can keep exhaustiveness checking.
- Can the type system differentiate between function inputs (and
if
andwhere
clauses) and other sites? My inutition says this should be possible, but I do not understand the unification code's usage enough to judge this. - Can the type inference correctly pass along the openness from a
where
clause to a function's signature? It seems possible, if an openwhere
clause marks it with a type variable, so that if a function's input is inferred to be[One, Two]
(without type variable) we can treat it as a closed tag union and if it is[One, Two]a
it is an open tag union. (This is a misuse of notation, but I hope the point comes across). Again, I do not have the understanding to judge whether this is problematic.
Using the variable also allows correctly communicating the type of Richard's other example in response to my initial, half-baked suggestion of hiding the *
:
blah = \fooOrBar ->
when fooOrBar is
Foo "" -> Bar
Bar -> Bar
other -> other
# blah : [Foo Str, Bar]a -> [Bar]a
We know about certain tags, but there are additional tags which are simply passed through, as indicated by a
.
Note: Does Roc actually infer that the output can also contain the tag Foo Str
or does it not detect that? (I currently can't run the repl due to a compilation error.)
x =
if foo then
Treu
else
Str.isEmpty str
If we made Bool
a closed union [True, False]
, the typo would be noticed.
However, while in this case it is convenient, limiting composition of tags seems to be unintuitive. Furthermore, the wrong tag would be discovered as soon as x
is used somehwhere that expects a Bool
.
It is also possible to add a special lint for Bool
where a tag union that contains either True
or False
cannot contain any other tag. So this case could be handled without closed tags.
I'm not sure if this is just an issue with the implementation or a conceptual issue with closed tag unions, but there are some cases where types are surprisingly incompatible.
foo : [ Foo Str ]*
foo = Foo "Foo"
bar : [ Bar Str ]
bar = Bar "Bar"
baz = \x ->
when x is
Foo y -> y
Bar y -> y
baz bar
── TYPE MISMATCH ───────────────────────────────────────────────────────────────
The 1st argument to baz is not what I expect:
10│ baz bar
^^^
This bar value is a:
[ Bar Str ]
But baz needs the 1st argument to be:
[ Bar a, Foo a ]
Tip: Looks like a closed tag union does not have the Foo tag.
Tip: Closed tag unions can't grow, because that might change the size
in memory. Can you use an open tag union?
Section 3.2 of https://www.microsoft.com/en-us/research/wp-content/uploads/2016/08/algeff-tr-2016-v3.pdf mentions a technique for opening/closing row-polymorphic types. I don't know if it applies to Roc, but it might be interesting for you to look in to.
When (let-)generalizing a type containing open unions you try to "close" the open unions (only the ones in positive positions I think). Then when instantiating (using) a variable you "open" the unions (in positive positions). That way the generalized types stay simple but when you use a variable you get open unions.