Skip to content

Instantly share code, notes, and snippets.

@rikkimax
Last active September 19, 2024 17:23
Show Gist options
  • Save rikkimax/35b5673554e8d15fc3710082953989fb to your computer and use it in GitHub Desktop.
Save rikkimax/35b5673554e8d15fc3710082953989fb to your computer and use it in GitHub Desktop.

Sum Type by Struct

Field Value
DIP: (number/id -- assigned by DIP Manager)
Author: Richard (Rikki) Andrew Cattermole firstname@lastname.co.nz
Implementation: (links to implementation PR if any)
Status: Draft

Abstract

This proposal models a sumtype based upon a struct, with a focus towards acceptance of current library implementations in D and with the reflection of the past 50 years of programming language design.

Contents

Rationale

Adding sum types into the language offers three key advantages for usability.

  1. Rapid prototyping of business logic data. This can be used in conjunction with tuples to construct the data that represents web forms, in full with only one declaration required.
  2. Enabling an exception mechanism that is stack-based, rather than using a runtime unwinding library. This has the potential to be as cheap as a single move instruction to throw. Whilst guaranteeing every exception thrown is caught close to where it was thrown. Without any heap allocation.
  3. Safety, coupled with matching guarantees that a @safe cannot misuse it at compile time and in doing so corrupt a program.

Prior Work

Previously implemented library types in D, these can be split into two categories.

This divide placing a high emphasis on a simpler element type only for a library implementation can be seen in other languages too:

Language support can be divided into three categories, rather than two:

Description

A sumtype is a container that can only store one value, whose access and mutation are constrained by a union set. Protected access is provided by a tag that gives tagged union behaviour.

sumtype Identifier = int | :None | string text;

Constraints come in three forms.

  1. A type int.
  2. A member of operator :None that gets rewritten to be a type typeof(:None).
  3. A type with a name string text.

There must be at least one constraint.

If there is only one constraint, it is treated as though there was an alias this to the element. Giving implicit safe access and mutation to it.

sumtype Int = int;

Int i = 2;
i *= 5;

Assignment

Assigning one sumtype to another is done like how structs work.

You destroy the target, and then copy the source to it, after copying update the tag, copy constructor, and destructor.

sumtype S = int | bool;
S source, target;

//target.destroy;
target = source;
// target.__tag = source.__tag;
// target.__copyctor = source.__copyctor;
// target.__dtor = source.__dtor;

Assigning an element that has not been specifically requested by the user, is a little more involved. It is not enough to check against the constraint set, you must consider the sumtype variable itself.

sumtype S1 = int;
sumtype S2 = int | bool;

S1 source;
S2 target;

target = source;

The sumtype type is preferred over the elements in the constraint set. If you need to guarantee that it is setting the element rather than the sumtype variable, use a cast.

Accessing an Element

To access a value to the element of a sumtype that is a type, use a cast.

sumtype S = int | bool;
S s;

s = 5;
assert(cast(int)s == 5);

If you want to access an element by name provide it with the member of operator.

sumtype S = int i | bool;
S s;

cast(:i)s = 2;
assert(cast(:i)s == 2);

This has the same rewrite seen in the declaration of the sumtype. Where the member of operator gets rewritten to typeof(:Identifier). Giving a unique element in the set.

Storing a value into an element is always safe. It will destroy the container prior to the copy.

Getting a value from an element is not safe without matching. It is @system to access an element value.

Properties

A sumtype has the members:

  • __tag An integral tag value to indicate which element has been set.
  • __tagTypes To acquire the types in the constraint set.
  • __tagNames To acquire the name for each element in the constraint set, if not present a string in the sequence will be null.
  • __tagValues Maps an offset for an element description in the constraint set, to its tag value.

The tag value is defined to be a hash value of the tag type combined with the tag name. Of type size_t. This enables both growth and shrinking of the constraint set without requiring a full match and reassignment of the tag value. This is important for some use cases like value type exceptions.

The default tag that is selected, is the first element. In the following example :None is the default element that would be used for .init.

sumtype S = :None | int;

Type Qualifiers

When a type qualifier is applied to a sumtype, it is also applied to the constraint set.

sumtype S = int* | bool;

const(S) s;
pragma(msg, S.__tagTypes[0]); // const(int*)

Layout

A sumtype is based upon a struct, it has its copy constructor/destructor called automatically just like a struct would. It has the following variable-sized layout, which depends upon the constraint set:

  • Tag value
  • † Copy constructor function pointer
  • † Destructor function pointer
  • ♥ Tag value

† May be elided if no elements in the constraint set require it. ♥ May be elided if all elements in the constraint set have a size of zero.

If required the compiler should generate a patch function that takes the pointer to the tag value, combines it with the copy constructor/destructor and makes it into a delegate. It should however be able to reconstruct the delegate without a new function.

A copy constructor will function in place of a postblit. However, it may need a wrapper function to be generated.

The layout must be variable in size, to allow it to be the size of a single machine word that can fit in a register. This allows for other language elements like value type exceptions to hold their name-sake zero-cost capability.

If a copy constructor or destructor is present in the layout, it must be non-null, regardless of the element selected.

Unused memory that would be used for a different element is to be left in an undefined state and is not safe to access.

An optimization strategy that Walter Bright remembered during DConf 2024 is to utilize the carry flag to indicate an error occured. This allows for the tag to be elided when returning a sumtype that is non-void as long as the sumtype elements only contain member-of-operators and the return type fits inside a register.

Matching

Matching as provided by DIP1048, is wired up to understand a sumtype providing of the properties it uses. Notably one is missing __tagValue, the compiler understands where and how to acquire this reference. Therefore it should not exist.

sumtype S = int | string text | :None;

S s;
s.match {
	(:text someTextVar) {
	};

	(int) {
	};

	(:None) {
	};
};

The match patterns, now support the member of operator in the place of the type. It acts as a name lookup in the constraint set, for the given element name, or is transformed into typeof(:Identifier) and does a type match. This capability is extended to library sumtypes by finding the index in __tagNames which is a compile-time accessible sequence of strings.

Accessing (by-ref too) an element value is a @safe operation. It has no restrictions that accessing by cast does.

Lastly, a match expression supports returning a value. The returned value will be a language sumtype. Each return may have a different type, these will be included in the resulting type.

sumtype S = :None | int;
sumtype S2 = int | bool;

S1 s1;
S2 s2 = s1.match {
	(:None) {
		if (random() > 0.5)
			return true;
		else
			return 2;
	};

	(int v) {
		return v;
	};
};

Grammar

The syntax of a sumtype is based upon the predominately used description for one in a large portion of the literature.

Of note is it may templated, and when the member of operator is used to provide an element, it is rewritten into its type which unifies the syntax into one semantically represented concept.

+ SumTypeDeclaration:
+    sumtype Identifier SumTypeConstraints
+    sumtype Identifier TemplateParameters Constraint|opt SumTypeConstraints

+ SumTypeConstraints:
+    SumTypeConstraint , SumTypeConstraints
+    SumTypeConstraint

+ SumTypeConstraint:
+    MemberOfOperator
+    Type Identifier|opt

MatchPattern:
+    MemberOfOperator FunctionLiteralBody

ParameterDeclaration:
+    ParameterAttributes|opt MemberOfOperator Identifier

TypeSpecialization:
+    sumtype

Breaking Changes and Deprecations

Any symbols called sumtype may break. The likely solution is to require the keyword to only be recognized in a set edition and above.

Reference

Copyright & License

Copyright (c) 2024 by the D Language Foundation

Licensed under Creative Commons Zero 1.0

History

The DIP Manager will supplement this section with links to forum discussions and a summary of the formal assessment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment