@rikkimax
Last active September 18, 2024 16:31

Framework for Escape Set

Field Value
DIP: (number/id -- assigned by DIP Manager)
Author: Richard (Rikki) Andrew Cattermole firstname@lastname.co.nz
Implementation: (links to implementation PR if any)
Status: Draft

Abstract

Modelling the escape set at the function signature makes it possible to set better defaults for relationship strengths. The escape set analysis is redesigned relative to DIP1000 so that the escape set can grow and shrink within a function body, which allows more code to be accepted.

Rationale

The purpose of this proposal is to introduce a framework for memory safety analysis in the compiler, in which enough information can optionally be specified for the description to scale with need.

Memory passed into a function can go to any of the following four places. Being able to trace where it goes provides accurate borrowing safety for owning representations.

  1. Into the unknown
  2. Another parameter
  3. As the return
  4. Into the this pointer
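As a rough model of these four destinations, an escape set can be pictured as the set of places one parameter's memory may reach. The sketch is in Python rather than D so that it runs without compiler support for `@escape`. The names `Signature`, `destinations`, and `annotate` are hypothetical, and spelling the fourth destination as `this` is an assumption; `__unknown` and `return` are the spellings used in the proposal's own examples.

```python
from dataclasses import dataclass, field

UNKNOWN = "__unknown"  # spelling taken from the proposal's example code
RETURN = "return"
THIS = "this"          # assumption: how the `this` destination is named


@dataclass
class Signature:
    """Hypothetical stand-in for a function signature and its escape sets."""
    params: list
    has_this: bool = False
    escapes: dict = field(default_factory=dict)  # parameter -> set of destinations

    def destinations(self, param):
        """Every place that memory passed in as `param` could go."""
        places = {UNKNOWN, RETURN}
        places |= {p for p in self.params if p != param}  # another parameter
        if self.has_this:
            places.add(THIS)
        return places

    def annotate(self, param, *dests):
        """Record the escape set of `param`, rejecting names that are not destinations."""
        bad = set(dests) - self.destinations(param)
        if bad:
            raise ValueError(f"not a destination for {param}: {sorted(bad)}")
        self.escapes[param] = set(dests)


sig = Signature(params=["src", "dst"])
sig.annotate("src", "dst", RETURN)   # `src` may reach `dst` and the return value
print(sorted(sig.escapes["src"]))
```

An empty set corresponds to a parameter that does not escape at all.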

Not all movement of memory from an input to an output is equal. Some movements establish a strong relationship, where the output depends on the input remaining valid. Others only say that the input contributed to the output, and how that is treated depends on the caller. These relationship strengths are expressed using modifiers. The modifiers specified here are not required to be implemented unless analysis exists to take advantage of them. Because the flattening at the function signature is fairly limited, they can represent DIP1000 behaviour as well as other escape analysis proposals.

Describing only the escape set, with good defaults for the relationships, allows the DIP1000 attribute mess to be removed. For example, inRefOutRef only needs the escape set annotated, without any modifiers, whereas intRefOutPtr needs both the escape set and a modifier.

ref int* inRefOutRef(@escape(return/*&*/) ref int* input) => input;
int** intRefOutPtr(@escape(return&) ref int* input) => &input;

With DIP1000, either of these function prototypes would use the return ref and scope attributes on the parameter. In this model, those are instead two separate attributes, return and ref, and combining return with scope is invalid because return has a larger escape set than scope.

Prior Work

Existing analysis in the form of DIP1000 offers both escape analysis and owner escape analysis which is intended for memory owned by a point in the stack.

The attributes that DIP1000 describes in its model are the following:

| DIP1000 | Input-Output Relationship |
| --- | --- |
| scope | No Return † |
| return | See return scope and return ref |
| return scope | Returns ‡♦ |
| return ref | Returns ‡, ref |
| return ref scope | Returns ‡♦, ref |

† Cannot include other escapes

‡ May include other escapes, minimum escape set

♦ Escapes must be modellable and not globals or throws

♥ The by-ref value is what is being protected

It uses three keywords to offer five combinations, which yield only four unique relationships between a given input and its outputs. Notably, none of the relationships described covers the value stored within a by-ref parameter; they cover only the by-ref pointer. On its own, return can denote either return scope or return ref depending on context.

These attributes have led to significant confusion in the use of DIP1000 and do not model heap memory to a usable level. This has resulted in abandonment, and in @trusted being used on code that should not have been @trusted.

Description

This proposal introduces the new escape set with configurable modifiers per input-output relationship. Compared to DIP1000, subtle changes are made to the analysis so that the escape set can grow and shrink during analysis of the body, with errors caught late.

The following grammar changes are non-optional. Optional grammar changes, covering potential modifiers that could represent DIP1000 behaviour, are given under the heading A Modifier Profile.

AtAttribute:
+    @ EscapeAttribute

ParameterAttributes:
+    @ EscapeAttribute

+ EscapeAttribute:
+    escape ( EscapeRelationships )
+    escape ( )
+    escape

+ EscapeRelationships:
+    EscapeRelationship
+    EscapeRelationship , EscapeRelationships

+ EscapeRelationship:
+    Identifier EscapeRelationshipModifiers|opt

+ EscapeRelationshipModifiers:
+    EscapeRelationshipModifier
+    EscapeRelationshipModifier EscapeRelationshipModifiers

+ EscapeRelationshipModifier:
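
To make the three forms of EscapeAttribute concrete, here is a minimal sketch of a parser for this grammar, written in Python so that it runs without compiler support. It is not how a D front end would implement it. The function names are hypothetical, and the accepted modifier characters `&`, `=`, and `.` are an assumption borrowed from the optional modifier profile, since the required grammar leaves EscapeRelationshipModifier empty. `return` and `__unknown` are accepted as names because the proposal's examples use them in that position.

```python
import re

MODIFIERS = {"&", "=", "."}  # assumption: the optional modifier profile

_TOKEN = re.compile(r"\s*([A-Za-z_]\w*|[@(),&=.])")


def tokenize(text):
    """Split an attribute into identifiers and single-character symbols."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def _is_identifier(token):
    return token[0].isalpha() or token[0] == "_"


def parse_escape_attribute(text):
    """Return None for bare `@escape`, otherwise a list of (name, modifiers)."""
    tokens = tokenize(text)
    if tokens[:2] != ["@", "escape"]:
        raise SyntaxError("expected '@escape'")
    rest = tokens[2:]
    if not rest:
        return None                  # escape
    if len(rest) < 2 or rest[0] != "(" or rest[-1] != ")":
        raise SyntaxError("expected '(' ... ')'")
    body = rest[1:-1]
    if not body:
        return []                    # escape ( )
    relationships, i = [], 0
    while True:
        if i >= len(body) or not _is_identifier(body[i]):
            raise SyntaxError("expected an identifier")
        name, i = body[i], i + 1
        modifiers = ""
        while i < len(body) and body[i] in MODIFIERS:
            modifiers += body[i]
            i += 1
        relationships.append((name, modifiers))
        if i == len(body):
            return relationships
        if body[i] != ",":
            raise SyntaxError(f"expected ',' but found {body[i]!r}")
        i += 1


print(parse_escape_attribute("@escape(return&, __unknown)"))
```

The parser keeps bare `@escape` (returned as `None`) distinct from `@escape()` (returned as an empty list), because the grammar lists them as separate forms.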

Analysis

As an analysis, the escape set provides a framework for escape analysis and owner escape analysis to protect memory from leaving its known lifetime and potentially causing program corruption.

To do this, it performs data flow analysis in a forward-only pass over a function body. The pass detects the movement of memory into unmodellable locations and establishes the relationships between variables for other analyses to work upon. Escape set annotations on the function signature are not consulted when the analysis starts; the annotated signature is only considered at the end of the analysis.

Once the relationships have been determined at the exit points, they are converged onto the parameters. If a parameter has no user-provided annotation, the result is stored as its inferred annotation. If the user annotated it, the annotation is verified against the determined relationships, and a mismatch is an error.

The rule that a mismatch between the signature and the analysis is an error applies to @safe functions. For @trusted functions, a failed verification is not an error, but inference still takes place where the escape set is not fully annotated. The analysis is not applied to @system functions.

It is not an error if the annotated signature has a larger escape set or a stronger modifier for a relationship. See the heading Why Modifiers Are Useful for why this relaxation is very useful.
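
The convergence rules above can be sketched as a small Python model, chosen so that it runs without compiler support. Everything named here (`converge`, `EscapeError`, the `STRENGTH` table) is hypothetical. The strengths assume the optional modifier profile, where `&` is stronger than `=` and `.`, and each relationship is simplified to one modifier per destination.

```python
STRENGTH = {"&": 2, "=": 1, ".": 1}  # assumption: the optional modifier profile


class EscapeError(Exception):
    """Raised when a @safe signature does not cover what the body does."""


def converge(inferred, annotated, safety):
    """Combine the analysed relationships of one parameter with its annotation.

    inferred:  dict of destination -> modifier, as found by analysing the body
    annotated: the same shape as written by the user, or None when absent
    safety:    "safe", "trusted" or "system"
    """
    if safety == "system":
        return annotated              # the analysis is not applied
    if annotated is None:
        return dict(inferred)         # stored as the inferred annotation
    problems = []
    for dest, modifier in inferred.items():
        if dest not in annotated:
            problems.append(f"escape set does not include `{dest}`")
        elif STRENGTH[annotated[dest]] < STRENGTH[modifier]:
            problems.append(f"modifier for `{dest}` is weaker than `{modifier}`")
    if problems and safety == "safe":
        raise EscapeError("; ".join(problems))
    return annotated                  # a larger set or stronger modifier is accepted


# A wider and stronger annotation than the body requires is not an error.
print(converge({"return": "="}, {"return": "&", "__unknown": "="}, "safe"))
```

Only the `safe` branch raises. The `trusted` branch keeps a mismatched annotation silently, which matches the stated rule that verification failures are not errors there.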

It is important that any analysis built upon this attempts to do error detection as late as possible.

int** global;

void escapeIt(@escape(__unknown) int** input) {
	global = input;
}

int* escapeOut(@escape(return) int* input) {
	{
		int** val = &input;
		// @escape(val) input
		
		escapeIt(val);
		// @escape(__unknown, val) input
	} // @escape(__unknown) input, Error: Variable `input` escapes into an unknown location
	// Do another pass and trace WHY `input` escaped into an unknown location!

	// @escape() input
	return input; // @escape(return) input
}
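
The comments in `escapeOut` can be replayed with a small Python model of the forward pass, written in Python so that it runs without compiler support. This is one reading of those comments, not a specification: the class `EscapeTracker` and its methods are hypothetical, the model assumes that `input` is not permitted to reach `__unknown`, and it assumes that the unknown escape is dropped once reported so that analysis can continue.

```python
UNKNOWN = "__unknown"


class EscapeTracker:
    """Hypothetical forward-only tracker of per-variable escape sets."""

    def __init__(self):
        self.sets = {}     # variable -> set of destinations it escapes into
        self.errors = []

    def declare(self, name):
        self.sets.setdefault(name, set())

    def flows_into(self, source, dest):
        """Grow: `source` now reaches `dest` and whatever `dest` already reaches."""
        reached = {dest} | self.sets.get(dest, set())
        # Variables that already escape into `source` reach the same places.
        affected = [source] + [v for v, d in self.sets.items() if source in d]
        for var in affected:
            self.sets.setdefault(var, set()).update(reached)

    def end_scope(self, dying):
        """Shrink: forget dead locals, then report what is left in the unknown."""
        for name in dying:
            self.sets.pop(name, None)
        for var, dests in self.sets.items():
            dests.difference_update(dying)
            if UNKNOWN in dests:
                self.errors.append(
                    f"Variable `{var}` escapes into an unknown location")
                dests.discard(UNKNOWN)   # assumption: reset after reporting


t = EscapeTracker()
t.declare("input")
t.declare("val")
t.flows_into("input", "val")       # int** val = &input;
assert t.sets["input"] == {"val"}
t.flows_into("val", UNKNOWN)       # escapeIt(val);
assert t.sets["input"] == {UNKNOWN, "val"}
t.end_scope(["val"])               # closing brace of the inner scope
assert t.sets["input"] == set() and len(t.errors) == 1
t.flows_into("input", "return")    # return input;
assert t.sets["input"] == {"return"}
```

The error is recorded when the inner scope closes rather than at the call to `escapeIt`, which is where the example places it.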

Tuple-Like

When a type acts in a tuple-like manner, or can be modelled as such, each element may have its own lifetime within a function. A function signature cannot model separate lifetimes for individual elements, so they must be conflated into the containing variable.

int* func(@escape() int* input) {
	int*[3] tuple;
	tuple[0] = new int;
	tuple[1] = input;
	
	return tuple[0]; // ok
	return tuple[1]; // Error: Variable `input` cannot be escaped as its escape set does not include `return`
	return tuple[2]; // ok
}

An expression sequence functions as a tuple, and so does a static array. A struct can sometimes be treated this way, but it is more involved than a simple sequence representation. It may have mutable constructors, copy constructors, destructors, or a postblit; all other methods must be read-only. This is because cross-function graph mutation is not modelled at the function signature level, although it can be modelled within a function body.
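
The difference between per-element tracking inside a body and conflation at a signature can be sketched in Python, chosen so that it runs without compiler support. The helper names `check_return` and `conflate` are hypothetical. Each slot of the static array is modelled as the set of parameters whose memory it holds, mirroring the `func` example above.

```python
def check_return(elements, index, annotations):
    """Errors raised by returning one slot, tracked per element inside a body."""
    errors = []
    for param in sorted(elements[index]):
        if "return" not in annotations.get(param, set()):
            errors.append(
                f"Variable `{param}` cannot be escaped as its escape set "
                "does not include `return`")
    return errors


def conflate(elements):
    """Signature-level view: one set for the whole containing variable."""
    return set().union(*elements)


# int*[3] tuple; tuple[0] = new int; tuple[1] = input;
slots = [set(), {"input"}, set()]
annotations = {"input": set()}        # @escape() input

assert check_return(slots, 0, annotations) == []       # return tuple[0]; ok
assert len(check_return(slots, 1, annotations)) == 1   # return tuple[1]; error
assert check_return(slots, 2, annotations) == []       # return tuple[2]; ok
assert conflate(slots) == {"input"}   # per-slot detail is lost once conflated
```

Once conflated, the whole array is treated as holding `input`, so the precision that allows `tuple[0]` and `tuple[2]` to be returned is only available within the body.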

Modifiers

Modifiers give the compiler a way to know, without a body, that an input-output relationship has a specified set of characteristics. These characteristics typically take the form of a strength, denoting the amount of protection needed after the function call or the amount of protection that should not exist before it.

The goal of giving modifiers a dedicated part of the syntax is to eliminate dedicated keywords and new semantic behaviours that look innocuous.

Although this proposal does not require any modifier to be implemented, some potential ones are described so that existing escape analysis designs can be mapped onto it.

Each modifier implemented will have an analysis that affects the relationships between variables within a function body. This in turn provides verification of the function signature where it has been annotated, and inference where it has not.

A Modifier Profile

In light of DIP1000 and potential proposals, three core relationship modifiers can establish how an input goes into an output. None of these need to be supported unless there is analysis that can take advantage of them. They are provided to give concrete examples of what a modifier is meant to function as.

  • Take a pointer to an input (including to a field or method), into or as part of an output (&).
  • Copy the value of the input, directly into or as part of the output (=).
  • Copy a value that came from the input, but not the input itself, into or as part of the output (.).

The strongest of these modifiers is &; the other two, = and ., share the same weaker strength. Both = and . may be elided if & is provided on an input-to-output relationship.

The grammar changes:

EscapeRelationshipModifier:
+    &
+    =
+    .

If none of these modifiers is placed on an input-output relationship, the default is & when both the input and the output are by-ref, and = otherwise.

If these three modifiers are used, it is recommended that applying the default modifier = also applies the . modifier. This allows more compact forms that require less understanding to use.
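
The default and strength rules can be written down as a short Python sketch, chosen so that it runs without compiler support. The function names and the numeric strengths are hypothetical; only the ordering (`&` above `=` and `.`, which are equal) and the by-ref default rule come from the text. Returning `=.` as the non-ref default follows the recommendation above and is controlled by an assumed flag.

```python
STRENGTH = {"&": 2, "=": 1, ".": 1}   # only the ordering is taken from the text


def default_modifiers(input_by_ref, output_by_ref, dot_follows_equals=True):
    """Modifier used when a relationship is written without one."""
    if input_by_ref and output_by_ref:
        return "&"
    return "=." if dot_follows_equals else "="


def strength(modifiers):
    """Strength of a relationship is that of its strongest modifier."""
    return max(STRENGTH[m] for m in modifiers)


def covers(annotated, required):
    """True when the annotated modifiers are at least as strong as required."""
    return strength(annotated) >= strength(required)


# ref return from a ref parameter, as in inRefOutRef: the default is `&`.
assert default_modifiers(True, True) == "&"
# By-value return from a ref parameter: the default is `=` (with `.`).
assert default_modifiers(True, False) == "=."
# Taking the address of a ref parameter, as in intRefOutPtr, needs `&`,
# which the default does not cover, so it has to be written out.
assert not covers(default_modifiers(True, False), "&")
assert covers("&", "=.")   # `&` lets `=` and `.` be elided
```

The last two assertions show why an annotation is only needed in the stronger case: the default already covers plain copies, while `&` must be requested explicitly when the output is not by-ref.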

The only guarantee provided by this proposal in association with these stated modifiers is that if implemented they will be accurately added to the signature during inference and validated as being accurate if manually annotated when a body is present.

ref int* inRefOutRef(/*@escape(return&)*/ ref int* input) => input;
int* inRefOut(/*@escape(return=)*/ ref int* input) => input;
int* inOut(/*@escape(return=)*/ int* input) => input;

Earlier it was stated that a modifier known to be the default may be elided. All three of these examples use the default modifier, so it could have been elided had they been manually annotated.

An example of a relationship where the modifier & would be required and cannot be elided:

int** intRefOutPtr(/*@escape(return&)*/ ref int* input) => &input;

Given these examples, it can be assumed that the default relationships should be good enough for the majority of cases. It is only when you are doing something a bit more advanced that you need to opt into stronger guarantees.

Why Modifiers Are Useful

Being able to control the relationship beyond the default can be quite useful. For example, with an owning object we want to establish a strong relationship between the owner and the borrow.

struct Owner {
	private {
		int* ptr;
	}

	int* borrow() @escape(return&) {
		return this.ptr; // strength of . which is less than &
	}
}

If we did not annotate the this pointer explicitly with the & modifier, it would have defaulted to =., which under normal circumstances is what would be wanted with GC memory.

Breaking Changes and Deprecations

This proposal introduces only one attribute, @escape. This may conflict with an existing user-defined attribute. If so, @escape could be limited to a given edition and above, or take precedence over the user-defined attribute.

No conflicts with DIP1000 are expected. The two proposals can co-exist, although they would not share syntax. Only one of these proposals should be active at a time.

Reference

Copyright & License

Copyright (c) 2024 by the D Language Foundation

Licensed under Creative Commons Zero 1.0

History

The DIP Manager will supplement this section with links to forum discussions and a summary of the formal assessment.
