Basic QTT

Quantitative type theory (QTT) is an extension of dependent type theory which allows us to track the number of computational (non-erased) uses of variables in well-typed lambda terms. With such usage information, QTT allows us to combine dependent types with substructural types, such as linear, affine or relevant types.

While in a Martin-Löf style type theory typing judgments are of form x1 : A1, ..., xn : An ⊢ b : B, in QTT the bindings in the context and the term are decorated with multiplicities, or quantities (denoted by lower case greek letters apart from ω):

Δ ⊢ ρ b : B

where Δ, the context, is of form π1 x1 : A1, ..., πn xn : An.

These quantities tell us the usage of the variables in context and gives us how many times we may use the term. For our purposes, we will have three possible quantities: 0 - no computational use, 1 - exactly one use, and ω - any usage more than 1. Terms marked by:

0 should be erased from runtime
1 stands for mutable data
ω stands for immutable and replicable data

Quantities can be naturally added and multiplied; all such possibilities are captured by:

ρ+0 = 0+ρ = ρ
1+1 = ω
ρ+ω = ω+ρ = ω

ρ*0 = 0*ρ = 0
ρ*1 = 1*ρ = ρ
ω*ω = ω

Addition and multiplication are commutative and associative. Furthermore, multiplication distributes over addition.

Now, since our hypotheses are annotated with a quantity, the dependent type must also have quantity information. It will have form (π x : A) → B and it is the type of functions that require π quantity of input to produce its output. Its introduction rule is

Δ, ρ*π x : A ⊢ ρ b : B
---------------------------(→I)
Δ ⊢ ρ λx. b : (π x : A) → B

which can be intuitively understood as follows: if we can build ρ quantities of b : B assuming ρ times π quantities of x : A, then we can build a ρ quantity of functions λx. b from A to B that requires a π quantity of input, given an overall context Δ. The elimination rule, on the other hand, is

Δ ⊢ ρ f : (π x : A) → B     Δ' ⊢ ρ*π a : A
------------------------------------------(→E)
Δ + Δ' ⊢ ρ (f a) : B[a/x]

That is, if from Δ we can build ρ functions f : (π x : A) → B and if from Δ' we can build ρ times π arguments a : A, then adding the resources from Δ and Δ' we can produce ρ outputs (f a) of type B[a/x]. The contexts here are assumed to be of the same form but having potentially different quantities. The addition of contexts is elementwise: π1 x1:A1, ..., πn xn:An plus φ1 x1:A1, ..., φn xn:An equals π1+φ1 x1:A1, ..., φn+πn xn:An.

Apart from typing rules, which should be fairly straightforward, the last rule we need is the identity rule. Here there are two possibilies: a linear QTT, where all quantities in the context must be used, and an affine QTT, where you can use up to what the context allows. The linear version is captured by:

--------------------------(L-id)
0Γ, ρ x : A, 0Γ' ⊢ ρ x : A

where 0Γ and 0Γ' denote contexts where all variables have quantity 0. The affine version is less restrictive:

(ρ' x : A) ∈ Δ    ρ ≤ ρ'
------------------------(A-id)
Δ ⊢ ρ x : A

where ≤ is the natural order 0 ≤ 1 ≤ ω. That is, as long as we have a sufficient (ρ' where ρ' ≥ ρ) quantity of x : A, we can build ρ x : A. Thus, in the affine case, we can think of quantity 1 as "up to 1 computational use" and ω as "any use whatsoever". From now on we will assume the less restrictive affine QTT.

The last thing to note is that QTT is built on top of a safe type theory and there exists a forgetful functor from QTT to its underlying theory. That is, for any valid derivation 𝒟 of Δ ⊢ ρ c : C it is always possible to get a derivation forget(𝒟) of the same form for forget(Δ) ⊢ c : forget(C) in the underlying type theory by simply removing all quantity annotations -- denoted by the forget functor. The inverse is not always the case: even if forget(Δ) ⊢ c : forget(C) typechecks, Δ ⊢ ρ c : C might not typecheck. The only rules that might not check are the instances of the identity rule; more specifically, the cases

----------------(A-id)
Δ ⊢ ρ x : A

where (ρ' x : A) ∈ Δ but ρ > ρ'. However, the underlying theory can be embedded in QTT: all derivations of Γ ⊢ c : C have corresponding derivation of ωΓ ⊢ ω c : ω(C), where ω(C) denotes the type we get by adding ω annotations to all of C's dependent types.

Although this means that the consistency of QTT is dependent on the consistency of its underlying theory, it does not mean QTT is automatically safe (as in having the subject reduction property), as we will see.

A major issue

Now, there is one unintuitive aspect of QTT as I presented here. If we take the introduction rule for the function type and substitute 1 for π and ω for ρ, since ω*1 = ω, we get the following instance:

Δ, ω x : A ⊢ ω b : B
---------------------------(→I)
Δ ⊢ ω λx. b : (1 x : A) → B

That is, to construct a replicable (ω) affine function λx. b : (1 x : A) → B, we may assume any number of x : A to build any number of b. This means, in particular, that x can appear multiple time in the body of b, which contradicts the supposed affinity of λx. b. Here is an example derivation which shows this unintuitive behavior:

                                         -------------------------------(A-id)   ----------------------------(A-id)
                                         ω f : A→A, 0 x : A ⊢ ω f : A→A          0 f : A→A, ω x : A ⊢ ω x : A
--------------------------------(A-id)   --------------------------------------------------------------------(→E)
ω f : A→A, 0 x : A ⊢ ω f : A→A         ω f : A→A, ω x : A ⊢ ω (f x) : A
-------------------------------------------------------------------------(→E)
ω f : A→A, ω x : A ⊢ ω (f (f x)) : A
-------------------------------------(→I)
ω f : A→A ⊢ ω λx.(f (f x)) : A→A
-----------------------------------------(→I)
⊢ ω λf.λx.(f (f x)) : (1 f : A→A) → (A→A)

Here A→A is short for (ω x : A) → A. Notice how the variable f is being used twice inside the body of the supposedly affine function of type (1 f : A→A) → (A→A). This means replicable functions lose all affinity/linearity properties.

While this might not be much of an issue for linear QTT, for affine QTT it is a fatal flaw. This is because in affine QTT we can derive, for any type C, a function that converts any ω value into a 1 value as seen by the derivation

-----------------(A-id)
ω x : C ⊢ 1 x : C
-------------------------(→I)
⊢ 1 λx. x : (ω x : C) → C

The conversion is a direct consequence of application

⊢ 1 λx. x : (ω x : C) → C    ⊢ ω c : C
--------------------------------------(→E)
⊢ 1 ((λx. x) c) : C

which, if we assume our theory is sound, enables us to reduce the redex and get ⊢ 1 c : C from a derivation of ⊢ ω c : C. Since we have already established that ω functions are not necessarily affine, we can conclude, after conversion, that 1 functions are also not necessarily affine (the same apply for 0 functions) and therefore the whole theory breaks down. The issue is an even bigger problem when we see that ⊢ 1 λf.λx.(f (f x)) : (1 f : A→A) → (A→A) does not typecheck, as opposed to the ω case. This implies the theory does not even follow the type preservation property!

How to solve the issue

Now let us dive a bit further into the problem. In affine QTT, whenever you have Δ' ≥ Δ (meaning the elements of Δ' are always greater than the elements of Δ), any derivation of Δ ⊢ ρ c : C will give rise to a derivation of Δ' ⊢ ρ c : C that has the exact same form -- the exact same rules, the exact same lambda term. This is the usual weakening (derived) rule: if we have resources to prove c : C from Δ, then we also have resources to prove c : C from the stronger context Δ'. In an analogous manner, if ρ' ≤ ρ and Δ ⊢ ρ c : C is provable, then we should also be able to prove Δ ⊢ ρ' c : C, as Δ should be enough resources to make ρ' copies of c : C. We will call this property the right (R-)weakening rule, as opposed to the left (L-)weakening rule.

In order for the R-weakening rule be derivable, it must be respected by all rules of our theory; which, ignoring the typing rules and the identity rule (which trivially satisfies it), there are only two. Let us check it, firstly, for the function elimination rule. Suppose we have a derivation 𝒟 of Δ ⊢ ρ c : C where the last rule is function elimination, that is, c = (f a) and C = B[a/x]. We thus have

  ℰ                            ℱ
Δ ⊢ ρ f : (π x : A) → B     Δ' ⊢ ρ*π a : A
------------------------------------------(→E)
Δ + Δ' ⊢ ρ (f a) : B[a/x]

ℰ and ℱ being the direct subderivations of 𝒟. By an inductive argument, we can find derivations ℰ' and ℱ' of Δ ⊢ ρ' f : (π x : A) → B and Δ' ⊢ ρ'*π a : A (since ρ'π ≤ ρπ), that have the same form of ℰ and ℱ, respectively, . Finally, we get a derivation 𝒟' of the same form of 𝒟:

  ℰ'                            ℱ'
Δ ⊢ ρ' f : (π x : A) → B     Δ' ⊢ ρ'*π a : A
------------------------------------------(→E)
Δ + Δ' ⊢ ρ' (f a) : B[a/x]

The function elimination rule thus respect R-weakening.

Function introduction rule, however, does not respect R-weakening. See, if we substitute ρ' for ρ, for ρ'≤ρ, we get

Δ, ρ'*π x : A ⊢ ρ' b : B
---------------------------(→I)
Δ ⊢ ρ' λx. b : (π x : A) → B

and although the succedent of the upper sequent (the premise of the rule) is weakened by ρ', the antecedent is also weakened by ρ'*π, potentially making the upper sequent stronger (harder to prove). This is exactly why the unintuitive instance of the rule (with ω for ρ and 1 for π) was incorrect: if we R-weaken it by substituting ω by 1 we actually get a stronger premise to prove; i.e., one that requires at most one use of x : A in the body b : B. It seems like our intuition was right all along!

We need a new rule which will respect R-weakening; and thus, a rule that will enforce affinity no matter the replicability status of the term to be built. Intuitively, this rule should be something like: if we can build 1 b : B from Δ and π x : A, then we can build ρ λx. b : (π x : A) → B from ρ*Δ, or:

Δ, π x : A ⊢ 1 b : B
---------------------------(→I)
ρ*Δ ⊢ ρ λx. b : (π x : A) → B

This is quite similar to our previous rule: if we "multiply" the upper sequent by ρ (by "repeating" the derivation ρ times), we get the sequent ρ*Δ, ρ*π x : A ⊢ ρ b : B which is exactly the upper sequent we would get if we have used the old rule.

The issue with this rule is that it is too restrictive: it only allows contexts of form ρ*Δ in the bottom sequent. If instead we could divide the context into ρ parts, we could have the rule:

Δ/ρ, π x : A ⊢ 1 b : B
---------------------------(→I)
Δ ⊢ ρ λx. b : (π x : A) → B

Of course, it is not always possible to define a single divisor (or even one divisor at all!): for instance, there are two solutions to x*ω = ω and no solutions to x*ω = 1. We will define, however, π/ρ, the division of quantities, as the largest quantity δ such that δ*ρ is not larger than π. The idea behind this is that we want Δ/ρ to be the strongest set of hypotheses possible, yet not too strong that it would be stronger than Δ when multiplied by ρ. Assuming the strongest hypotheses is never an issue since our theory is affine.

As an example, we will take the problematic instance of the introduction rule -- that is, ρ = ω and π = 1:

Δ/ω, 1 x : A ⊢ 1 b : B
---------------------------(→I)
Δ ⊢ ω λx. b : (1 x : A) → B

The action of dividing Δ by ω is the same as erasing all linear variables in Δ. It makes sense: if 1 b : B used 1 y : C in its body, then ω b : B would need ω y : C, not just 1. Furthermore, affinity of the lambda variable x is preserved.

More importantly, this new rule preserves R-weakening: this is easily seen by the fact that if ρ' ≤ ρ then Δ/ρ' ≥ Δ/ρ.

Safety of QTT with the new rule

In order for the new introduction rule to be safe, it must have the property of subject reduction (or type preservation). To this end, it is enough to show the admissibility of substitution; that is, of the rule

Δ, π x : A, Ε ⊢ ρ b : B    Δ' ⊢ π a : A
---------------------------------------(cut)
Δ+Δ', Ε[a/x] ⊢ ρ b[a/x] : B[a/x]

usually called the cut rule. Although I am not going to show here a full proof of the admissibility of substitution, I will give a sketch of a proof, taking for granted the following lemmas:

Lemma 1: `Δ ⊢ π a : A` implies `Δρ ⊢ πρ a : A`

Lemma 2: `Δ ⊢ π a : A` implies `Δ/ρ ⊢ π/ρ a : A`

The proof will proceed in an inductive manner. Ignoring typing rules, the bases are the identity rules, which trivially satisfy admissibility of substitution. The other two rules goes as follows:

Function introduction:

We have

                           𝒟
Δ/ρ, π/ρ x : A, Ε/ρ, y : B ⊢ 1 c : C
---------------------------------------(→I)
Δ, π x : A, Ε ⊢ ρ λy. c : (π y : B) → C

and Δ' ⊢ π a : A. Using Lemma 2 we get Δ'/ρ ⊢ π/ρ a : A. We may now inductively use the cut rule:

Δ/ρ, π/ρ x : A, Ε/ρ, y : B ⊢ 1 c : C    `Δ'/ρ ⊢ π/ρ a : A`
----------------------------------------------------------(cut)
Δ/ρ+Δ'/ρ, Ε[a/x]/ρ ⊢ 1 b[a/x] : B[a/x]

Since Δ/ρ+Δ'/ρ ≤ (Δ+Δ')/ρ, we can use L-weakening and get a derivation for (Δ+Δ')/ρ, Ε[a/x]/ρ ⊢ 1 b[a/x] : B[a/x], which will then give us

(Δ+Δ')/ρ, Ε[a/x]/ρ ⊢ 1 b[a/x] : B[a/x]
----------------------------------------------------(→I)
Δ+Δ', Ε[a/x] ⊢ ρ (λy. c)[a/x] : ((π y : B) → C)[a/x]

as we wanted to prove.

Function elimination:

We have

Δ1, π1 x : A, Ε1 ⊢ ρ g : (φ y : B) → C     Δ2, π2 x : A, Ε2 ⊢ ρ*φ b : B
-----------------------------------------------------------------------(→E)
Δ1+Δ2, π1+π2 x : A, Ε1+E2 ⊢ ρ (g b) : C[b/y]

and Δ' ⊢ π a : A. We will first show that it is possible to find Δ1' and Δ2' such that Δ' = Δ1' + Δ2', Δ1' ⊢ π1 a : A and Δ2' ⊢ π2 a : A, considering all the possible cases for π1 and π2:

If π1 = 0, then we choose Δ1' = 0*Δ' and Δ2' = Δ'. It is clear that Δ' = Δ1' + Δ2'. We derive 0*Δ' ⊢ 0 a : A, or Δ1' ⊢ π1 a : A, by using lemma 1, and Δ' ⊢ π2 a : A by R-weakening. The case π2 = 0 is entirely analogous.
If neither π1 nor π2 are equal to 0, then π1 + π2 = ω. We choose Δ1' = Δ'/ω and Δ2' = Δ' (the reader may want to check that indeed Δ'/ω + Δ' = Δ'). Using R-weakening on Δ' ⊢ π1+π2 a : A we get Δ' ⊢ π2 a : A. Lemma 2 used on Δ' ⊢ π1+π2 a : A, or Δ' ⊢ ω a : A, will get us Δ'/ω ⊢ ω a : A. R-weakening then give us Δ'/ω ⊢ π1 a : A.

Having such contexts Δ1' and Δ2' we may now inductively use the cut rule in each premise

Δ1, π1 x : A, Ε1 ⊢ ρ g : (φ y : B) → C     Δ1' ⊢ π1 a : A
---------------------------------------------------------(cut)
Δ1+Δ1', Ε1[a/x] ⊢ ρ g[a/x] : ((φ y : B) → C)[a/x]

and

Δ2, π2 x : A, Ε2 ⊢ ρ*φ b : B     Δ2' ⊢ π2 a : A
-----------------------------------------------(cut)
Δ2+Δ2', Ε2[a/x] ⊢ ρ*φ b[a/x] : B[a/x]

and thus, by use of the elimination rule:

Δ1+Δ1', Ε1[a/x] ⊢ ρ g[a/x] : ((φ y : B) → C)[a/x]     Δ2+Δ2', Ε2[a/x] ⊢ ρ*φ b[a/x] : B[a/x]
-------------------------------------------------------------------------------------------(→E)
(Δ1+Δ2)+Δ', (Ε1+E2)[a/x] ⊢ ρ (g b)[a/x] : C[b/y][a/x]

This completes the proof.

gabriel-barrett/qtt.md