Understanding the Phases Applicative

While I was researching how to do level-order traversals of a binary tree in Haskell, I came across a library called tree-traversals which introduced a fancy Applicative instance called Phases. It took me a lot of effort to understand how it works. Although I still have some unresolved issues, I want to share my journey.

Note: I was planning to post this article on Reddit. But I gave up because it was too long so here might be a better place.

See the discussion.

Note: This article is written in a beginner-friendly way. Experts may find it tedious.

Note: The author is not a native English speaker and is glad to accept corrections, refinements and suggestions.

Demo

The Phases Applicative is capable of reordering effects in the chain of the underlying Applicative, without changing the function application semantics.

To demonstrate this, let's import it first.

ghci> import Control.Applicative.Phases

Then we define two effectful computations.

ghci> let foo = putStrLn "foo" *> pure (3 :: Int)
ghci> let bar = putStrLn "bar" *> pure (4 :: Int)

Tip: If you have trouble reading these *> expressions, you can interpret them monadically. Which is to say
let foo = putStrLn "foo" *> pure (3 :: Int)
is equivalent to
let foo = putStrLn "foo" >> return (3 :: Int)
or even
let foo = do { putStrLn "foo"; return (3 :: Int) }
(This is only a trick for readability and is definitely not to say every Applicative is a Monad, although in the case of IO it's true.)

Let's chain these computations.

ghci> pure (,) <*> foo <*> bar
foo
bar
(3,4)

(One could use <$> or even liftA2 for conciseness. But I prefer pure and <*> because they feel more "primitive".)

To adapt the Phases Applicative, add now

ghci> runPhasesForwards (pure (,) <*> now foo <*> now bar)
foo
bar
(3,4)

Which just yields the same result. Boring.

Use delay to postpone one effect.

ghci> runPhasesForwards (pure (,) <*> delay (now foo) <*> now bar)
bar
foo
(3,4)

Use delay twice to postpone one effect ... by two turns.

ghci> let baz = putStrLn "baz" *> pure (5 :: Int)
ghci> runPhasesForwards (pure (,,) <*> delay (delay (now foo)) <*> delay (now bar) <*> now baz)
baz
bar
foo
(3,4,5)

Note although the order of the effects changes, the associated function application does not. Which can be seen in the above computation yielding the value (3,4,5), which corresponds to the values "extracted" in the order of foo,bar,baz.

It works like magic.

It's (Not) a Free Applicative

The definition of Phases is

data Phases f a where
  Lift :: f a -> Phases f a
  (:<*>) :: f (a -> b) -> Phases f a -> Phases f b

Tip: if you are unfamiliar with the above syntax, learn about the GADTs extension first.

A side effect of GADTs is that it enables the use of existential quantification in data constructors. The definition of (:<*>) exploits this. It "hides" the type a inside the constructor.

If we use the ExistentialQuantification extension instead, the above definition would be
data Phases f a = Lift (f a)
                | forall x. f (x -> a) :<*> Phases f x
In this definition, the constructor (:<*>) "hides" the type x. (Note x does not appear to the left hand side of =, thus is not a part of the type signature.)

This definition resembles that of a Free Applicative a lot because the (:<*>) constructor captures the <*> operation instead of actually performing it. So I read the Free Applicative paper.

Tip: A free something, explained in a non Category Theory way, is a wrapper data type that satisfies something but does not require the underlying type to satisfy it. Instead of performing the actual computation, it captures the arguments into corresponding data constructors, accumulating a chain-like structure, which allows the computation to be interpreted in different ways later. If you are interested, try Free Monads first.

To my surprise, there are two kinds of Free Applicatives: the left-parenthesised and the right-parenthesised.

The left-parenthesised version has the following definition:

data FreeAL f a = PureL a
                | forall b. FreeAL f (b -> a) :*: f b

Which corresponds to the canonical form for chaining Applicatives:

pure f <*> x1 <*> x2 <*> ... <*> xn

The right-parenthesised version has the following definition:

data FreeA f a = Pure a
               | forall b. f (b -> a) :$: FreeA f b

Which corresponds to right-parenthesised canonical form:

fn <*> (... (f2 <*> (f1 <*> (pure x))))

This form is rarely seen in the real world.

The paper claimed these two definitions were isomorphic and it took me some time to figure out why.

To turn a left-parenthesised form into a right-parenthesised one:

(pure f <*> x) <*> y
=
fmap (&) x <*> (fmap (&) y <*> (pure (\fy -> \fx -> f fx fy)))

where (&) is the reverse function application operator. (Provided in Data.Function)

i.e.

(&) x f = f x

To turn a right-parenthesised form into a left-parenthesised one:

u <*> (v <*> pure x)
=
pure (\fu -> \fv -> fu (fv x)) <*> u <*> v

This isomorphism / duality should not be very surprising for two reasons.

It's analogous to how to implement one of foldl / foldr via the other.
It can be seen as a generalization of the interchange law of Applicatives. (u <*> pure y = pure ($ y) <*> u)

The definition of Phases is almost a definition of the right-parenthesised Free Applicative.

data Phases f a = Lift (f a)
                | forall x. f (x -> a) :<*> Phases f x

-- vs --

data FreeA f a = Pure a
               | forall b. f (b -> a) :$: FreeA f b

But it isn't. Because the Lift constructor can capture any effectful applicative computation while the Pure constructor can only capture a plain value.

The Order of Effects

I have been playing with Haskell for several years. But until this time I learn that Applicative does not restrict the order of its effects.

Take IO as an example.

The following expression

pure (,) <*> getChar <*> getChar

when executed, reads two characters from stdin and wraps them in a pair.

Suppose the user inputs ab.

You might get ('a', 'b').

You might also get ('b', 'a') ! This will happen if the second getChar is executed first.

This indetermination will drive you crazy.

So most Applicatives including IO are implemented in a way that you can assume a left-to-right order on effects. This guarantees the latter behavior will never occur.

Not specifying of the order of effects is a feature, not a bug™. Some libraries exploit it. Backwards from the transformers package runs effects in reverse order. Concurrently from the async package runs effects in parallel.

It's a pity that most Applicative tutorials don't point out this issue. One exception is the tutorial from wikibooks, which includes a dedicated section.

Other references include this Reddit post and this Stack Overflow question.

Guess what this expression will output when executed?

add <*> (multiply <*> pure 10)

where

add = putStrLn "add" *> pure (+ (1 :: Int))
multiply = putStrLn "multiply" *> pure (* (2 :: Int))

The answer is

add
multiply

Which is counterintuitive. At the same time, an astute reader will realize the effects in the chain of Applicatives is assocative. This is to say if one can combine effects with <>, the final effect of (effect1 <> effect2) <> effect3 is the same as that of effect1 <> (effect2 <> effect3). Plus there is pure, a way to wrap a value into an Applicative with no effect (I guess that's why it's called pure), which corresponds to mempty. So we can conclude that it forms a special instance of Monoid!

Being a Monoid means, for any "order-sensitive" Applicative like IO, the order of the associated effects depends only on the order in which the computation appears in the expression, not on the order in which the computation is performed. If we want to shuffle the effects, we need to move the computations around without changing the semantics of the overall computation. And the "overall computation" in the chain of Applicatives is ... a function application.

Shuffle Function Applications

The function application operator is the most notorious operator in the world.

It's not associative: f a b ≠ f (a b)
It's not commutative: f x ≠ x f
It doesn't even have a visible notation! Suppose it has one, !, then a Haskell program zipWith (,) [1..10] ['a'..'z'] would have to be written as zipWith ! (,) ! [1..10] ! ['a'..'z']. (We don't treat ($) as a valid candidate because a function application operator should be left-associative.)

But we can fix that.

We know that f x y ≠ f (x y). But we can combine x and y first then apply the result to f by f x y = (uncurry f) ((,) x y).
We know that f (g x) ≠ (f g) x. But we can compose f and g first then apply x to the composition by f (g x) = ((.) f g) x.
We know that f x ≠ x f. But we can swap them by f x = ($ x) f.
We know that f x y ≠ f y x. But we can swap x and y by f x y = (flip f) y x.
We know that (f x) (g y) ≠ (f g) (x y). But we can use (f x) (g y) = (\(x', y') -> f x' (g y')) ((,) x y). This will change the order of occurrence of terms from f,x,g,y to f,g,x,y.

These techniques are useless, until you find they can be applied in the Applicative world.

By the way, the function application operator in the Applicative world has a visible notation, that is, <*>.

Now if you stare at the definition of the Applicative instance for Phases, it should look less like a bunch of abstract nonsense to you.

instance Applicative f => Applicative (Phases f) where
  pure = now . pure
  Lift mf <*> Lift ma = Lift $ mf <*> ma 
  Lift mf <*> (mh :<*> ty) = liftA2 (.) mf mh :<*> ty
  (mg :<*> tx) <*> Lift ma = liftA2 flip mg ma :<*> tx
  (mg :<*> tx) <*> (mh :<*> ty) = liftA2 (\g h ~(x,y) -> g x (h y)) mg mh :<*> liftA2 (,) tx ty

It "lifts" the above shuffling techniques to the Applicative level!

Zip Effects

We have divided our analysis of Applicatives into two parts - the function application part and the effect part. The former is done. Here comes the latter.

I struggled for a long time trying to understand what happens to the effects when two Phases are chained with a <*>. Then I came up with the following data type.

data EffectList a = Singleton a
                  | Cons a (EffectList a)
                  deriving Show

Here I model effects as a. And a should be a Monoid.

For example:

String "1" represents "some effect that outputs 1 to stdout when executed".
String "12" represents "some (composed) effect that outputs 1 then outputs 2 to stdout when executed".
String "" represents "no effect".

This data type definition is very similar to that of Phases,

data Phases f a = Lift (f a)
                | forall x. f (x -> a) :<*> Phases f x

except it replaces the f (x -> a) part.

Then I invent an operator <+> (I just make up the notation), defined as

infixl 4 <+>

(<+>) :: Monoid a => EffectList a -> EffectList a -> EffectList a
(Singleton xs) <+> (Singleton ys) = Singleton (xs <> ys)
(Singleton xs) <+> (Cons ys m) = Cons (xs <> ys) m
(Cons xs m) <+> (Singleton ys) = Cons (xs <> ys) m
(Cons xs m) <+> (Cons ys n) = Cons (xs <> ys) (m <+> n)

I define it by referencing the definition of <*> for Phases.

For example, one case of <*> is

(mg :<*> tx) <*> Lift ma = liftA2 flip mg ma :<*> tx

If we focus on the effect part and blur the others, we will get

(mg :<*> tx) <*> (Lift ma) = (mg ... ma) :<*> tx

Which corresponds to

(Cons xs m) <+> (Singleton ys) = Cons (xs <> ys) m

Using the same method, we can derive the other cases.

Before experimenting with this operator, let's introduce two functions.

now :: Monoid a => a -> EffectList a
now = Singleton

delay :: Monoid a => EffectList a -> EffectList a
delay = Cons mempty

Which are defined in the same way as those with the same name in Phases. I list them below for your information.

-- | schedule an action to run in the current phase
now :: f a -> Phases f a
now = Lift

-- | delay all actions by a phase
delay :: Applicative f => Phases f a -> Phases f a
delay ta = pure id :<*> ta

Let's try it.

ghci> now "1" <+> now "2" <+> now "3" <+> now "4"
Singleton "1234"

ghci> now "1" <+> now "2" <+> delay (now "3") <+> now "4"
Cons "124" (Singleton "3")

ghci> delay (now "1") <+> now "2" <+> delay (delay (now "3")) <+> now "4"
Cons "24" (Cons "1" (Singleton "3"))

We can interpret the results in this way: "effects" with the same "nested level" are concatenated together.

For instance, in the third example:

We get "24" first because now "2" and now "4" are nested under zero delays.
We get "1" next because now "1" is nested under one delay.
We get "3" last because now "3" is nested under two delays.

To further explain what the <+> operator does, let's list the (desugared) operands and the result in the third example in parallel.

Cons      ""   (Singleton "1")
Singleton "2"
Cons      ""   (Cons      ""  (Singleton "3"))
Singleton "4"
----------------------------------------------
Cons      "24" (Cons      "1" (Singleton "3"))

Can you recognize this computation?

It's zipWith (<>)! (To be more precise, it's zipWith4 (<>).)

But not exactly. Here are the differences:

zipWith accepts normal lists while <+> accepts EffectLists, which is defined as non-empty lists.
zipWith handles ragged lists by dropping the extra parts, while <+> keeps them.
To use zipWith, you use it like liftA2. To use <+>, you use it like <*> (i.e. interleave the operands with it)

For the sake of completeness, here is the definition of run, the counterpart to runPhasesForward. It fuses the effects stored in a EffectList.

run :: Monoid a => EffectList a -> a
run (Singleton xs) = xs
run (Cons xs m) = xs <> run m

You can use it like this.

ghci> run $ Cons "24" (Cons "1" (Singleton "3"))
"2413"

With the aid of EffectList, it would be natural to understand how Phases works. Effects in the same phase will be concatenated in the same cell while effects in the later phases will be concatenated in the next ones.

Bonus

Evgeniy Malov provides an implementation for level order traversals of a binary tree. It contains a zipWith' function, which keeps the extra parts of ragged lists too! Seems the Phases Applicative really captures some essentials.
The Applicative paper also mentions zipWith as an instance of Applicative. zapp is similar to <+>.

Unresolved Issues

I still consider my method (divide an Applicative into the function application part and the effect part and analyze them one by one) a little complicated. If you have a more concise or a more clever explanation, please let me know.
Does EffectList have a canonical name? Is there an isomorphic structure mentioned in a paper?

kbridge/phases.md