This is my first Haskell post on this site, hopefully the first of many!

I will explain how I came to use rankNTypes, why they are so useful. Hopefully, when reading this post, you will recognize when you might want to use them in your own situation.

On a side note: if you are just looking for a catch-all data type for Directed Acyclic Graphs, there is a really good blog post about DAGs in Haskell by Twan van Laarhoven. I will just focus on the RankNTypes part. Since the rank2Types switch is currently just a synonym for the rankNTypes switch, I will just talk about the latter.

In 'regular' Haskell, there is just one forall in a type, and it is what a type starts with. It may be implicit or explicit, either way forall a. a -> a is always interpreted as forall a. (a -> a). RankNTypes allow you to write the brackets differently, that is: forall may occur to the left hand side of a type:

  (forall name. WithDAG name a) -> String

As an example for its use, I present Directed Acyclic Graphs (or DAGs), which I used in my research on Networks on Chips.

{-# OPTIONS_GHC -Wall #-} -- show almost all warnings
{-# LANGUAGE RankNTypes, GeneralizedNewtypeDeriving, RoleAnnotations #-}
module MyDAGS (main, WithDAGT, runDAG, freezeRefN, addNode, showDAG, RefN, Node, DAG) where
import Control.Monad.State

What is a DAG?

A DAG is a set of nodes together with a function from each node to a list (or set, depending on who defines "DAG") of nodes, in such a way that the function forms no cycles.

The "no cycle" requirement means that we will get an invariant which we need to keep consistent. This involves keeping track of which DAG each node belongs to. To do this, we add a type variable to each DAG, node and reference.

newtype DAG name = DAG [Node name] deriving Show
data Node name = Node String [RefN name] deriving Show
newtype RefN name = RefN Int deriving Show
-- add a node to the DAG, return a reference to the new node (and the new DAG)
hiddenAddNode :: String -> [RefN n] -> DAG n -> (RefN n, DAG n)
hiddenAddNode label refs (DAG nodes)
 = ( RefN (length nodes)
   , DAG (nodes ++ [Node label refs])
   )

Note that the name part is not used anywhere, this is called a Phantom Type, and it will help us keep track of nodes and which references belong to which DAGs. If we refrain from using cyclic definitions, then "new" nodes can just be built from "old" ones. Since the old ones point to existing nodes, the new nodes will point backwards in the latest DAG:

 (dagnode1,dag1) = hiddenAddNode "node1" []         (DAG [])
 (dagnode2,dag2) = hiddenAddNode "node2" [dagnode1] dag1
 -- dag2 == DAG [Node "node1" [],Node "node2" [RefN 0]]

Unfortunately, Haskell will just match "name" to any type we throw at it, so we could also write:

 (dagnode1,dag1) = hiddenAddNode "node1" []         (DAG [])
 (dagnode2,dag2) = hiddenAddNode "node2" [dagnode1] (DAG [])
 -- dag2 == DAG [Node "node2" [RefN 0]]
 -- ouch! Node 0 refers to Node 0, a cycle!

When to use rankNTypes

So here is the problem we wish to solve: "node2" points to itself, and we only used hiddenAddNode and DAG []. The problem is that the second (DAG []) refers to an old version of the DAG, while dagnode1 is a reference to a newer version. As a solution, we pass the DAG implicitly, such that we are forced to use the newest version all the time. Monads are great for this; we should be able to write:

someDAG :: String
someDAG = showDAG (do dagnode1  <- addNode "node1" []
                      _dagnode2 <- addNode "node2" [dagnode1]
                      return ())

The type of DAG n -> (DAG n, RefN n) is precisely the right type for the state monad. The state monad also allows us to get to the internal state via get and put, which is what we want to prevent. Therefore we create our own monad instead, and use it for addNode:

newtype WithDAGT name a = WithDAG (State (DAG name) a)
   deriving (Monad,Applicative,Functor)
addNode :: String -> [RefN name] -> WithDAGT name (RefN name)
addNode label refs = WithDAG (state (hiddenAddNode label refs))

We have to supply an initial empty DAG somewhere, somehow. This is where rankNTypes come in. I learned the trick from this paper by Simon Peyton Jones.

runDAG :: (forall name. WithDAGT name a) -> (a, DAG ())
runDAG (WithDAG st) = runState st (DAG [])

The cool thing about runDAG is that it allows us to get all the values. The 'name' of the corresponding DAG, however, is marked as (). This prevents us from ever using that DAG inside runDAG ever again. This is because name does not match with (), which is exactly what that forall was for. In fact, it even prevents us from getting references out of the WithDAG:

  someNode = fst (runDAG (addNode "node1" []))

gives the following type error:

  Couldn't match type 'a' with 'RefN name'
    because type variable 'name' would escape its scope

There is an easy fix:

freezeRefN :: RefN name -> WithDAGT name (RefN ())
freezeRefN (RefN i) = return (RefN i)
_someNode :: RefN () -- returns "RefN 0"
_someNode = fst (runDAG (do node<- addNode "node1" []
                            freezeRefN node))

Once again, we get a reference which we will never be able to use inside runDAG. This means that runDAG, freezeRefN, and addNode are perfectly safe to export. We are able to write someDAG with them, since we can implement showDAG with just runDAG:

showDAG :: (forall name. WithDAGT name a) -> String
showDAG wd = show (snd (runDAG wd))

main :: IO ()
main = putStrLn someDAG
-- output: DAG [Node "node1" [],Node "node2" [RefN 0]]

You'll see that we also exported RefN, Node and DAG. These are just the types, and that is fine too, as long as we do not export their constructors: we want all construction inside WithDAGT. If we do export the constructors, we are in trouble, since it would then be trivial to cast a RefN name to a RefN name2. In fact, there is this new feature (as of GHC 7.8) called coerce that can do just that. This is where I believe things get ugly, but there is a fix. I won't get into the details, but the following lines will make sure that coerce is not allowed.

type role RefN nominal
type role WithDAGT nominal nominal

Another downside to our solution, is that Haskell is not able to derive all the types automatically anymore. Instead, we get type errors every time we forget to put a forall to the left hand side of an arrow. This can be a bit annoying, and sometimes even very annoying, but I'll save that for another blog post. In GHC 7.10.1, RankNTypes do "just work", but they produce awkward error messages and sometimes require unintuitive work-arounds. This makes rankNTypes feel kind of un-Haskell like to me. People are currently improving the warnings for GADTs, which will influence rankNTypes as well: rankNTypes support will probably only get better over time.

Exercises

If you wish to check whether you've understood this post, here are some exercises for you:

  1. Run main (either by renaming the module to Main, or by using ghci).
  2. Create a DAG from a tree structure using only addNode and ultimately runDAG (you can use all the Prelude functions, and need to define a tree structure yourself.)
  3. This one is quite tricky: there are several ways to solve this, and I don't believe any of those solutions make the code prettier (mail me if they do).
    The type Node name is not present in any of the 'safe' functions (runDAG, freezeRefN, and addNode). Hence it should be possible to create another safe version with the following definition instead:
   data Node = Node String [Int]
  1. Write getNode, which looks up a Node in the DAG according to its reference, and then freezeNode, such that this works:
    someNode = fst (runDAG (do nodeRef <- addNode "node1" []
                               realNode <- getNode nodeRef
                               freezeNode realNode))