Text.ParserCombinators.ReadP

ReadP

The ReadP module is part of Haskell’s base package and it is a really interesting piece of work. It covers so many aspects of Haskell: algebraic data types, laziness, monads and continuation passing style.

Even though the module is quite short it provides all combinators to write arbitrarily complex parsers. One of the uses of this module is the implementation of read - reading a value out of a string.

However it is not so trivial to understand the implementation (at least not for me) and that is the purpose of this blog post. This blog post will dig into how this library works and serves as a reference to myself to look back to in the future.

The types

The ReadP module has 3 data types

A variable of type ReadS, the S stands for String I suppose, is basically a function that takes a string and returns you all possible interpretations of it (interpretation means parser result of course). The definition of ReadS is the typical type signature for a parser in Haskell.

Next up is P. Nowhere in the documentation of the source it is mentioned what the P stands for, but I guess it stands for primitive, because all the constructors of P represent a primitive operation of a parser:

  • Get the next character of the input
  • Look at the remaining input, but non-destructivly
  • The parser must be able to fail
  • The parser must be able to reach a result
  • And finally a parser must reach a final state in order to be successful

Get, Look and Result are interesting, because they hold the information how to continue. That is a very important first insight. This idea is the heart of every parser: a parser is nothing more than a bunch of primitive (step-wise) operations and rules that say how to continue based on the result of the current step. For example, as we will see more below, Get is responsible to consume one char from the remaining input and also holds the function that decides what will be the next primitive operation based on the current result. Likewise for Look. Result can be interpreted like a milestone. We succeeded to parse part of the input and from there we try to parse the remainder.

The last type is ReadP. The definition of ReadP is identical to the definition of ContT and that is no coincidence. ContT stands for the continuation monad transformer. A continuation is a computation that explicitely receives the piece of code that will take the computation result and executes based on its result.

ReadP is exactly that: A fully assembled pipeline of primitive parser actions that just needs the continuation to pass its result to. The net type of ReadP k, where k is a continuation, is P a, i. e. a parser primitive. All of this will hopefully become a bit clearer in a moment.

There exist 4 insights to enlightment.

Insight 1: The Monad definitions

P a and ReadP a are both instances of Monad

We can try to express the meaning of these definitions. The bind operation for Get f can be written as Get f >>= k = Get f'. Since Get holds the continuation to pass the next char to, this means we are changing the continuation, i.e. the piece of code that consumes the next character and then moves on. The behaviour of the new continuation is pretty straight forward: Take the char, produce a new parser primitive (e.g. Fail) and bind (>>=) it to k. Look works analogous. If we reach a Fail that means the parser did not succeed, thus there is no need to continue parsing. Fail acts as a shortcut identical to Nothing in the Maybe monad. If we reach a Result we fork the parser in order to follow both paths: the one continuing with p followed by k and the one just with k.

The monad definition of ReadP is identical to the one for ContT, for reasons we mentioned above.

But what does the above definition actually mean?!

What makes the whole continuation passing style implementation so hard to grasp is its upside-down nature compared to usual sequential code. But there is one important thing to remember: The continuation will receive the final result. That means it must come last. That is the reason why k appears in the inner most paranthesis level and has to be tunneled through. The implementation of (m >>= f) k explained in words is: “perform the action m, take its result, pass the result to f to calculate the new action to perform, perform the new action and finally pass its value to the continuation k”. m >>= f thus a new action on its own, which needs an explicit continuation parameter.

ReadP can be interpreted as a ‘build rule’ for a parser, made up only of primitive parser actions. The Monad implementation for ReadP states how two build rules can be made into one, bigger build rule. This behaviour is the key to build complicated parsers and many higher level functions such as many, choice, etc.

Insight 2: The run function

So we know how to combine ReadP values. But what we really want is a parser, i.e. a function that accepts a string as input and returns us all possible parses of that string, what we want is ReadS.

Conveniently the ReadP library exports a function that does exactly that, and its implementation is stunningly elegant

All that readP_to_S does is provide the final continuation to the ReadP primitive parser, which will result in a value of type P a, and then hand this value over to the run function.

The run function knows how to handle the primitive parser instructions. Its implementation is again stunningly concise

Let’s really quickly explain what each of the above pattern matches does

  1. If run finds a Get, it pops one char off the remaining string and hands it over to the continuation held by the Get
  2. If run sees a Look, it hands over the whole remaining input string to the continuation held by Look, but does not alter it
  3. If run finds a Result, it prepends the result and the remaining input to the list of all found matches and then continues with the parser p held by the Result value
  4. A Final value stops further evaluation and just returns the value wrapped by Final
  5. Everything else, i.e. Fail, yields to an abort. An empty list is returned, which means no result, or no success respectively.

So to sum it all up: The ReadP Monad constructs the parser, while the run function is the glue code that executes them.

Insight 3: Laziness

If you try to think the above implementations through in your head, it very quickly starts to become confusing. It is not recommended to deconstruct the whole bind (>>=) chain of a ReadP do block. Same holds true for how run deals with its input value. In fact the compiler does it neither. One of Haskell’s need features is laziness: values are only evaluated when they are needed. Haskell builds and records the build rule (thunk) for a value, but postpones the evaluation until is is really needed.

So in the above constructions Haskell just recors what it has to do with the values but doesn’t actually do anything with them until their results are required.

Insight 4: Showcase

Ok talk is cheap, let’s get our hands dirty! The concepts make much more sense once you see them on actual code.

I will exercise a full parser run. Here is our parser

And now let’s ‘run’ it with the input “axczzz”

readP_to_S my_parser "axczzz" =
-- definition of my_parser + do de-sugaring
readP_to_S (get >>= \c1 -> do {char 'x'; c2 <- get; return [c1, c2]}) "axczzz" =
-- definition of get
readP_to_S ((R Get) >>= \c1 -> do {char 'x'; c2 <- get; return [c1, c2]}) "axczzz" =
-- definition of (>>=) for ReadP
readP_to_S (R (\k -> Get (\a -> let R m' = (\c1 -> do {char 'x'; c2 <- get; return [c1, c2]}) a
                                in m' k
                         )
              )
           ) "axczzz" =
-- definition of readP_to_S
run ((\k -> Get (\a -> let R m' = (\c1 -> do {char 'x'; c2 <- get; return [c1, c2]}) a
                       in m' k
                )
     ) return) "axczzz" =
-- definition of return (= pure) for P
run ((\k -> Get (\a -> let R m' = (\c1 -> do {char 'x'; c2 <- get; return [c1, c2]}) a
                       in m' k
                )
     ) (\x -> Result x Fail)) "axczzz" =
-- apply lambda function
run (Get (\a -> let R m' = (\c1 -> do {char 'x'; c2 <- get; return [c1, c2]}) a
                in m' (\x -> Result x Fail)
         )
    ) "axczzz" =
-- apply the definition of run for the pattern match Get f
run ((\a -> let R m' = (\c1 -> do {char 'x'; c2 <- get; return [c1, c2]}) a
            in m' (\x -> Result x Fail)
     ) 'a'
    ) "xczzz" =
-- apply lambda function
run (let R m' = (\c1 -> do {char 'x'; c2 <- get; return [c1, c2]}) 'a'
     in m' (\x -> Result x Fail)
    ) "xczzz" =
-- apply lambda function
run (let R m' = do {char 'x'; c2 <- get; return ['a', c2]}
     in m' (\x -> Result x Fail)
    ) "xczzz" =
-- do de-sugaring
run (let R m' = char 'x' >>= (\_ -> do {c2 <- get; return ['a', c2]})
     in m' (\x -> Result x Fail)
    ) "xczzz" =
-- definition of char x
run (let R m' = (satisfy ('x' ==)) >>= (\_ -> do {c2 <- get; return ['a', c2]})
     in m' (\x -> Result x Fail)
    ) "xczzz" =
-- definition of satisfy p
run (let R m' = do {c <- get; if ('x' ==) c then return c else pfail} >>= \_ ->
                do {c2 <- get; return ['a', c2]})
     in m' (\x -> Result x Fail)
    ) "xczzz" =
-- definition of (>>=) for ReadP
run (let R m' = (get >>= (\u -> do {if ('x' ==) u then return u else pfail})) >>= \_ ->
                 do {c2 <- get; return ['a', c2]})
     in m' (\x -> Result x Fail)
    ) "xczzz" =
-- definition of get
run (let R m' = ((R Get) >>= (\u -> do {if ('x' ==) u then return u else pfail})) >>= \_ ->
                 do {c2 <- get; return ['a', c2]})
     in m' (\x -> Result x Fail)
    ) "xczzz" =
-- definition of (>>=) for ReadP
run (let R m' = (R (\k2 -> Get (\a2 -> let R m'' = (\u -> do {if ('x' ==) u then return u else pfail}) a2
                                       in m'' k2))
                 >>= \_ ->
                 do {c2 <- get; return ['a', c2]}
                )
     in m' (\x -> Result x Fail)
    ) "xczzz" =
-- definition of (>>=) for ReadP
run (let R m' = R (\k3 -> (\k2 -> Get (\a2 -> let R m'' = (\u -> do {if ('x' ==) u then return u else pfail}) a2
                                              in m'' k2))
                  (\a3 -> let R m''' = (\_ -> do {c2 <- get; return ['a', c2]}) a3
                          in m''' k3))
     in m' (\x -> Result x Fail)
    ) "xczzz" =
-- pattern match the let expression
run ((\k3 -> (\k2 -> Get (\a2 -> let R m'' = (\u -> do {if ('x' ==) u then return u else pfail}) a2
                                 in m'' k2))
     (\a3 -> let R m''' = (\_ -> do {c2 <- get; return ['a', c2]}) a3
             in m''' k3)) (\x -> Result x Fail)
    ) "xczzz" =
-- function application
run ((\k2 -> Get (\a2 -> let R m'' = (\u -> do {if ('x' ==) u then return u else pfail}) a2
                         in m'' k2))
     (\a3 -> let R m''' = (\_ -> do {c2 <- get; return ['a', c2]}) a3
             in m''' (\x -> Result x Fail))
    ) "xczzz" =
-- function application
run (Get (\a2 -> let R m'' = (\u -> do {if ('x' ==) u then return u else pfail}) a2
                 in m'' (\a3 -> let R m''' = (\_ -> do {c2 <- get; return ['a', c2]}) a3
                                in m''' (\x -> Result x Fail)))
    ) "xczzz" =
-- apply the definition of run for the pattern match Get f
run ((\a2 -> let R m'' = (\u -> do {if ('x' ==) u then return u else pfail}) a2
             in m'' (\a3 -> let R m''' = (\_ -> do {c2 <- get; return ['a', c2]}) a3
                            in m''' (\x -> Result x Fail))) 'x'
    ) "czzz" =
-- function application
run (let R m'' = (\u -> do {if ('x' ==) u then return u else pfail}) 'x'
     in m'' (\a3 -> let R m''' = (\_ -> do {c2 <- get; return ['a', c2]}) a3
                    in m''' (\x -> Result x Fail))
    ) "czzz" =
-- function application
run (let R m'' = do {if ('x' ==) 'x' then return 'x' else pfail}
     in m'' (\a3 -> let R m''' = (\_ -> do {c2 <- get; return ['a', c2]}) a3
                    in m''' (\x -> Result x Fail))
    ) "czzz" =
-- function application
run (let R m'' = do {return 'x'}
     in m'' (\a3 -> let R m''' = (\_ -> do {c2 <- get; return ['a', c2]}) a3
                    in m''' (\x -> Result x Fail))
    ) "czzz" =
-- definition of return
run (let R m'' = R (\k4 -> k4 'x')
     in m'' (\a3 -> let R m''' = (\_ -> do {c2 <- get; return ['a', c2]}) a3
                    in m''' (\x -> Result x Fail))
    ) "czzz" =
-- pattern match the let expression
run ((\k4 -> k4 'x') (\a3 -> let R m''' = (\_ -> do {c2 <- get; return ['a', c2]}) a3
                             in m''' (\x -> Result x Fail))
    ) "czzz" =
-- function application
run ((\a3 -> let R m''' = (\_ -> do {c2 <- get; return ['a', c2]}) a3
             in m''' (\x -> Result x Fail)) 'x'
    ) "czzz" =
-- function application
run (let R m''' = (\_ -> do {c2 <- get; return ['a', c2]}) 'x'
     in m''' (\x -> Result x Fail)
    ) "czzz" =
-- function application
run (let R m''' = do {c2 <- get; return ['a', c2]}
     in m''' (\x -> Result x Fail)
    ) "czzz" =
-- do de-sugaring
run (let R m''' = get >>= (\c2 -> do {return ['a', c2]})
     in m''' (\x -> Result x Fail)
    ) "czzz" =
-- definition of get
run (let R m''' = (R Get) >>= (\c2 -> do {return ['a', c2]})
     in m''' (\x -> Result x Fail)
    ) "czzz" =
-- definition of (>>=) for ReadP
run (let R m''' = R (\k5 -> Get (\a4 -> let R m4 = (\c2 -> do {return ['a', c2]}) a4
                                        in m4 k5
                                )
                    )
     in m''' (\x -> Result x Fail)
    ) "czzz" =
-- pattern match the let expression
run ((\k5 -> Get (\a4 -> let R m4 = (\c2 -> do {return ['a', c2]}) a4
                         in m4 k5
                 )
     ) (\x -> Result x Fail)
    ) "czzz" =
-- function application
run (Get (\a4 -> let R m4 = (\c2 -> do {return ['a', c2]}) a4
                 in m4 (\x -> Result x Fail))
    ) "czzz" =
-- apply the definition of run for the pattern match Get f
run ((\a4 -> let R m4 = (\c2 -> do {return ['a', c2]}) a4
                 in m4 (\x -> Result x Fail)) 'c'
    ) "zzz" =
-- function application
run (let R m4 = (\c2 -> do {return ['a', c2]}) 'c'
     in m4 (\x -> Result x Fail)
    ) "zzz" =
-- function application
run (let R m4 = do {return ['a', 'c']}
     in m4 (\x -> Result x Fail)
    ) "zzz" =
-- definition of return
run (let R m4 = R (\k6 -> k6 ['a', 'c'])
     in m4 (\x -> Result x Fail)
    ) "zzz" =
-- pattern match the let expression
run ((\k6 -> k6 ['a', 'c']) (\x -> Result x Fail)) "zzz" =
-- function application
run ((\x -> Result x Fail) ['a', 'c']) "zzz" =
-- function application
run (Result ['a', 'c'] Fail) "zzz" =
-- apply the definition of run for the pattern match Result x p
(['a', 'c'], "zzz") : run Fail "zzz" =
-- apply the definition of run for the pattern match Fail (_)
(['a', 'c'], "zzz") : [] =
-- use the definition of (:)
[['a', 'c'], "zzz"]

Notice how the very first continuation, x -> Result x Fail, remained untouched throughout the whole transformation process. The actions were performed sequentially within the imperative looking do block and only at the very end the continuation was used to produce the final result.

Let’s run some code. The output below is produced by this script.

Summary

I really like the implementation of the ReadP library. The code is wonderfully elegant and is very well suited to show and study all the nice stuff Haskell is known for. The idea of starting with absolutely primitive operations (Get, Look, Result, Final and Fail) and build an expressive fully featured parser library out of them is really fascinating.

The concept of continuations is a very fascinating concept itself and worth knowing about. In this blog post it was shown how they are used in practice and what it means to make the further program flow explicit in a program.