Chapter 10 is legendary, where the book entirely runs off the rails with what should be a simple topic, namely writing a parser for the header line of a PGM file. The chapter plunges headlong into a variety of topics the relevance of which only becomes clear in later chapters. Chapter 14 in the book should really follow chapter 10 (probably the book should be rearranged so chapter 10 becomes chapter 13). What is going on in chapter 10 is that the author is inventing Monads, without telling you so and frightening you up front. You don't learn this until you read chapter 14 (which is pretty good honestly).
I don't think any newcomer to Haskell stands much of a chance learning anything from this chapter. The writer sets out to run a marathon and simply shouts over his shoulder "try to keep up".
The chapter plunges headlong into a variety of topics, entirely without explanation. The relevance of the bizarre things the author does in chapter 10 only become clear in later chapters, particularly chapter 14. Chapter 14 in the book should really follow chapter 10. The book should be rearranged so chapter 10 becomes chapter 13.
What is going on in chapter 10 is that the author is inventing Monads. But he isn't telling you this, for fear of frightening you. What he does though is far worse. My advice is this: skim both chapter 10 and chapter 14 first. Get a grip on a basic skeleton of what Monads are about, then tackle chapter 10, with the help of this page and the following excellent set of notes by Ulas Tuerkmen.
P5 3072 2048 255We just have a two byte "magic number" to start and identify the file, followed by 3 numbers, then a big blob of binary data follows. The numbers can be separated by any amount of whitespace (inluding newlines). However only one whitespace character is allowed between the last number and the binary blob, so a bit of care is required here. Here is a quickly thrown together ruby script to parse a PGM file:
#!/bin/ruby # Let's see how much trouble it is in ruby to # parse a PGM file and compare to the ch10 Haskell code. def error ( msg ) puts msg exit end pgm_file = "test.pgm" file = File.open(pgm_file, "rb") contents = file.read file.close stuff = contents.split " ", 4 error "Not a PGM file 1" if stuff.size != 4 (magic,w,h,rem) = stuff error "Not a PGM file 2" if magic != "P5" # After the "max" value, we get only a single whitespace # character, so we can't just include the max value in the split above. md = /(\d+)\s/.match rem max = md[1] off = max.length + 1 bits = rem[off,rem.length-off] print magic + " #{w} by #{h}, max = #{max}, size = #{bits.length} bytes\n" # THE ENDIt is worth bearing in mind this simple imperative program as you plow through the complexities of chapter 10. It makes a person wonder if Haskell really is suited for doing real world programming.
The complexity in chapter 10 comes in by the use of "Maybe" to perform a basic sort of error handling. This is improved later in the code by changing the use of "Maybe" to "Either" to allow an error message to be propogated rather than a simple notion of failure. The trick then is to propogate the possible "Nothing" that can arise at various states of the parse and return it as the result.
As it turns out, both Maybe and Either are instances of the much feared Monads that you hear so much about. I find it much easier to understand chapter 10, or at least to fathom why the writer does some of the things he does by understanding some basic things about Monads up front. You can get all this by reading parts of chapter 14 before diving into chapter 10, but here is a basic orientation
Notice that nothing has been said here about what monads are in any general or precise way. The goal in chapter 10 would seem to be to get some experience working with monads without realizing you are doing so. Sort of like tricking a child who has a mental block against eating vegatables into eating his beans.
We really should stop here and resume discussion of chapter 10, but it seems pertinent at this point to talk about some standard monad jargon that you are bound to run into.
This is their example of bad code that needs to be improved. One mystery here that becomes an interest of mine is the absence of a skipSpace routine which appears in their next example. If you look closely at the matchHeader routine, you will see that it embeds skipSpace and each call to getNat has the skipSpace code on the line that calls the function. Curiously after reading "max", we do not call skipSpace, but use getBytes to skip a single character. No explanation of this.
Also I find the call to "fromIntegral" curious, perplexing, and needless, but perhaps they have their reasons. An explanation would be nice.
Ulas points out that there is some dishonest trickery going on in this code involving the lambdas. The book makes a lame excuse for them, but actually they are vital to pull the height, width, and other parsed items out of the code chain so they will available for the final call to the Greymap constructor. Ulas also points out that these lambdas are not complete on one line, but all are essentially dangling. A more honest rendition of the code with truthful indentation would be:
parseP5 s = matchHeader (L8.pack "P5") s >>? skipSpace >>? \(_,s) -> getNat s >>? skipSpace >>? \(w,s) -> getNat s >>? skipSpace >>? \(h,s) -> getNat s >>? \(max,s) -> getBytes 1 s >>? (getBytes (w*h) . snd) >>? \(map,s) -> Just (Greymap w h max map, s)This is a subtle but very significant aspect of this code. If this isn't clear to you, ponder how the value "w" makes it from the pattern in the lambda that matches for it to the Greymap call. Read what Ulas has to say -- and try his experiment of wrapping one of these lambdas in parenthesis. The business of how Haskell operators bind and precedence can be a confusing and tricky subject, as it is here.
Here each lambda runs all the way to the end of the parseP5 function, and it must for the variables like w, h, and so on to be available to the Greymap call.
I have another nit to pick as well. In a well written parser, the skipSpace routine ought to return Nothing if it fails to find whitespace. This would be a fairly trivial code change.
The following is a yet more simple parser that plucks apart a line with two numbers. It fixes the skipSpace routine so it ensures that there is whitespace. It still uses the same brittle lambda chaining as does example 2 in the book. It uses the standard Monad bind operator >>=. My attempts to rewrite this using do syntax have failed thus far.
I would even go so far as to say that Monads are the very heart of Haskell, for anyone who wants to go beyond the basic level. You can't even write "hello world" without tripping over the IO monad. Monads are quite unique to the Haskell language. Getting comfortable with Monads separates the men from the boys; the neophytes from the veterans; the dogs from the cats; the bulls from the frogs.
Chapter 10 concludes with a third version of the parser, the heart of which is shown on page 252. I have to agree with (and laugh about) what Ulas says about this code in his commentary on chapter 10:
To be perfectly honest, this is some of the ugliest and most incomprehensible code I have ever seen, and I don't understand how it could be an improvement over anything.My approach was to type in code, beginning with the parseRawPGM function on page 252 and then keep hunting down missing pieces and adding them, studying them as I went. I used code Ulas wrote as the inspiration for my main and wrapper and kept fixing errors and typos until I had something that worked. There are a huge number of new and poorly explained concepts here and they are worth the time to study.
newtype Parse a = Parse { runParse :: ParseState -> Either String (a, ParseState) }
Tom's Computer Info / tom@mmto.org