Implementation of Read typeclass where parsing lines include "$"

Question

I have been playing with Haskell for about a month. For my first "real" Haskell project, I am writing a part-of-speech tagger. As part of this project, I have a type called Tag, which represents a part-of-speech tag, implemented as follows:

 data Tag = CC | CD | DT | EX | FW | IN | JJ | JJR | JJS ...

The above is a long list of standardized part-of-speech tags, which I have intentionally truncated. However, in this standard tag set there are two that end in a dollar sign ($): PRP$ and NNP$. Because I can't have data constructors with $ in their name, I have chosen to rename them PRPS and NNPS.

This is all well and good, but I would like to read tags from strings in a lexicon and convert them to my Tag type. Trying this fails:

 instance Read Tag where
   readsPrec _ input =
     (\inp -> [((NNPS), rest) | ("NNP$", rest) <- lex inp]) input

The Haskell lexer chokes on the $. Any ideas on how to do this?

Implementing Show was fairly straightforward. It would be great if there were a similar strategy for Read.

 instance Show Tag where
   showsPrec _ NNPS = showString "NNP$"
   showsPrec _ PRPS = showString "PRP$"
   showsPrec _ tag = shows tag

+6

haskell linguistics

svoisen 15 sept. '11 at 23:26

2 answers

Don't use the Haskell lexer. The read functions are built on parser combinators (ReadP and ReadPrec in GHC, which work much like Parsec), and the book Real World Haskell has an excellent introduction to Parsec.

Here is some code that works:
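To see why the lexer gets in the way, here is a minimal sketch. The helper names `firstLexeme` and `matchesNNPDollar` are made up for illustration; only `lex` comes from the Prelude. It shows that `lex` splits the tag into an identifier and a separate symbol, so the pattern from the question can never match.

```haskell
-- Hypothetical helper names; only `lex` is a real Prelude function.

-- The first lexeme that `lex` finds, paired with the unconsumed input.
firstLexeme :: String -> [(String, String)]
firstLexeme = lex

-- The pattern used in the question: it succeeds only if the whole of
-- "NNP$" comes back from `lex` as a single lexeme.
matchesNNPDollar :: String -> Bool
matchesNNPDollar inp = not (null [rest | ("NNP$", rest) <- lex inp])
```

In GHCi, `firstLexeme "NNP$"` evaluates to `[("NNP","$")]`: the identifier stops at the `$`, which is lexed as an operator symbol. As a result `matchesNNPDollar "NNP$"` is `False`.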

 import Text.Read
 import Text.ParserCombinators.ReadP hiding (choice)
 import Text.ParserCombinators.ReadPrec hiding (choice)

 data Tag = CC | CD | DT | EX | FW | IN | JJ | JJR | JJS deriving (Show)

 strValMap = map (\(x, y) -> lift $ string x >> return y)

 instance Read Tag where
   readPrec = choice $ strValMap
     [ ("CC", CC)
     , ("CD", CD)
     , ("JJ$", JJS)
     ]

Just run it with:

 (read "JJ$") :: Tag

The code is pretty self-explanatory. The string x parser matches x, and if it succeeds (doesn't fail), y is returned. We use choice to select among all of these. It backtracks appropriately, so if you add a CCC constructor, then the CC alternative that partially matches "CCC" will fail later, and the parser will backtrack to CCC. Of course, if you don't need this, use the <|> combinator.
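The prefix case can be sketched as follows. This is an illustration under assumptions, not part of the answer's code: `CCC` is a made-up constructor added only because it shares a prefix with `CC`, and the tag set is cut down to three entries.

```haskell
import Text.Read (Read (..), choice, lift, readMaybe)
import Text.ParserCombinators.ReadP (string)

-- Hypothetical tag set: CCC is not a real tag; it exists here only
-- to share a prefix with CC.
data Tag = CC | CCC | JJS
  deriving (Eq, Show)

instance Read Tag where
  readPrec = choice
    [ lift (string "CC")  >> return CC   -- also matches the start of "CCC"
    , lift (string "CCC") >> return CCC
    , lift (string "JJ$") >> return JJS  -- "$" is just another character here
    ]
```

With this instance, `readMaybe "CCC" :: Maybe Tag` gives `Just CCC` even though the `CC` alternative is listed first: the `CC` parse leaves a trailing `"C"` unconsumed, so it is discarded when the whole input has to be read. `readMaybe "CC"` gives `Just CC`, and `readMaybe "JJ$"` gives `Just JJS`.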

+4

gatoatigrado 15 sept. '11 at 23:57

ivanm · Accepted Answer · 2011-09-15T23:43:44+0000

You are abusing Read here.

Show and Read are meant for printing and parsing valid Haskell values, to enable debugging and the like. This isn't always done perfectly (for example, if you import Data.Map qualified and then call show on a Map value, the call to fromList in the output isn't qualified), but it is a valid starting point.

If you want to print or parse your values according to some specific format, use a pretty-printing library for the former and an actual parsing library (for example, uu-parsinglib, polyparse, parsec, etc.) for the latter. They typically have much better parsing support than ReadS (although ReadP in GHC isn't too bad).

You may argue that this isn't necessary and that it's just a quick and dirty hack, but quick and dirty hacks have a tendency to linger. Do yourself a favor and do it right the first time: it means there is less to rewrite when you want to do it "properly" later on.
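One way to keep the external format apart from Show and Read can be sketched like this. It is only an illustration: it uses a plain lookup table from base rather than any of the libraries named above, the names `renderTag` and `parseTag` are hypothetical, and the tag set is truncated.

```haskell
-- A deliberately truncated tag set. PRPS and NNPS stand in for the
-- external spellings "PRP$" and "NNP$", as in the question.
data Tag = CC | CD | PRP | PRPS | NNP | NNPS
  deriving (Eq, Show, Read, Enum, Bounded)

-- Hypothetical name: external spelling of a tag. Only the two renamed
-- constructors differ from what the derived Show produces.
renderTag :: Tag -> String
renderTag PRPS = "PRP$"
renderTag NNPS = "NNP$"
renderTag tag  = show tag

-- Hypothetical name: inverse of renderTag, built by rendering every
-- constructor and looking the input up in the resulting table.
parseTag :: String -> Maybe Tag
parseTag s = lookup s [(renderTag t, t) | t <- [minBound .. maxBound]]
```

Here the derived Show and Read instances stay available for debugging, while `parseTag "PRP$"` yields `Just PRPS` and `parseTag "PRPS"` yields `Nothing`, since "PRPS" is a constructor name and not one of the external spellings.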
