anyBetween start end = start *> anyTill end
Your anyBetween parser eats its end character because anyTill does: it is designed to parse up to and including an end marker, on the assumption that you don't want to leave the closing brace in the input to be parsed again.
Note that your end parsers are all single-character parsers, so we can change the function to make use of this:
anyBetween'' start ends = start *> many (satisfy (not.flip elem ends))
but many is not as efficient as Attoparsec's takeWhile, which you should use as much as possible, so if you have done
import qualified Data.Attoparsec.Text as A
then
anyBetween' start ends = start *> A.takeWhile (not.flip elem ends)
should do the trick, and we can rewrite
styleWithoutQuotes = anyBetween' (stringCI "style=") [' ','>']
If you want it to eat the ' ' but not the '>', you can explicitly eat spaces afterwards:
styleWithoutQuotes = anyBetween' (stringCI "style=") [' ','>'] <* A.takeWhile isSpace
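To see what is consumed and what is left behind, here is a minimal self-contained sketch of the two variants above. The names anyBetweenSlow and withRest are hypothetical helpers added for illustration, and the type signatures are my assumption about what you intend; note that the many version yields a String while the takeWhile version yields Text.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative (many)
import Data.Attoparsec.Text (Parser, parseOnly, satisfy, stringCI, takeText)
import qualified Data.Attoparsec.Text as A
import Data.Char (isSpace)
import Data.Text (Text)

-- Character-at-a-time version: many collects the characters into a String.
-- (anyBetweenSlow is a hypothetical name for the many-based variant.)
anyBetweenSlow :: Parser a -> [Char] -> Parser String
anyBetweenSlow start ends = start *> many (satisfy (not . flip elem ends))

-- takeWhile version: returns the matched run as a single Text.
anyBetween' :: Parser a -> [Char] -> Parser Text
anyBetween' start ends = start *> A.takeWhile (not . flip elem ends)

-- Stops at ' ' or '>', then eats any trailing whitespace but leaves '>'.
-- (Recent attoparsec versions deprecate stringCI in favour of asciiCI.)
styleWithoutQuotes :: Parser Text
styleWithoutQuotes =
  anyBetween' (stringCI "style=") [' ', '>'] <* A.takeWhile isSpace

-- Hypothetical helper: run a parser and also return the unconsumed input.
withRest :: Parser a -> Text -> Either String (a, Text)
withRest p = parseOnly ((,) <$> p <*> takeText)
```

In GHCi, withRest styleWithoutQuotes "style=red  >x" gives Right ("red",">x"): the spaces are eaten and the '>' is still there for the next parser.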
Going for more takeWhile
Perhaps styleWithQuotes could do with a rewrite to use takeWhile as well, so let's make two helpers along the lines of anyBetween. They take everything from a starting parser up to an ending character, and come in inclusive and exclusive versions:
fromUptoExcl startP endChars = startP *> takeTill (flip elem endChars)
fromUptoIncl startP endChars = startP *> takeTill (flip elem endChars) <* anyChar
But I think, from what you said, you want styleWithoutQuotes to be a hybrid; it eats ' ' but not '>':
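-- skipWhile rather than satisfy, so the parser still succeeds when the style ends at '>' (or at end of input)
fromUptoEat startP endChars eatChars = startP *> takeTill (flip elem endChars) <* skipWhile (flip elem eatChars)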
(All of these assume a small number of characters in your end character lists, otherwise elem is inefficient. There are some Set variants if you are checking against a big list such as an alphabet.)
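As a rough sketch of the Set idea, assuming the containers package is available: fromUptoExclSet, letters and afterHash are hypothetical names I made up for illustration, not part of your code. Membership in a Data.Set is logarithmic in the size of the set, whereas elem walks the whole list.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.Text (Parser, char, parseOnly, takeTill)
import qualified Data.Set as Set
import Data.Text (Text)

-- Hypothetical variant of fromUptoExcl that takes a Set of end characters.
fromUptoExclSet :: Parser a -> Set.Set Char -> Parser Text
fromUptoExclSet startP endChars = startP *> takeTill (`Set.member` endChars)

-- A larger end-character set: the ASCII letters.
letters :: Set.Set Char
letters = Set.fromList (['a' .. 'z'] ++ ['A' .. 'Z'])

-- Example: after a '#', take everything up to the first letter.
afterHash :: Parser Text
afterHash = fromUptoExclSet (char '#') letters
```

For example, parseOnly afterHash "#123abc" gives Right "123".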
Now for the rewrite:
styleWithQuotes' = fromUptoIncl (stringCI "style=\"") "\""
styleWithoutQuotes' = fromUptoEat (stringCI "style=") " >" " "
Generic Parser
everythingButStyles uses <|> in a way that means that if it doesn't find "style", it will backtrack and then take everything. This is an example of the sort of thing that can be slow. The problem is that we fail late, at the end of the input string, which is a bad time to be deciding whether we should fail. Let's go all out and try to fail early instead.
Idea: take input until we reach an s, then skip the style if there is one.
notStyleNotEvenS = takeTill (flip elem "sS")
skipAnyStyle = (styleWithQuotes' <|> styleWithoutQuotes') *> notStyleNotEvenS
           <|> cons <$> anyChar <*> notStyleNotEvenS
The anyChar is usually an s or S, but there is no point in checking that again.
-- many yields a list of chunks, so join them with mconcat before appending
noStyles = append <$> notStyleNotEvenS <*> (mconcat <$> many skipAnyStyle)
parseNoStyles = parseOnly noStyles
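Putting the pieces together, here is a self-contained sketch you can load into GHCi. The type signatures and the qualified Data.Text import are my assumptions. It also differs from a literal transcription in two places: it joins the chunks from many with mconcat so the types line up, and it uses skipWhile for the trailing characters so that an unquoted style ending directly at '>' still matches.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative (many, (<|>))
import Data.Attoparsec.Text
  (Parser, anyChar, parseOnly, skipWhile, stringCI, takeTill)
import Data.Text (Text)
import qualified Data.Text as T

-- Take up to an end character and consume that end character too.
fromUptoIncl :: Parser a -> [Char] -> Parser Text
fromUptoIncl startP endChars = startP *> takeTill (`elem` endChars) <* anyChar

-- Take up to an end character, then eat any run of eatChars that follows.
fromUptoEat :: Parser a -> [Char] -> [Char] -> Parser Text
fromUptoEat startP endChars eatChars =
  startP *> takeTill (`elem` endChars) <* skipWhile (`elem` eatChars)

styleWithQuotes', styleWithoutQuotes' :: Parser Text
styleWithQuotes'    = fromUptoIncl (stringCI "style=\"") "\""
styleWithoutQuotes' = fromUptoEat (stringCI "style=") " >" " "

-- Everything up to (not including) the next 's' or 'S'.
notStyleNotEvenS :: Parser Text
notStyleNotEvenS = takeTill (`elem` ("sS" :: String))

-- Either drop a style attribute, or keep the single character we stopped at;
-- in both cases continue up to the next 's' or 'S'.
skipAnyStyle :: Parser Text
skipAnyStyle =
      (styleWithQuotes' <|> styleWithoutQuotes') *> notStyleNotEvenS
  <|> T.cons <$> anyChar <*> notStyleNotEvenS

noStyles :: Parser Text
noStyles = T.append <$> notStyleNotEvenS <*> (mconcat <$> many skipAnyStyle)

parseNoStyles :: Text -> Either String Text
parseNoStyles = parseOnly noStyles
```

With this sketch, parseNoStyles "<p style=bold>text</p>" gives Right "<p >text</p>", and an input with no style attribute, such as "sassy", comes back unchanged.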