Scala parser combinators: failure handling and dangling commas

I'm getting started with Scala parser combinators, and before moving on I need a better grip on failure/error handling (note: I'm still getting into Scala as well).

I want to parse strings such as "a = b, c = d" into a list of tuples, but flag it to the user when a dangling comma is found.

My idea was to fall back to a failure("Dangling comma") parser when matching property assignments separated by commas:

  def commaList[T](inner: Parser[T]): Parser[List[T]] =
    rep1sep(inner, ",") | rep1sep(inner, ",") ~> opt(",") ~> failure("Dangling comma")

  def propertyAssignment: Parser[(String, String)] =
    ident ~ "=" ~ ident ^^ { case id ~ "=" ~ prop => (id, prop) }

And call the parser with:

  p.parseAll(p.commaList(p.propertyAssignment), "name = John , ") 

which fails, not surprisingly, but with this message:

  string matching regex `\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*' expected but end of source found 

The commaList function successfully matches the first property assignment and starts repeating on the comma, but the next "ident" fails because the next character is the end of the input. I thought the second alternative in commaList would catch that:

  rep1sep(inner, ",") ~> opt(",") ~> failure("Dangling comma") 

Nope. Ideas?

2 answers

Scalaz to the rescue :-)

When you are dealing with warnings, it is not a good idea to exit the parser with a failure. You can easily combine the parser with the Scalaz Writer monad. With this monad you can attach messages to the partial result while the parser runs. These messages can be infos, warnings, or errors. After the parser finishes, you can validate the result: is it usable, or does it contain critical problems? With such a separate validation step you usually get much better error messages. For example, you could accept arbitrary characters at the end of the string but report an error when they are found (e.g. "Garbage found after the last statement"). That message can be far more helpful to the user than the cryptic default one you get in the example below ("string matching regex `\z' expected [...]").

Here is an example based on the code in your question:

  scala> :paste
  // Entering paste mode (ctrl-D to finish)

  import util.parsing.combinator.RegexParsers
  import scalaz._, Scalaz._

  object DemoParser extends RegexParsers {
    type Warning = String
    case class Equation(left : String, right : String)
    // A result list paired with the warnings collected while parsing
    type PWriter = Writer[Vector[Warning], List[Equation]]
    val emptyList : List[Equation] = Nil

    // Hand-written equivalent of rep1sep; not used by the parsers below
    def rep1sep2[T](p : => Parser[T], q : => Parser[Any]): Parser[List[T]] =
      p ~ rep(q ~> p) ^^ {case x~y => x::y}

    def name : Parser[String] = """\w+""".r

    def equation : Parser[Equation] =
      name ~ "=" ~ name ^^ { case n ~ _ ~ v => Equation(n,v) }

    // The equations themselves produce no warnings
    def commaList : Parser[PWriter] =
      rep1sep(equation, ",") ^^ (_.set(Vector()))

    // An optional trailing comma is accepted, but recorded as a warning
    def danglingComma : Parser[PWriter] = opt(",") ^^ (
      _.map(_ => emptyList.set(Vector("Warning: Dangling comma")))
       .getOrElse(emptyList.set(Vector.empty[Warning])))

    // Combine both results and both warning logs
    def danglingList : Parser[PWriter] = commaList ~ danglingComma ^^ {
      case l1 ~ l2 => (l1.over ++ l2.over).set(l1.written ++ l2.written)
    }

    def apply(input: String): PWriter = parseAll(danglingList, input) match {
      case Success(result, _) => result
      case failure : NoSuccess => emptyList.set(Vector(failure.msg))
    }
  }

  // Exiting paste mode, now interpreting.

  import util.parsing.combinator.RegexParsers
  import scalaz._
  import Scalaz._
  defined module DemoParser

  scala> DemoParser("a=1, b=2")
  res2: DemoParser.PWriter = (Vector(),List(Equation(a,1), Equation(b,2)))

  scala> DemoParser("a=1, b=2,")
  res3: DemoParser.PWriter = (Vector(Warning: Dangling comma),List(Equation(a,1), Equation(b,2)))

  scala> DemoParser("a=1, b=2, ")
  res4: DemoParser.PWriter = (Vector(Warning: Dangling comma),List(Equation(a,1), Equation(b,2)))

  scala> DemoParser("a=1, b=2, ;")
  res5: DemoParser.PWriter = (Vector(string matching regex `\z' expected but `;' found),List())

As you can see, it handles the error cases fine. If you want to extend the example, add case classes for the different kinds of messages and include the current parser position in each message.
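The extension suggested above could look roughly like the following sketch. It uses plain Scala only; the names Diagnostic, ParseInfo, ParseWarning, ParseError and Diagnostics are hypothetical and not part of the answer's code. It assumes the line and column would be taken from the parser's input position at the point where the message is recorded.

```scala
// Hypothetical message types (illustrative names, not from the answer).
// line/column are assumed to come from the parser's current input position.
sealed trait Diagnostic {
  def text: String
  def line: Int
  def column: Int
}
case class ParseInfo(text: String, line: Int, column: Int) extends Diagnostic
case class ParseWarning(text: String, line: Int, column: Int) extends Diagnostic
case class ParseError(text: String, line: Int, column: Int) extends Diagnostic

object Diagnostics {
  // Render a message as "[level] line:column: text"
  def render(d: Diagnostic): String = {
    val level = d match {
      case _: ParseInfo    => "info"
      case _: ParseWarning => "warning"
      case _: ParseError   => "error"
    }
    s"[$level] ${d.line}:${d.column}: ${d.text}"
  }

  // Separate validation step: the result is usable only if no ParseError
  // was recorded; infos and warnings do not block it.
  def validate[A](messages: Vector[Diagnostic], result: A): Either[Vector[Diagnostic], A] = {
    val errors = messages.collect { case e: ParseError => e }
    if (errors.isEmpty) Right(result) else Left(errors)
  }
}
```

With types like these, the Vector[Warning] log in the example would become a Vector[Diagnostic], and the caller decides after parsing whether the collected messages are fatal.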

Btw. the whitespace problem is handled by the RegexParsers class. If you want to change the whitespace handling, just override the whiteSpace field.
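As a minimal sketch of overriding whiteSpace, assuming the scala-parser-combinators library is on the classpath: the parser below (LineParser, word, line and lines are made-up names for illustration) skips only spaces and tabs, so that newlines can be used as separators in the grammar.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical parser: words separated by blanks, lines separated by "\n".
object LineParser extends RegexParsers {
  // Skip only spaces and tabs between tokens, so "\n" stays visible to the grammar.
  override protected val whiteSpace = """[ \t]+""".r

  def word: Parser[String] = """\w+""".r
  def line: Parser[List[String]] = rep1(word)
  def lines: Parser[List[List[String]]] = rep1sep(line, "\n")
}
```

With the default whiteSpace (`\s+`) the newline would be skipped like any other blank, and the "\n" separator would never match.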


Your parser does not expect the trailing whitespace at the end of "name = John , ".

You can use a regular expression to optionally parse a "," followed by any amount of whitespace:

 def commaList[T](inner: Parser[T]): Parser[List[T]] = rep1sep(inner, ",") <~ opt(",\\s*".r ~> failure("Dangling comma")) 

Note that this avoids the alternative ( | ) by making the failure part of the optional parser. If the optional part consumes some input and then fails, the whole parse fails.
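To try this out, the definition can be dropped into a small parser object. This is only a sketch: PropertyParser and parseProperties are made-up names, and extending JavaTokenParsers (for ident) is an assumption, since the question does not show how p is defined. Only success versus failure is checked here; which message is reported for the failing input is deliberately left open.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

// Hypothetical host object for the parsers from the question and this answer.
object PropertyParser extends JavaTokenParsers {
  // The optional tail consumes a trailing comma and then fails on purpose.
  def commaList[T](inner: Parser[T]): Parser[List[T]] =
    rep1sep(inner, ",") <~ opt(",\\s*".r ~> failure("Dangling comma"))

  def propertyAssignment: Parser[(String, String)] =
    ident ~ "=" ~ ident ^^ { case id ~ _ ~ prop => (id, prop) }

  // Convenience wrapper (illustrative): parse a whole input string.
  def parseProperties(input: String): ParseResult[List[(String, String)]] =
    parseAll(commaList(propertyAssignment), input)
}
```

A well-formed list such as "a = b, c = d" yields the expected tuples, while "name = John , " is rejected.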


Source: https://habr.com/ru/post/1497203/

