Parsing arguments without separating spaces

Question

In smali, the signature of a method that takes two integers and returns a single integer is written like this:

add(II)I

To parse this with xtext, I tried the following:

 ID'('Type*')'Type

Unfortunately, this only works when there is a space between the two I characters.

How can I change the rule so that it does not insist on a space here?

As far as I can see, the problem already arises when the lexer processes the terminal rules. Whenever it sees a sequence of characters such as III, it immediately marks it as an identifier, regardless of context. :(

To parse something like:

 III(III)I

i.e. a function called III that takes three integers and returns an integer, it seems I would need to make the lexer always emit single characters and reassemble them in parser rules.

But in this case, I can no longer create the ID rule ...

I think I am missing something important.

NB: In addition to primitive data types such as I (integer), D (double) and V (void), there are also class types written as Ljava/lang/String; and arrays starting with [ .

A typical main method looks like this:

 .method public static main([Ljava/lang/String;)V

eclipse parsing xtext

michas Aug 18 '13 at 23:37

2 answers

Sebastian Zarnekow · Answer 1 · 2013-08-24T14:44:32+0000

You can try configuring the mwe2 workflow that generates your language to use the advanced AntlrGeneratorFragment, which lets you enable backtracking in the lexer. This should do the trick. You must do the same for the content assist parser fragment, where you will find ContentAssistParserGeneratorFragment.

Some background: a lexer generally consumes the longest matching sequence. For example, III looks like an identifier, so it is consumed as one identifier token rather than three separate I tokens. With backtracking enabled, the lexer can split it up instead of consuming the full identifier. This can cause difficulties if III is not always a list of types but sometimes a real identifier; you can work around that by using a data type rule for valid identifiers.
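To make the longest-match behavior concrete, here is a minimal sketch in plain Java. It is a toy regex-based tokenizer, not the ANTLR lexer that Xtext generates; the class name LongestMatchDemo and the token pattern are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy longest-match tokenizer (illustrative; not Xtext's generated lexer). */
final class LongestMatchDemo {
    // An ID is a letter or underscore followed by letters, digits or underscores.
    // The greedy quantifier takes as many characters as possible, which mimics
    // the longest-match rule of a typical lexer.
    private static final Pattern TOKEN =
            Pattern.compile("[A-Za-z_][A-Za-z_0-9]*|[()]");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            // Whitespace only separates tokens; it never becomes a token itself.
            if (Character.isWhitespace(input.charAt(pos))) {
                pos++;
                continue;
            }
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Unexpected character at index " + pos);
            }
            tokens.add(m.group());
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("add(II)I"));  // [add, (, II, ), I]
        System.out.println(tokenize("add(I I)I")); // [add, (, I, I, ), I]
    }
}
```

Without the space, II reaches the parser as a single identifier token, so a rule that expects two separate Type tokens cannot match.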

stefan.schwetschke · Answer 2 · 2013-08-27T07:55:40+0000

You can try this with backtracking, but I usually avoid that technique. It can lead to very confusing error messages and a very slow parser.

Try the following approach:

1. Parse the parameter string ("III") as an identifier.
2. Add a validator that restricts it to "I" characters and reports a good error message (see AbstractInjectableValidator; Xtext will have generated a validator for your language, probably called "SmaliJavaValidator").
3. Extend the EObject that represents the type string so that it splits the string into individual type descriptors (for example, a single "I").

With this approach, you parse the type string after Xtext has finished applying its grammar. You get a useful result, with a fast grammar and good error messages.
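The splitting step could be sketched roughly as follows. This is a standalone helper under stated assumptions, not code from Xtext or from the generated language: the class name SmaliTypeSplitter and the method split are hypothetical, and the list of primitive letters comes from the Dalvik descriptor format, not from the question. The sketch also accepts class types and array prefixes for illustration, although a string containing [ , / or ; would not lex as a plain ID and would need a different terminal in the grammar.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a smali parameter string into type descriptors. */
final class SmaliTypeSplitter {
    // Primitive descriptors accepted as parameter types in this sketch
    // (assumption: V for void is excluded because it is only a return type).
    private static final String PRIMITIVES = "ZBSCIJFD";

    static List<String> split(String descriptors) {
        List<String> types = new ArrayList<>();
        int pos = 0;
        while (pos < descriptors.length()) {
            int start = pos;
            // Consume any number of array prefixes.
            while (pos < descriptors.length() && descriptors.charAt(pos) == '[') {
                pos++;
            }
            if (pos >= descriptors.length()) {
                throw new IllegalArgumentException(
                        "Array prefix without element type at index " + start);
            }
            char c = descriptors.charAt(pos);
            if (c == 'L') {
                // Class type: everything up to and including the next ';'.
                int end = descriptors.indexOf(';', pos);
                if (end < 0) {
                    throw new IllegalArgumentException(
                            "Class type at index " + pos + " is missing ';'");
                }
                pos = end + 1;
            } else if (PRIMITIVES.indexOf(c) >= 0) {
                pos++;
            } else {
                throw new IllegalArgumentException(
                        "Unknown type character '" + c + "' at index " + pos);
            }
            types.add(descriptors.substring(start, pos));
        }
        return types;
    }
}
```

For example, split("III") yields three "I" entries, while split("IX") throws an exception whose message names the offending character and its index. A check method in the generated validator could presumably call such a helper and turn the exception message into an error marker, which is one way to get the tailored message described above.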

General recommendation: I usually make my grammar very permissive and restrict the result later with validators. That way the grammar stays fast, and the user gets good, tailored error messages.
