Parsing arguments without separating spaces

In smali, the signature of a method that takes two integers and returns a single integer is written like this:

add(II)I 

To parse this with xtext, I tried the following:

 ID'('Type*')'Type 

Unfortunately, this only works when there is a space between the two I characters.

How can I change the rule so that it does not insist on a space here?


As far as I can see, the problem already arises when the lexer processes the terminal rules. Whenever it sees a character sequence such as III , it immediately marks it as an identifier, regardless of the context. :(

To parse something like:

 III(III)I 

i.e. a function called III that takes three integers and returns another integer, it seems that I would need to make the lexer always emit single characters and then reassemble them in parser rules.

But in this case, I can no longer create the ID rule ...

I think I missed something important.


NB: In addition to primitive data types such as I (integer), D (double) and V (void), there are also class types written as Ljava/lang/String; and arrays starting with [ .

A typical main method looks like .method public static main([Ljava/lang/String;)V

2 answers

You can try configuring the mwe2 workflow that was generated for your language to use the AntlrGeneratorFragment with advanced options, where you can enable backtracking in the lexer. This should do the trick. You must do the same for the content assist parser fragment, where you will find ContentAssistParserGeneratorFragment .

Some background: a lexer usually consumes the longest matching sequence. For example, III looks like an identifier, so it is consumed as one identifier token and not as three separate I tokens. With backtracking enabled, the lexer can break the sequence up instead of consuming the complete identifier. This can cause some difficulties if III is not always a list of types but sometimes a real identifier, but you can work around them by using a data type rule for valid identifiers.
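To make the longest-match behaviour concrete, here is a minimal sketch in plain Java. It is not Xtext or ANTLR code; the class name LongestMatchDemo and the tiny terminal set (one ID rule plus the two parentheses) are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a generated lexer: one ID terminal plus '(' and ')'.
class LongestMatchDemo {
    // The greedy quantifier makes the ID alternative swallow as many characters as it can.
    private static final Pattern TOKEN =
            Pattern.compile("[A-Za-z_][A-Za-z_0-9]*|[()]");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            // Only accept a token that starts exactly at the current position.
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException(
                        "Unexpected character at offset " + pos);
            }
            tokens.add(m.group());
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("add(II)I"));   // prints [add, (, II, ), I]
        System.out.println(tokenize("III(III)I"));  // prints [III, (, III, ), I]
    }
}
```

The parameter list arrives at the parser as a single II or III token, so a parser rule such as Type* never gets to see the individual I characters unless they are separated by whitespace.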


You can try this with backtracking, but I usually avoid this technique. It can lead to very confusing error messages and to a very slow parser.

Try the following approach:

  • Parse the parameter string ("III") as an identifier
  • Add a validator that restricts it to "I" only, along with a good error message (see AbstractInjectableValidator ; Xtext will have generated a validator for your language, probably called "SmaliJavaValidator").
  • Extend the EObject that represents the type string so that it splits the string into the individual type descriptors (for example, a single "I")
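The splitting step from the list above could look roughly like this. It is a sketch in plain Java without any Xtext dependency; the class name TypeDescriptors and the method split are hypothetical. Beyond the I, D and V mentioned in the question, it assumes the other standard descriptor letters (Z, B, S, C, J, F), and it also handles class types and array prefixes, which a plain identifier rule would not lex.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a type string such as "II[Ljava/lang/String;"
// into its individual type descriptors.
class TypeDescriptors {
    // Assumption: the standard one-letter primitive descriptors plus V (void).
    private static final String PRIMITIVES = "ZBSCIJFDV";

    static List<String> split(String descriptor) {
        List<String> types = new ArrayList<>();
        int pos = 0;
        while (pos < descriptor.length()) {
            int start = pos;
            // Any number of '[' prefixes marks an array type.
            while (pos < descriptor.length() && descriptor.charAt(pos) == '[') {
                pos++;
            }
            if (pos == descriptor.length()) {
                throw new IllegalArgumentException(
                        "Array marker without element type at offset " + start);
            }
            char c = descriptor.charAt(pos);
            if (c == 'L') {
                // Class types run up to and including the closing ';'.
                int end = descriptor.indexOf(';', pos);
                if (end < 0) {
                    throw new IllegalArgumentException(
                            "Class type at offset " + pos + " is missing its ';'");
                }
                pos = end + 1;
            } else if (PRIMITIVES.indexOf(c) >= 0) {
                pos++;
            } else {
                throw new IllegalArgumentException(
                        "Unknown type character '" + c + "' at offset " + pos);
            }
            types.add(descriptor.substring(start, pos));
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println(split("III"));                    // prints [I, I, I]
        System.out.println(split("I[[DLjava/lang/String;")); // prints [I, [[D, Ljava/lang/String;]
    }
}
```

A check in the generated validator could call split and report the exception message as the error, so the user sees which character is wrong and where, instead of a generic parser error.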

With this approach, you parse the type string after Xtext has finished with its grammar. You get a useful result, with a fast grammar and a good error message.

General recommendation: I usually make my grammar fairly permissive and restrict the result afterwards with validators. That way the grammar stays fast, and the user gets good, tailored error messages.


Source: https://habr.com/ru/post/1497710/

