Parsing a first-order logic term with LPeg

As the title says, I'm trying to parse, for example,

term(A, b, c(d, "e", 7)) 

into a Lua table such as

 {term, {A, b, {c, {d, "e", 7}}}} 

This is the grammar I built:

    local pattern = re.compile[=[
      term      <- variable / function
      argument  <- variable / lowercase / number / string
      function  <- {|lowercase {|(open argument (separator (argument / function))* close)?|}|}
      variable  <- uppercase
      lowercase <- {[a-z][A-Za-z0-9]*}
      uppercase <- {[A-Z][A-Za-z0-9]*}
      string    <- '"' {~ [^"]* ~} '"'
      number    <- {[0-9]+}
      close     <- blank ")"
      open      <- "(" blank
      separator <- blank "," blank
      blank     <- " "*
    ]=]

I am having the following issues:

  • It cannot parse nested terms. For the example above it returns only {term, {}} (while it works with term(A, b, c)).
  • To strip the quotes from strings I used {~ ~}, but because of this I had to move all captures from argument and term down into the rules below them. Is there any way to avoid this?
  • I would like each element to carry a key that indicates its type, for example, instead of A, something like {value = "A", type = "variable"}. I found a way to do this with {:name: :}, but the order of the elements in the table is lost (it does not create a new table, it simply adds a key, in this case variable="A", and the order of such keys is not fixed). How can I tag elements while preserving their order?
2 answers

In your grammar, you have:

    argument <- variable / lowercase / number / string
    function <- {|lowercase {|(open argument (separator (argument / function))* close)?|}|}

Keep in mind that LPeg tries the alternatives in a rule in the order in which you list them. Once an alternative matches, LPeg will not consider the remaining alternatives in that rule, even if there might be a "better" match later.

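This is why nested function calls fail: in (argument / function), LPeg sees that c can match

 `argument <- variable / lowercase / number / string`

through its lowercase alternative. Since argument is listed before function, LPeg never tries the latter, so c is captured as a plain argument and parsing cannot continue past the ( that follows it. The optional argument list then fails as a whole and matches the empty string, which is why you get {term, {}}.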
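To see the ordered-choice behaviour in isolation, here is a minimal sketch using a toy grammar (the rule names item, call and name are made up for this illustration and are not part of the grammar in the question). It assumes LPeg and its re module are installed.

```lua
local re = require "re"  -- the re module ships with LPeg

-- 'name' is tried first and succeeds on "c", so 'call' is never attempted.
local name_first = re.compile[=[
  item <- name / call
  call <- {| name "(" name ")" |}
  name <- { [a-z]+ }
]=]

-- Listing the longer alternative first lets it win whenever it applies.
local call_first = re.compile[=[
  item <- call / name
  call <- {| name "(" name ")" |}
  name <- { [a-z]+ }
]=]

local a = name_first:match("c(d)")  -- the string "c"; "(d)" is left unparsed
local b = call_first:match("c(d)")  -- a table holding "c" and "d"
print(a, b[1], b[2])
```

Putting the more specific alternative first is the usual way to handle this in a PEG.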

As an experiment, I changed your grammar a bit and added table captures and named captures for most of the non-terminals you are interested in.

    local pattern = re.compile [=[
      term      <- {| {:type: '' -> "term" :} term_t |}
      term_t    <- func / var
      func      <- {| {:type: '' -> "func":} {:name: func_id:} "(" arg(separator arg)* ")" |}
      func_id   <- lower / upper
      arg       <- number / string / term_t
      var       <- {| {:type: '' -> "var" :} {:name: lower / upper:} |}
      string    <- '"' {~ [^"]* ~} '"'
      lower     <- {%l%w*}
      upper     <- {%u%w*}
      number    <- {%d+}
      separator <- blank "," blank
      blank     <- " "*
    ]=]

With a quick test of the pattern:

    local test = [[fun(A, b, c(d(42), "e", f, 7))]]
    dump( pattern:match(test) )  -- dump: a table pretty-printer, not shown here

Which gives the following output on my machine:

    {
      {
        { type = "var", name = "A" },
        { type = "var", name = "b" },
        {
          { "42", type = "func", name = "d" },
          "e",
          { type = "var", name = "f" },
          "7",
          type = "func", name = "c"
        },
        type = "func", name = "fun"
      },
      type = "term"
    }

If you look closely at the output above, you will notice that the function arguments appear in the array part of the table, in the order in which they were passed. The type and name fields, on the other hand, can appear in any order, since they live in the hash part of the table. If their order matters, you can wrap these "attributes" in another table and put that inner attribute table in the array part of the outer table.
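One way to picture the "attribute table" idea is as a post-processing step on the parsed tree rather than a change to the grammar. The sketch below is only an illustration: wrap_attrs is a hypothetical helper, and the sample node is hand-written in the shape shown in the output above.

```lua
-- Hypothetical helper: moves 'type' and 'name' from the hash part of a node
-- into an attribute table stored at index 1, followed by the children in order.
local function wrap_attrs(node)
  if type(node) ~= "table" then
    return node  -- plain captures such as "e" or "7" are kept as they are
  end
  local out = { { type = node.type, name = node.name } }
  for _, child in ipairs(node) do
    out[#out + 1] = wrap_attrs(child)
  end
  return out
end

-- Hand-written node mimicking the parse result for c(A, 7).
local node = { { type = "var", name = "A" }, "7", type = "func", name = "c" }
local wrapped = wrap_attrs(node)

print(wrapped[1].type, wrapped[1].name)  -- attributes now sit at index 1
print(wrapped[2][1].name, wrapped[3])    -- arguments follow in their original order
```

Everything is now in the array part, so iterating with ipairs visits the attributes first and then the arguments in a fixed order.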

Edit: Here is a revised grammar to make the analysis more uniform. I removed the term capture to help trim unnecessary branches.

    local pattern2 = re.compile [=[
      term      <- term_t
      term_t    <- func / var
      func      <- {| {:type: '' -> "func":} {:name: func_id:} "(" args? ")" |}
      func_id   <- lower / upper
      arg       <- number / string / term_t
      args      <- arg (separator args)?
      var       <- {| {:type: '' -> "var" :} {:name: lower / upper:} |}
      string    <- {| {:type: '' -> "string" :}'"' {:value: [^"]* :} '"' |}
      lower     <- {%l%w*}
      upper     <- {%u%w*}
      number    <- {| {:type: '' -> "number":} {:value: %d+:} |}
      separator <- blank "," blank
      blank     <- " "*
    ]=]

This gives the following:

    {
      { type = "var", name = "A" },
      { type = "var", name = "b" },
      {
        {
          { type = "number", value = "42" },
          type = "func", name = "d"
        },
        { type = "string", value = "e" },
        { type = "var", name = "f" },
        { type = "number", value = "7" },
        type = "func", name = "c"
      },
      type = "func", name = "fun"
    }
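Because every node in this shape carries a type field, code that walks the tree can dispatch on it without special cases. As a rough sketch (render is a hypothetical function, and the sample tree is hand-written to match part of the output above), turning a node back into term syntax might look like this:

```lua
-- Hypothetical walker: rebuilds the textual term from a uniformly tagged tree.
local function render(node)
  if node.type == "number" then
    return node.value
  elseif node.type == "string" then
    return '"' .. node.value .. '"'
  elseif node.type == "var" then
    return node.name
  elseif node.type == "func" then
    local args = {}
    for i, arg in ipairs(node) do  -- arguments live in the array part, in order
      args[i] = render(arg)
    end
    return node.name .. "(" .. table.concat(args, ", ") .. ")"
  end
  error("unknown node type: " .. tostring(node.type))
end

-- Hand-written subtree corresponding to c(d(42), "e", f, 7).
local tree = {
  { { type = "number", value = "42" }, type = "func", name = "d" },
  { type = "string", value = "e" },
  { type = "var", name = "f" },
  { type = "number", value = "7" },
  type = "func", name = "c",
}

local text = render(tree)
print(text)
```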

Sorry, I don't have any experience with LPeg, but plain Lua patterns are enough to solve your problem easily:

    local str = 'term(A, b, c(d, "e", 7))'

    -- Recursively rewrite name(...) as {name, {...}}; %b() matches balanced parentheses.
    local function convert(expr)
        return (expr:gsub('(%w+)(%b())',
            function (name, par_expr)
                return '{'..name..', {'..convert(par_expr:sub(2, -2))..'}}'
            end
        ))
    end

    print(convert(str))  -- {term, {A, b, {c, {d, "e", 7}}}}

Now just load() the converted string to create the table.
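A sketch of that loading step follows. One caveat: bare names such as term or A in the converted string would be looked up as global variables and evaluate to nil, so this example assumes a custom environment whose __index returns the name itself as a string. The load_term helper and that environment trick are illustrative additions, not part of the answer above.

```lua
local function convert(expr)
  return (expr:gsub('(%w+)(%b())', function(name, par_expr)
    return '{' .. name .. ', {' .. convert(par_expr:sub(2, -2)) .. '}}'
  end))
end

-- Hypothetical helper: evaluates the converted string as a table constructor.
-- Any unknown global (term, A, b, ...) resolves to its own name as a string.
local function load_term(src)
  local env = setmetatable({}, { __index = function(_, key) return key end })
  local chunk
  if setfenv then  -- Lua 5.1 / LuaJIT
    chunk = assert(loadstring("return " .. src))
    setfenv(chunk, env)
  else             -- Lua 5.2 and later
    chunk = assert(load("return " .. src, "term", "t", env))
  end
  return chunk()
end

local t = load_term(convert('term(A, b, c(d, "e", 7))'))
print(t[1], t[2][1], t[2][2], t[2][3][1])
```

With this approach string literals stay strings and 7 becomes a Lua number, while variables and function names come back as strings. Since load() executes code, this is only reasonable for trusted input.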


Source: https://habr.com/ru/post/1493709/