Algorithm Name - Subtree Alignment in AST

I have a set S of "small" trees S[i] , which I need to find for larger ones , which are used as templates for finding matching subtrees in a larger tree T I know S before starting to create T (which is a parsing tree), so I’m thinking about using the secant plane method to match nodes along the way (since the parser generates CST).

The trees in S are not the same AST as T - think about XPath and XML - S stores the XPaths tree view, while T is the actual AST of the source code - I need maps between i and the vector of matching T nodes.

However, I'm not sure about the names of the algorithms that I would use.

Basically, I know what I want to do, it feels like “divide et impera for trees” with a stack where I hold possible candidates to match, each time I change the LALR analyzer I duplicate the top of the stack and remove the candidates i from S[i] which will not match, and after cutting I exit the stack. In the beginning, all participants from S are potential candidates.

Please note: this is just about AST, ASG is another story ...

Adding

There will be a parsing tree T

T - the parse tree

The parsing function will know a list of what I call "treepaths", in canonical form, also represented as trees stored in S But they will not look like parsetree, they have their own language in which they will be presented, similar to XPath.

An example of a tree-like path for obtaining all functions that have a variable as the return value:

 function[body[return[expr[@type="variable"]]]]] 
  • So what should I look for in existing literature?
  • Any other tips?
  • Are there languages ​​that meta-annotated trees can query for? The original C library (not C ++) would be ideal.
+6
source share
2 answers

1) Your S trees like XPath correspond to some T trees. Why not build T trees in advance and then map them?

2) If you want to map a template to a structure, you can imagine compiling the template into some kind of finite state machine, which goes over when specifying parts of the tree with which it is mapped. If the state machine ever enters an acceptance state, you have found a match. If you have more than one template, each of them can be considered as a finite state machine, and you can run them "in parallel" (by modeling). To make this effective, calculate the cross product of all state machines; now there is only one, and only one transition occurs at each entrance. I call this idea "exemplary products" and you see something like a lot of effective helpers. Close to what you want to do is Rete Algorithm , which keeps track of which “patterns” live when the data fed to it changes.

+3
source

Maybe it's worth a look at JXPath: http://commons.apache.org/jxpath/ I'm not sure what language you are targeting, but it might be worth the shot.

In any case, my first impulse, if I had to try to implement something like this, would be to find a way to “serialize” both trees and reduce the problem to a simple string matching.

0
source

Source: https://habr.com/ru/post/890672/


All Articles