Are there any good PHP libraries that can convert HTML/PHP documents to objects?

I see a lot of PHP libraries that can parse HTML. A good example is QueryPath, which mimics the jQuery API.

However, I am looking for phtml parsing: a library that not only parses the DOM well but also parses the PHP processing instructions well, for example a PHP Document Object Model, or PDOM.

A document like this:

    <?php
    require 'NameFinder.php';
    $title = 'Wave Hello';
    $name = getName();
    ?><html>
    <head>
    <title><?php echo $title ?></title>
    </head>
    <body>
    <h1>Hello <?php echo $name ?></h1>
    <p>Blah Blah Blah</p>
    </body>

I would like to be able to use this PHP library to read things like:

  • the inner HTML of a DOM node found using XPath or a CSS selector

as well as have it offer things like:

  • a list of the PHP functions/methods called in the script
  • the values of PHP variables
  • the pages required by this page
  • a list of the PHP variables used before line 5
  • a list of the PHP variables used before the first body element

I could spend some time putting it together myself, borrowing code from things like phpDocumentor and Zend Framework Reflection, and using the built-in DOM API, introspection, string manipulation, etc.

But if there is some kind of *phtmlQuery library that can already do these things, that would be handy.

+4
3 answers

To get processing instructions (and other nodes) from your files, you can use the DOM and XPath:

    $dom = new DOMDocument;
    $dom->loadHTMLFile('/path/to/your/file/or/url');
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//processing-instruction()') as $pi) {
        echo $dom->saveHTML($pi), PHP_EOL;
    }

This will output:

    <?php require 'NameFinder.php'; $title = 'Wave Hello'; $name = getName(); ?>
    <?php echo $title ?>
    <?php echo $name ?>

This will work with broken HTML. You can find additional libraries in

Once you have the processing instructions, you can run them through PHP's own Tokenizer or try some of these:

They will not magically give you the information you are looking for out of the box, so you may have to write a few extra lines yourself.
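As a rough illustration of how XPath can answer one of the positional questions ("which PHP runs before the first body element"), here is a minimal sketch. It makes assumptions that are not in the snippet above: the markup is well-formed XML (a closing `</html>` has been added to the question's document), it is loaded with `loadXML()` rather than `loadHTMLFile()`, and the helper name `phpBeforeBody` is made up for this example.

```php
<?php
// Well-formed variant of the question's document (closing </html> added),
// so that loadXML() accepts it. Nowdoc keeps $title/$name literal.
$doc = <<<'XML'
<?php require 'NameFinder.php'; $title = 'Wave Hello'; $name = getName(); ?><html>
<head><title><?php echo $title ?></title></head>
<body><h1>Hello <?php echo $name ?></h1><p>Blah Blah Blah</p></body>
</html>
XML;

/**
 * Hypothetical helper: returns the code of every <?php ... ?> processing
 * instruction that appears before the first <body> element.
 */
function phpBeforeBody(string $xml): array
{
    $dom = new DOMDocument();
    if (!$dom->loadXML($xml)) {
        throw new RuntimeException('Markup is not well-formed XML');
    }

    $xpath = new DOMXPath($dom);
    $code = [];
    // The preceding:: axis selects nodes that come before <body> in document order.
    foreach ($xpath->query("//body[1]/preceding::processing-instruction('php')") as $pi) {
        $code[] = trim($pi->data);
    }
    return $code;
}

print_r(phpBeforeBody($doc));
```

Each returned string can then be handed to the tokenizer to find out which variables and functions it uses. For real-world, non-well-formed pages you would still need the HTML loader shown above, and the results may differ from this XML-based sketch.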

+3

There is a built-in XML parser in the PHP core, but it only works on valid XHTML pages, not on regular HTML or broken XHTML. You will need to set the parser up to handle processing instructions, and this can get quite complex.

http://www.php.net/manual/en/book.xml.php

http://www.php.net/manual/en/function.xml-set-processing-instruction-handler.php
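For a feel of what that setup looks like, here is a minimal sketch using `xml_set_processing_instruction_handler`. The input is a well-formed variant of the question's document (a closing `</html>` is added, which is an assumption), and the function name `collectPhpInstructions` is invented for this example.

```php
<?php
// Well-formed XHTML variant of the question's document (closing </html> added).
$xhtml = <<<'XML'
<?php require 'NameFinder.php'; $title = 'Wave Hello'; $name = getName(); ?><html>
<head><title><?php echo $title ?></title></head>
<body><h1>Hello <?php echo $name ?></h1><p>Blah Blah Blah</p></body>
</html>
XML;

/**
 * Hypothetical helper: collects the body of every <?php ... ?> processing
 * instruction that the event-based XML parser reports.
 */
function collectPhpInstructions(string $xml): array
{
    $found = [];
    $parser = xml_parser_create();

    // The handler is called once per processing instruction with its
    // target ("php") and its data (the code between the tags).
    xml_set_processing_instruction_handler(
        $parser,
        function ($parser, string $target, string $data) use (&$found): void {
            if (strtolower($target) === 'php') {
                $found[] = trim($data);
            }
        }
    );

    if (xml_parse($parser, $xml, true) !== 1) {
        $message = xml_error_string(xml_get_error_code($parser)) ?? 'XML parse error';
        throw new RuntimeException($message);
    }
    return $found;
}

print_r(collectPhpInstructions($xhtml));
```

Anything that is not well-formed (an unclosed tag, a bare `&`) makes `xml_parse` fail, which is the limitation described above.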

0

You can use PHP's token_get_all to tokenize the PHP, so that you can then walk through the result and inspect the function calls and PHP values.

For example:

    <?php
    // Nowdoc (quoted 'EOD') so that $title and $name are not interpolated.
    $src = <<<'EOD'
    <?php require 'NameFinder.php'; $title = 'Wave Hello'; $name = getName(); ?><html>
    <head> <title><?php echo $title ?></title> </head>
    <body> <h1>Hello <?php echo $name ?></h1> <p>Blah Blah Blah</p> </body>
    EOD;

    $tokens = token_get_all($src);
    var_dump($tokens);

You still need to write some code to go through all the tokens, see what they are, and then get the value based on the type of token (function name, literal string, variable assignment, etc.), but this does a LOT of the work for you as far as parsing the PHP goes.
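To give an idea of what that extra code might look like, here is a minimal sketch that walks the token stream and collects variables (with line numbers), called functions, and required files. The function names `summarizePhp` and `variablesBeforeLine` are made up for this example, and the call detection is only a heuristic: it treats any plain name followed by `(` as a call, so it would also pick up function declarations and would miss namespaced or dynamic calls.

```php
<?php
// Nowdoc keeps $title/$name literal; layout chosen so tokens span several lines.
$src = <<<'EOD'
<?php require 'NameFinder.php'; $title = 'Wave Hello'; $name = getName(); ?><html>
<head><title><?php echo $title ?></title></head>
<body><h1>Hello <?php echo $name ?></h1></body>
</html>
EOD;

/** Hypothetical helper: summarizes variables, calls and requires in a phtml source. */
function summarizePhp(string $src): array
{
    $summary = ['variables' => [], 'calls' => [], 'requires' => []];

    // Drop whitespace tokens so "next token" means the next meaningful one.
    $tokens = array_values(array_filter(token_get_all($src), function ($t) {
        return !is_array($t) || $t[0] !== T_WHITESPACE;
    }));
    $includes = [T_REQUIRE, T_REQUIRE_ONCE, T_INCLUDE, T_INCLUDE_ONCE];

    foreach ($tokens as $i => $token) {
        if (!is_array($token)) {
            continue; // single-character tokens such as ';' or '('
        }
        [$id, $text, $line] = $token;
        $next = $tokens[$i + 1] ?? null;

        if ($id === T_VARIABLE) {
            $summary['variables'][] = ['name' => $text, 'line' => $line];
        } elseif ($id === T_STRING && $next === '(') {
            $summary['calls'][] = $text; // heuristic: name followed by "("
        } elseif (in_array($id, $includes, true)
            && is_array($next) && $next[0] === T_CONSTANT_ENCAPSED_STRING) {
            $summary['requires'][] = trim($next[1], '\'"'); // literal paths only
        }
    }
    return $summary;
}

/** Hypothetical helper: unique variable names that appear before a given line. */
function variablesBeforeLine(array $summary, int $line): array
{
    $names = [];
    foreach ($summary['variables'] as $var) {
        if ($var['line'] < $line) {
            $names[$var['name']] = true;
        }
    }
    return array_keys($names);
}

$summary = summarizePhp($src);
print_r($summary);
print_r(variablesBeforeLine($summary, 3));
```

Note that this only tells you which variables appear and where. Getting actual variable values would mean evaluating the code, which the tokenizer does not do.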

0

Source: https://habr.com/ru/post/1394902/

