What is the best way to parse HTML in PHP?

Here is some simple HTML:

    <table>
      <tr><th>Name</th><th>Price</th><th>Country</th></tr>
      <tr><td><a href="bbb/111">Apple</a></td><td>500</td><td>America</td></tr>
      <tr><td><a href="bbb/222">Samsung</a></td><td>400</td><td>Korea</td></tr>
      <tr><td><a href="bbb/333">Nokia</a></td><td>300</td><td>Finland</td></tr>
      <tr><td><a href="bbb/444">HTC</a></td><td>200</td><td>Taiwan</td></tr>
      <tr><td><a href="bbb/555">Blackberry</a></td><td>100</td><td>America</td></tr>
    </table>

I want to extract each company's name and its price, like this:

 Apple 500 / Samsung 400 / Nokia 300 / HTC 200 / Blackberry 100 

So I am using PHP's DOM parser. I know there are many third-party PHP parsing libraries, but people say it is better to use PHP's built-in parser. My code looks like this:

    $source_n = file_get_contents($html);
    $dom = new DOMDocument();
    @$dom->loadHTML($source_n);
    $stacks = $dom->getElementsByTagName('table')->item(0)->textContent;
    echo $stacks;

This prints all the text values run together, like this:

 Name Price Country Apple 500 America Samsung 400 Korea ...... 

I don't think this output is useful: with the code above I would have to split it with the explode() function, and the code would get even messier than it is now.

How can I extract these values more elegantly? Is there a simple way?

+6
3 answers

Use DOMXPath::query. First, collect all the names:

    $selector = new DOMXPath($dom);
    $results = $selector->query('//td/a');
    foreach ($results as $node) {
        echo $node->nodeValue . PHP_EOL;
    }

Then get the prices by changing the query to select the second cell of each row:

 $results = $selector->query('//td[2]'); 
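The two queries above return names and prices as separate lists. One way to pair them up is to query the rows first and then evaluate relative XPath expressions against each row. The following is only a sketch under some assumptions: the markup is the table from the question shortened to two rows, `$pairs` and `$line` are illustrative names, and `libxml_use_internal_errors()` is used in place of the `@` operator to keep parser warnings quiet.

```php
<?php
// Sample markup from the question, shortened to two rows for illustration.
$html = '<table>
<tr><th>Name</th><th>Price</th><th>Country</th></tr>
<tr><td><a href="bbb/111">Apple</a></td><td>500</td><td>America</td></tr>
<tr><td><a href="bbb/222">Samsung</a></td><td>400</td><td>Korea</td></tr>
</table>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$selector = new DOMXPath($dom);
$pairs = [];

// Select only rows that contain <td> cells; the header row has <th> cells only.
foreach ($selector->query('//table//tr[td]') as $row) {
    // Relative expressions are evaluated with the current row as context node.
    $name  = $selector->evaluate('string(td[1]/a)', $row);
    $price = $selector->evaluate('string(td[2])', $row);
    $pairs[] = trim($name) . ' ' . trim($price);
}

// Join the "Name Price" pairs in the format the question asks for.
$line = implode(' / ', $pairs);
echo $line . PHP_EOL;
```

For the two sample rows this should print `Apple 500 / Samsung 400`. Keeping the pairing inside one loop avoids having to line up two separate result lists by index.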


+3

The best solution I have found for parsing HTML is the Symfony DomCrawler component. Together with the CssSelector component, it lets you filter the HTML with CSS selectors, much as you would select elements in JavaScript. For example, to select the p elements that are direct children of body:

 $crawler = $crawler->filter('body > p'); 
+1

If you do not want to use DOMXPath::query, you can walk the table rows with getElementsByTagName instead:

    <?php
    $html = '<table>
    <tr><th>Name</th><th>Price</th><th>Country</th></tr>
    <tr><td><a href="bbb/111">Apple</a></td><td>500</td><td>America</td></tr>
    <tr><td><a href="bbb/222">Samsung</a></td><td>400</td><td>Korea</td></tr>
    <tr><td><a href="bbb/333">Nokia</a></td><td>300</td><td>Finland</td></tr>
    <tr><td><a href="bbb/444">HTC</a></td><td>200</td><td>Taiwan</td></tr>
    <tr><td><a href="bbb/555">Blackberry</a></td><td>100</td><td>America</td></tr>
    </table>';

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Get tables
    $tables = $dom->getElementsByTagName('table');

    // Get tr out of first table
    $tableRows = $tables->item(0)->getElementsByTagName('tr');

    // Iterate over table rows
    foreach ($tableRows as $tableRow) {
        // Get table data cells
        $tableData = $tableRow->getElementsByTagName('td');

        // Check to see if there is table data (the header row has none)
        if ($tableData->length > 0) {
            // Output first and second table data cells
            echo $tableData->item(0)->nodeValue . " " . $tableData->item(1)->nodeValue . "<br>";
        }
    }
    ?>
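The loop above assumes that the document contains a table and that every data row has at least two cells. A slightly more defensive variant could look like the sketch below. The function name `extractNamePricePairs` is hypothetical (it is not part of the answer or of PHP), and returning an empty array when nothing is found is just one possible design choice.

```php
<?php
/**
 * Hypothetical helper (not from the answer): returns "Name Price" strings
 * for every data row of the first table, or an empty array if none is found.
 */
function extractNamePricePairs(string $html): array
{
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the earlier setting

    if (!$loaded) {
        return []; // the markup could not be parsed
    }

    $table = $dom->getElementsByTagName('table')->item(0);
    if ($table === null) {
        return []; // no table in the document
    }

    $pairs = [];
    foreach ($table->getElementsByTagName('tr') as $row) {
        $cells = $row->getElementsByTagName('td');
        if ($cells->length < 2) {
            continue; // header row, or a row without a price cell
        }
        $pairs[] = trim($cells->item(0)->textContent) . ' '
                 . trim($cells->item(1)->textContent);
    }

    return $pairs;
}

// Usage with a one-row sample taken from the question's data.
$pairs = extractNamePricePairs(
    '<table><tr><th>Name</th><th>Price</th></tr>'
    . '<tr><td><a href="bbb/444">HTC</a></td><td>200</td></tr></table>'
);
echo implode(' / ', $pairs) . PHP_EOL;

// A document without a table yields an empty array rather than an error.
$none = extractNamePricePairs('<p>no table here</p>');
```

Returning the pairs as an array leaves the choice of separator (" / ", "<br>", a newline) to the caller.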
0

Source: https://habr.com/ru/post/989887/

