Selectively retrieving data from an external site using the PHP DOM web crawler

I have a PHP crawler that works great: it retrieves the specified tag, along with its link, from an (external) forum site and shows it on my page.

But lately I have run into a problem.

This is the HTML for the forum data:

<tbody>
<tr>
    <td width="1%" height="25">&nbsp;</td>
    <td width="64%" height="25" class="FootNotes2"><a href="/files/forum/2017/1/837880.php" target="_top" class="Links2">Hispanic Study Partner</a> - dreamer1984</td>
    <td width="1%" height="25">&nbsp;</td>
    <td width="14%" height="25" class="FootNotes2" align="center">02/28/17 01:42</td>
    <td width="1%" height="25">&nbsp;</td>
    <td width="8%" height="25" align="Center" class="FootNotes2">0</td>
    <td width="1%" height="25">&nbsp;</td>
    <td width="9%" height="25" align="Center" class="FootNotes2">200</td>
</tr>
<tr>
    <td width="1%" height="25">&nbsp;</td>
    <td width="64%" height="25" class="FootNotes2"><a href="/files/forum/2017/1/837879.php" target="_top" class="Links2">nbme</a> - monariyadh</td>
    <td width="1%" height="25">&nbsp;</td>
    <td width="14%" height="25" class="FootNotes2" align="center">02/27/17 23:12</td>
    <td width="1%" height="25">&nbsp;</td>
    <td width="8%" height="25" align="Center" class="FootNotes2">0</td>
    <td width="1%" height="25">&nbsp;</td>
    <td width="9%" height="25" align="Center" class="FootNotes2">108</td>
</tr>
</tbody>

Now suppose the table data above is the only content on the site, and I try to extract it with a web crawler, for example:

<?php
require_once('dom/simple_html_dom.php');
$html = file_get_html('http://www.sitename.com/');
foreach ($html->find('td.FootNotes2') as $element) {
    echo $element;
}
?>

It retrieves the contents of every td with the class name "FootNotes2".
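For comparison, a minimal sketch of the same lookup using PHP's built-in DOM extension (`DOMDocument` plus `DOMXPath`) in place of simple_html_dom, so it runs without the third-party library. It assumes an inline, abridged copy of the sample rows above (width/height attributes dropped, wrapped in a `<table>`) instead of a live fetch from http://www.sitename.com/; the variable names are illustrative.

```php
<?php
// Sketch: find every td.FootNotes2 with DOMXPath instead of simple_html_dom.
// Assumption: the markup is an abridged inline copy of the question's sample rows.
$html = <<<'HTML'
<table><tbody>
<tr>
  <td>&nbsp;</td>
  <td class="FootNotes2"><a href="/files/forum/2017/1/837880.php" target="_top" class="Links2">Hispanic Study Partner</a> - dreamer1984</td>
  <td>&nbsp;</td>
  <td class="FootNotes2" align="center">02/28/17 01:42</td>
  <td>&nbsp;</td>
  <td align="Center" class="FootNotes2">0</td>
  <td>&nbsp;</td>
  <td align="Center" class="FootNotes2">200</td>
</tr>
<tr>
  <td>&nbsp;</td>
  <td class="FootNotes2"><a href="/files/forum/2017/1/837879.php" target="_top" class="Links2">nbme</a> - monariyadh</td>
  <td>&nbsp;</td>
  <td class="FootNotes2" align="center">02/27/17 23:12</td>
  <td>&nbsp;</td>
  <td align="Center" class="FootNotes2">0</td>
  <td>&nbsp;</td>
  <td align="Center" class="FootNotes2">108</td>
</tr>
</tbody></table>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// XPath equivalent of the CSS selector td.FootNotes2 (whole-word class match)
$cells = $xpath->query("//td[contains(concat(' ', normalize-space(@class), ' '), ' FootNotes2 ')]");

$texts = [];
foreach ($cells as $cell) {
    $texts[] = trim($cell->textContent); // cell text, including the link text
}
echo implode("\n", $texts), "\n";
```

Unlike `echo $element` in the question, this prints each cell's text rather than its markup; `$doc->saveHTML($cell)` would return the markup instead.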

Now, what if I want to extract only certain data from a cell, for example just the names "dreamer1984" and "monariyadh" from the first FootNotes2 cell of each row (and, ideally, other columns such as the date)? How can I do that?

A regular expression can capture both the name and the date:

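// $subject must hold the page's HTML
$subject = file_get_contents('http://www.sitename.com/');
preg_match_all('/<td.+?FootNotes2.+?<a.+?<\/a> - (?P<name>.*?)<\/td>.+?<td.+?FootNotes2.+?(?P<date>\d{2}\/\d{2}\/\d{2} \d{2}:\d{2})/siu', $subject, $matches);

// Each match yields a name and the date from the same row
foreach ($matches['name'] as $k => $v) {
    var_dump('name: ' . $v, 'relative date: ' . $matches['date'][$k]);
}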
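To see what the pattern captures without fetching a page, a minimal sketch that runs the same regex over an inline, abridged copy of the sample rows (an assumption standing in for the page HTML) and collects name/date pairs instead of `var_dump`-ing them:

```php
<?php
// Sketch: apply the answer's regex to an abridged inline copy of the sample rows.
$subject = <<<'HTML'
<table><tbody>
<tr>
  <td>&nbsp;</td>
  <td class="FootNotes2"><a href="/files/forum/2017/1/837880.php" target="_top" class="Links2">Hispanic Study Partner</a> - dreamer1984</td>
  <td>&nbsp;</td>
  <td class="FootNotes2" align="center">02/28/17 01:42</td>
  <td>&nbsp;</td>
  <td align="Center" class="FootNotes2">0</td>
  <td>&nbsp;</td>
  <td align="Center" class="FootNotes2">200</td>
</tr>
<tr>
  <td>&nbsp;</td>
  <td class="FootNotes2"><a href="/files/forum/2017/1/837879.php" target="_top" class="Links2">nbme</a> - monariyadh</td>
  <td>&nbsp;</td>
  <td class="FootNotes2" align="center">02/27/17 23:12</td>
  <td>&nbsp;</td>
  <td align="Center" class="FootNotes2">0</td>
  <td>&nbsp;</td>
  <td align="Center" class="FootNotes2">108</td>
</tr>
</tbody></table>
HTML;

$pattern = '/<td.+?FootNotes2.+?<a.+?<\/a> - (?P<name>.*?)<\/td>.+?<td.+?FootNotes2.+?(?P<date>\d{2}\/\d{2}\/\d{2} \d{2}:\d{2})/siu';
preg_match_all($pattern, $subject, $matches);

// Pair each captured name with the date captured in the same match
$rows = [];
foreach ($matches['name'] as $k => $name) {
    $rows[] = ['name' => $name, 'date' => $matches['date'][$k]];
}

foreach ($rows as $row) {
    echo $row['name'], ': ', $row['date'], "\n";
}
```

The pattern depends on the exact layout (link, then " - ", then the name, with the date in a later FootNotes2 cell), so a markup change on the forum can silently break it.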

That said, for HTML a DOM parser is generally more robust than a regular expression.


Loop over the rows, take the first td.FootNotes2 in each row, and print only its text node:

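require_once('dom/simple_html_dom.php');
$html = file_get_html('http://www.sitename.com/');
foreach ($html->find('tr') as $row) {
    // First td.FootNotes2 cell in the row; null if the row has none
    $element = $row->find('td.FootNotes2', 0);
    if ($element === null) { continue; }

    // Keep only text nodes: nodetype 3 is simple_html_dom's HDOM_TYPE_TEXT,
    // the same value the browser DOM (and jQuery) uses for text nodes
    $textNode = array_filter($element->nodes, function ($n) {
        return $n->nodetype == 3;
    });

    if (!empty($textNode)) {
        $text = current($textNode);
        echo $text . PHP_EOL; // e.g. " - dreamer1984"
    }
}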

Output:

- dreamer1984
- monariyadh

Rows (tr) without a td.FootNotes2 cell are skipped; in the others, only the first such td is examined.
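The same text-node idea can be sketched with PHP's built-in DOM extension (swapped in for simple_html_dom so the example is self-contained), where `XML_TEXT_NODE` is the predefined constant for node type 3. It assumes an abridged inline copy of the sample rows, and it additionally strips the leading " - " so only the name remains:

```php
<?php
// Sketch: first td.FootNotes2 per row, keep only its direct text nodes.
// Assumption: abridged inline copy of the question's sample rows.
$html = <<<'HTML'
<table><tbody>
<tr>
  <td>&nbsp;</td>
  <td class="FootNotes2"><a href="/files/forum/2017/1/837880.php" target="_top" class="Links2">Hispanic Study Partner</a> - dreamer1984</td>
  <td>&nbsp;</td>
  <td class="FootNotes2" align="center">02/28/17 01:42</td>
  <td>&nbsp;</td>
  <td align="Center" class="FootNotes2">0</td>
  <td>&nbsp;</td>
  <td align="Center" class="FootNotes2">200</td>
</tr>
<tr>
  <td>&nbsp;</td>
  <td class="FootNotes2"><a href="/files/forum/2017/1/837879.php" target="_top" class="Links2">nbme</a> - monariyadh</td>
  <td>&nbsp;</td>
  <td class="FootNotes2" align="center">02/27/17 23:12</td>
  <td>&nbsp;</td>
  <td align="Center" class="FootNotes2">0</td>
  <td>&nbsp;</td>
  <td align="Center" class="FootNotes2">108</td>
</tr>
</tbody></table>
HTML;

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$cellQuery = "td[contains(concat(' ', normalize-space(@class), ' '), ' FootNotes2 ')]";
$names = [];
foreach ($xpath->query('//tr') as $row) {
    // First td.FootNotes2 in this row, like $row->find('td.FootNotes2', 0)
    $cell = $xpath->query($cellQuery, $row)->item(0);
    if ($cell === null) {
        continue;
    }
    foreach ($cell->childNodes as $child) {
        // Text nodes only: skips the <a> element, keeps " - dreamer1984"
        if ($child->nodeType === XML_TEXT_NODE) {
            // Drop the leading " - " separator as well
            $name = trim(preg_replace('/^\s*-\s*/', '', $child->nodeValue));
            if ($name !== '') {
                $names[] = $name;
            }
        }
    }
}
echo implode("\n", $names), "\n";
```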


Another approach (removing the child elements):

foreach ($html->find('td.FootNotes2') as $element) {
    // Blank out every child element (here, the <a> link) so that innertext
    // only contains the remaining text, e.g. " - dreamer1984".
    // The nodes stay in the tree; only their rendered markup becomes empty.
    foreach ($element->children as $child) {
        $child->outertext = '';
    }
    echo $element->innertext . "<br>";
}

Output (every FootNotes2 cell is printed, not just the names):

- dreamer1984
02/28/17 01:42
0
200
- monariyadh
02/27/17 23:12
0
108
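A DOM-extension version of the child-removal approach, as a sketch under stated assumptions (built-in `DOMDocument`/`DOMXPath` instead of simple_html_dom, abridged inline sample rows). It restricts the query to FootNotes2 cells that contain an `<a>`, which avoids printing the date and count cells seen in the output above, and it actually removes the element children with `removeChild`:

```php
<?php
// Sketch: remove element children from name cells, print the remaining text.
// Assumption: abridged inline copy of the question's sample rows.
$html = <<<'HTML'
<table><tbody>
<tr>
  <td>&nbsp;</td>
  <td class="FootNotes2"><a href="/files/forum/2017/1/837880.php" target="_top" class="Links2">Hispanic Study Partner</a> - dreamer1984</td>
  <td>&nbsp;</td>
  <td class="FootNotes2" align="center">02/28/17 01:42</td>
  <td>&nbsp;</td>
  <td align="Center" class="FootNotes2">0</td>
  <td>&nbsp;</td>
  <td align="Center" class="FootNotes2">200</td>
</tr>
<tr>
  <td>&nbsp;</td>
  <td class="FootNotes2"><a href="/files/forum/2017/1/837879.php" target="_top" class="Links2">nbme</a> - monariyadh</td>
  <td>&nbsp;</td>
  <td class="FootNotes2" align="center">02/27/17 23:12</td>
  <td>&nbsp;</td>
  <td align="Center" class="FootNotes2">0</td>
  <td>&nbsp;</td>
  <td align="Center" class="FootNotes2">108</td>
</tr>
</tbody></table>
HTML;

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Only FootNotes2 cells that have an <a> child, i.e. the name cells
$cells = $xpath->query("//td[contains(concat(' ', normalize-space(@class), ' '), ' FootNotes2 ')][a]");

$results = [];
foreach ($cells as $cell) {
    // Collect first: removing nodes while iterating a live childNodes list skips some
    $elements = [];
    foreach ($cell->childNodes as $child) {
        if ($child instanceof DOMElement) {
            $elements[] = $child;
        }
    }
    foreach ($elements as $el) {
        $cell->removeChild($el);
    }
    $results[] = trim($cell->textContent); // e.g. "- dreamer1984"
}
echo implode("\n", $results), "\n";
```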

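Alternatively, pick the cells by position: take the text of the second td (the name cell), strip everything up to " - ", and then print the fourth td (the date):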

foreach($html->find('tr') as $tr) {
  echo preg_replace('/.* - /', '', $tr->find('td',1)->text()) . "\n";
  echo $tr->find('td',3)->text() . "\n";
}
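The positional approach translates to XPath as well; XPath positions are 1-based, so `td[2]` and `td[4]` correspond to simple_html_dom's `find('td', 1)` and `find('td', 3)`. A sketch with the built-in DOM extension and an abridged inline copy of the sample rows (both assumptions), which also skips rows that lack those cells:

```php
<?php
// Sketch: name = 2nd td (minus everything up to " - "), date = 4th td.
// Assumption: abridged inline copy of the question's sample rows.
$html = <<<'HTML'
<table><tbody>
<tr>
  <td>&nbsp;</td>
  <td class="FootNotes2"><a href="/files/forum/2017/1/837880.php" target="_top" class="Links2">Hispanic Study Partner</a> - dreamer1984</td>
  <td>&nbsp;</td>
  <td class="FootNotes2" align="center">02/28/17 01:42</td>
  <td>&nbsp;</td>
  <td align="Center" class="FootNotes2">0</td>
  <td>&nbsp;</td>
  <td align="Center" class="FootNotes2">200</td>
</tr>
<tr>
  <td>&nbsp;</td>
  <td class="FootNotes2"><a href="/files/forum/2017/1/837879.php" target="_top" class="Links2">nbme</a> - monariyadh</td>
  <td>&nbsp;</td>
  <td class="FootNotes2" align="center">02/27/17 23:12</td>
  <td>&nbsp;</td>
  <td align="Center" class="FootNotes2">0</td>
  <td>&nbsp;</td>
  <td align="Center" class="FootNotes2">108</td>
</tr>
</tbody></table>
HTML;

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$names = [];
$dates = [];
foreach ($xpath->query('//tr') as $tr) {
    $nameCell = $xpath->query('td[2]', $tr)->item(0);
    $dateCell = $xpath->query('td[4]', $tr)->item(0);
    if ($nameCell === null || $dateCell === null) {
        continue; // row does not have the expected layout
    }
    // Same idea as the answer: drop everything up to and including " - "
    $names[] = preg_replace('/.* - /', '', trim($nameCell->textContent));
    $dates[] = trim($dateCell->textContent);
}

foreach ($names as $i => $name) {
    echo $name, "\n", $dates[$i], "\n";
}
```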



Source: https://habr.com/ru/post/1671112/

