Find an element in HTML and explode it to get the result count

I want to get the following HTML element from a page:

<h2 id="resultCount" class="resultCount">

    <span>

        Showing 1 - 12 of 40,923 Results

    </span>

</h2>

I need to get the total number of results (40,923 here) in my PHP code.

Currently, I get everything between the h2 tags and explode it on spaces. Then I explode the count again on the comma and concatenate the pieces, so that the thousands separator is removed and the value can be used as a number. Finally, I compare that number against my limit.

define("MAX_RESULT_ALL_PAGES", 1200);
$queryUrl = AMAZON_TOTAL_BOOKS_COUNT.$searchMonthUrlParam.$searchYearUrlParam.$searchTypeUrlParam.urlencode($keyword)."&page=".$pageNum;
$htmlResultCountPage = file_get_html($queryUrl);
$htmlResultCount = $htmlResultCountPage->find("h2[id=resultCount]");
$resultCountArray = explode(" ", $htmlResultCount[0]);

$explodeCount = explode(',', $resultCountArray[5]);
$europeFormatCount = '';
foreach ($explodeCount as $val) {
    $europeFormatCount .= $val;
}
if ($europeFormatCount > MAX_RESULT_ALL_PAGES) {

    $queryUrl = AMAZON_SEARCH_URL.$searchMonthUrlParam.$searchYearUrlParam.$searchTypeUrlParam.urlencode($keyword)."&page=".$pageNum;

}

Currently, the total number of results is not extracted correctly, and the condition is never true even when it should be.
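A likely cause can be sketched on a hard-coded copy of the markup above. This assumes the node is converted to its outer HTML (tags and indentation included) when it is passed to explode(); the real page may use different whitespace. Splitting on a single space then puts index 5 somewhere in the indentation rather than on the count, while stripping the tags and splitting on runs of whitespace gives predictable tokens.

```php
<?php
// Hard-coded copy of the markup from the question (assumption: the real
// page may be formatted differently).
$outerHtml = <<<'HTML'
<h2 id="resultCount" class="resultCount">

    <span>

        Showing 1 - 12 of 40,923 Results

    </span>

</h2>
HTML;

// Splitting the raw markup on single spaces: the tag attributes and the
// indentation shift the tokens, so index 5 is not the total.
$rawTokens = explode(' ', $outerHtml);

// Strip the tags, trim, and split on runs of whitespace instead.
$text   = trim(strip_tags($outerHtml));
$tokens = preg_split('/\s+/', $text);

// Remove the thousands separator before treating the value as a number.
$total = (int) str_replace(',', '', $tokens[5]);

var_dump($rawTokens[5] === '40,923'); // does the naive split find the count?
var_dump($total);
```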

Does anyone have a solution to this problem, or another way to do it?

+4
4 answers

You do not need to parse the HTML for this; a regular expression on the raw page is enough:

define('MAX_RESULT_ALL_PAGES', 1200);

$queryUrl    = AMAZON_TOTAL_BOOKS_COUNT . $searchMonthUrlParam . $searchYearUrlParam . $searchTypeUrlParam . urlencode($keyword) . '&page=' . $pageNum;
$queryResult = file_get_contents($queryUrl);

if (preg_match('/of\s+([0-9,]+)\s+Results/', $queryResult, $matches)) {
    $totalResults = (int) str_replace(',', '', $matches[1]);
} else {
    throw new \RuntimeException('Total number of results not found');
}

if ($totalResults > MAX_RESULT_ALL_PAGES) {
    $queryUrl = AMAZON_SEARCH_URL . $searchMonthUrlParam . $searchYearUrlParam . $searchTypeUrlParam . urlencode($keyword) . '&page=' . $pageNum;
    // ...
}
+1

Try a regular expression instead:

...
preg_match("/of ([0-9,]+) Results/", $htmlResultCount[0], $matches);
$europeFormatCount = intval(str_replace(",", "", $matches[1]));
...
0

Please try this code.

define("MAX_RESULT_ALL_PAGES", 1200);  

// new dom object
$dom = new DOMDocument();

// HTML string
$queryUrl = AMAZON_TOTAL_BOOKS_COUNT.$searchMonthUrlParam.$searchYearUrlParam.$searchTypeUrlParam.urlencode($keyword)."&page=".$pageNum;
$html_string = file_get_contents($queryUrl);

// Discard whitespace-only text nodes (must be set before loading)
$dom->preserveWhiteSpace = false;

// Load the HTML; @ suppresses warnings about malformed real-world markup
@$dom->loadHTML($html_string);

//Get all h2 tags
$nodes = $dom->getElementsByTagName('h2');

// Store total result count
$totalCount = 0;

// Loop over all h2 tags and pick those with class "resultCount"
foreach ($nodes as $node) {
    if ($node->hasAttributes()) {
        foreach ($node->attributes as $attribute) {
            if ($attribute->name === 'class' && $attribute->value == 'resultCount') {
                $inner_html = str_replace(',', '', trim($node->nodeValue));
                $inner_html_array = explode(' ', $inner_html);

                // Add the sixth token (the total) to the running count
                $totalCount += (int) $inner_html_array[5];
            }
        }
    }
}

// If the result count is greater than 1200, do this
if ($totalCount > MAX_RESULT_ALL_PAGES) {
      $queryUrl = AMAZON_SEARCH_URL.$searchMonthUrlParam.$searchYearUrlParam.$searchTypeUrlParam.urlencode($keyword)."&page=".$pageNum;
}
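
The nested attribute loop can also be written as a single DOMXPath query. This is only a sketch on an inline sample string standing in for the downloaded page (an assumption, as is the variable name $sampleHtml); it selects the h2 by its id and takes the count from its text with a regular expression rather than by token position.

```php
<?php
// Inline sample standing in for the downloaded page (assumption).
$sampleHtml = '<html><body><h2 id="resultCount" class="resultCount">'
            . '<span> Showing 1 - 12 of 40,923 Results </span></h2></body></html>';

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($sampleHtml);
libxml_clear_errors();

// Select the h2 element by its id attribute.
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//h2[@id="resultCount"]');

// Take the number between "of" and "Results" and drop the commas.
$totalCount = 0;
if ($nodes->length > 0
    && preg_match('/of\s+([\d,]+)\s+Results/', $nodes->item(0)->textContent, $m)) {
    $totalCount = (int) str_replace(',', '', $m[1]);
}

var_dump($totalCount);
```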
0

Try:

$match =array();
preg_match('/(?<=of\s)(?:\d{1,3}+(?:,\d{3})*)(?=\sResults)/', $htmlResultCount[0], $match);
$europeFormatCount = str_replace(',','',$match[0]);

The regex matches the number between "of" and "Results", allowing "," as a thousands separator.
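
As a small self-contained sketch, the same pattern can be wrapped in a helper and tried on a plain string. The function name extractTotal is hypothetical and not part of the original code; it returns null when the pattern does not match, so the caller can tell "no count found" apart from a count of zero.

```php
<?php
/**
 * Hypothetical helper: returns the total as an int, or null if the
 * "of N Results" pattern is not found in $text.
 */
function extractTotal(string $text): ?int
{
    // Lookbehind for "of ", digits grouped by commas, lookahead for " Results".
    $pattern = '/(?<=of\s)(?:\d{1,3}+(?:,\d{3})*)(?=\sResults)/';

    if (preg_match($pattern, $text, $match) !== 1) {
        return null;
    }

    // Drop the thousands separators before casting to int.
    return (int) str_replace(',', '', $match[0]);
}

var_dump(extractTotal('Showing 1 - 12 of 40,923 Results'));
var_dump(extractTotal('No count on this page'));
```

Note that the pattern expects comma-grouped digits: a number of four or more digits written without separators would not match, in which case the helper returns null.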

0

Source: https://habr.com/ru/post/1523354/

