preg_match: select URLs from another site

I want to select all the directory URLs from this site.

I used preg_match, but it retrieves every URL on the site, including ones I don't need.

Here is my code.

How do I get all the directory links from this site?

4 answers

I tried running this and it seems to work; I only changed the regex.

<?php
// Walk the paginated directory list (pages 0 through 25).
for ($i = 0; $i <= 25; $i++) {
    $site_url = "http://www.directorymaximizer.com/index.php?pageNum_directory_list=$i";
    $html = file_get_contents($site_url);
    if ($html === false) {
        continue; // skip pages that could not be fetched
    }

    // Each directory URL sits between "-->" and "<!--" in the page source.
    $regex = '@-->(https?://[^<]*)<!--@';
    preg_match_all($regex, $html, $matches, PREG_PATTERN_ORDER);

    // $matches[1] holds only the captured URLs;
    // $matches[0] would also include the comment markers.
    foreach (array_unique($matches[1]) as $url) {
        echo $url;
        echo "<br />\n";
    }
}

For this you need an HTML parser. HTML is not a regular language, so regular expressions do not work well on it.
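As a rough illustration of the parser approach, here is a minimal sketch using PHP's built-in DOMDocument. The helper name `extract_links` and the sample markup are made up for the example, and it assumes the links are ordinary `<a href="...">` elements; if the URLs are plain text between HTML comments, as described elsewhere in this thread, you would have to read the text nodes instead.

```php
<?php
// Hypothetical helper: collect the href value of every <a> element.
function extract_links(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        if ($href !== '') {
            $links[] = $href;
        }
    }
    // Drop duplicates and reindex from 0.
    return array_values(array_unique($links));
}

// Made-up sample markup, not the real page.
$sample = '<ul>'
    . '<li><a href="http://example.com/one" target="_blank">One</a></li>'
    . '<li><a href="http://example.org/two">Two</a></li>'
    . '<li><a href="http://example.com/one">One again</a></li>'
    . '</ul>';

$links = extract_links($sample);
print_r($links);
```

To run it against a live page you would pass the result of file_get_contents() to the helper instead of the sample string.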


RegExr at gskinner.com is useful for building and testing a pattern like this one:

preg_match('/(?P<url>https?:\/\/(?P<domain>[-a-z0-9\/]+\.\w+)(?P<path>[?.\/\\\=\&]+)?)[\s\w="]+>/i', $site, $anchors);

$url = $anchors['url'];
$domain = $anchors['domain'];
$path = $anchors['path'];
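If the goal is only to split a URL you already have into its domain and path, PHP's built-in parse_url() may be simpler than capturing those parts with a regex. This is a different technique from the one in the answer above, shown as a small sketch with a made-up URL:

```php
<?php
// Hypothetical example URL, not taken from the real site.
$url = 'http://www.example.com/directory/submit.php?cat=5';

// parse_url() returns an associative array of the components it finds.
$parts = parse_url($url);
$domain = $parts['host'] ?? '';
$path = $parts['path'] ?? '';

echo $domain, "\n";
echo $path, "\n";
```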


Looking at the page source, the URLs you want all appear in this form:

target="_blank">-->the url is here<!--</a>-->

So a pattern like this should match them:

@target="_blank">-->(?P<url>.+?)<!--</a>-->@

The matches from the first capture group, indexed under "url", will contain the URLs. Why a named capture group? It makes it easier to understand what the code is doing when you look back at it later.
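A minimal sketch of that pattern in use with preg_match_all(). The sample string is made up to mimic the markup described above (it assumes the anchor tags are wrapped in HTML comments); it is not copied from the real page.

```php
<?php
// Made-up sample mimicking the markup described above.
$html = '<!--<a href="http://example.com/" target="_blank">-->http://example.com/<!--</a>-->' . "\n"
      . '<!--<a href="http://example.org/dir" target="_blank">-->http://example.org/dir<!--</a>-->';

$pattern = '@target="_blank">-->(?P<url>.+?)<!--</a>-->@';
preg_match_all($pattern, $html, $matches);

// $matches['url'] and $matches[1] hold the same captured strings.
$urls = array_unique($matches['url']);
foreach ($urls as $url) {
    echo $url, "\n";
}
```

Because the group is named, the captures are available both as $matches['url'] and as $matches[1].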


Source: https://habr.com/ru/post/1730329/
