Is it possible to parse a directory listing on an external web page?

Is it possible to parse the directory listing of an external web page, provided the page is accessible and displays a list of files when you open it? I just want to know whether the files can be parsed dynamically in PHP, and how.

Sorry for being unclear. I mean a directory listing such as http://www.ibiblio.org/pub/ (index / ..), and the ability to read its contents as an array or something similarly easy to manage in my script.

+6
4 answers

You can use preg_match_all or DOMDocument.

In your case:

    $contents = file_get_contents("http://www.ibiblio.org/pub/");
    preg_match_all("|href=[\"'](.*?)[\"']|", $contents, $hrefs);
    var_dump($hrefs);

A working demo is also available if you want to take a look.

+4

If the server returns the directory listing as a well-formed XHTML document full of links, you can use DOMDocument with code like the following to get the list of files:

    $doc = new DOMDocument();
    $doc->preserveWhiteSpace = false; // property names are case-sensitive
    $doc->load('directorylisting.html'); // load() parses the file as XML
    $files = $doc->getElementsByTagName('a');

$files is now a DOMNodeList of DOMElement objects. You can iterate over it and read the href attribute of each element to get the path of every file in the listing.
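As a minimal sketch of that iteration, the snippet below walks the link elements and collects their href values into a plain array. The sample markup is hypothetical and stands in for a fetched listing, and loadHTML() is swapped in for load() here only so the example also tolerates markup that is not strict XHTML; the parent-directory filter is likewise an assumption about how the listing is laid out.

```php
<?php
// Hypothetical sample markup standing in for a fetched directory listing.
$html = '<html><body><h1>Index of /pub</h1>'
      . '<a href="../">Parent Directory</a>'
      . '<a href="docs/">docs/</a>'
      . '<a href="README.txt">README.txt</a>'
      . '</body></html>';

$doc = new DOMDocument();
// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$paths = [];
foreach ($doc->getElementsByTagName('a') as $link) {
    $href = $link->getAttribute('href');
    // Skip empty hrefs and the parent-directory entry (assumed to be "../").
    if ($href === '' || $href === '../') {
        continue;
    }
    $paths[] = $href;
}

print_r($paths);
```

The hrefs in a listing are usually relative, so they would still need to be joined with the listing URL before they can be requested.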

Note that this approach requires a well-formed directory list returned from the server. You cannot, for example, execute a request on stackoverflow.com and get a list of files in a directory.

If this does not work (for example, because the HTML is malformed), you can use regular expressions (e.g. preg_match_all) to find the <a> tags, for example:

    // [^"]* also accepts digits in the href; .*? keeps each match to a single link
    preg_match_all('@<a href="([^"]*)">(.*?)</a>@', file_get_contents('http://www.ibiblio.org/pub/'), $files);
    var_dump($files);

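$files will still contain the matched elements, only as an array of arrays rather than DOM objects: $files[0] holds the full matches, $files[1] the href values, and $files[2] the link texts.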
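If one record per link is easier to work with, preg_match_all also accepts the PREG_SET_ORDER flag. The sketch below runs a simplified pattern against a hypothetical listing fragment (both are assumptions for illustration, not the exact markup ibiblio serves) and builds an array of href/text pairs.

```php
<?php
// Hypothetical listing fragment; a real page would come from file_get_contents().
$html = '<a href="docs/">docs/</a>' . "\n"
      . '<a href="README.txt">README.txt</a>' . "\n";

// PREG_SET_ORDER groups the captures per link instead of per capture group.
preg_match_all('@<a href="([^"]*)">(.*?)</a>@', $html, $matches, PREG_SET_ORDER);

$entries = [];
foreach ($matches as $m) {
    // $m[0] is the whole tag, $m[1] the href, $m[2] the link text.
    $entries[] = ['href' => $m[1], 'text' => $m[2]];
}

print_r($entries);
```

This shape is close to the "array or something easy to manage" the question asks for, at the cost of relying on the links being written exactly as the pattern expects.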


UPDATE: I tested your URL (http://www.ibiblio.org/pub/) and the preg_match_all method works fine.

+2

Yes, it is entirely possible. I don’t quite understand what you mean by directory, but you should look into web crawlers. That is essentially what you are asking for, just written in PHP.

0

PHP's file_get_contents will do the trick for you.

(Assuming your HTTP request for this page returns a list of files, as you mentioned)
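A rough sketch of that fetch with basic failure handling might look like the following. The helper name fetchListing and the 10-second default timeout are hypothetical choices for illustration; the sketch also assumes allow_url_fopen is enabled, which file_get_contents needs in order to open http:// URLs.

```php
<?php
// Hypothetical helper: fetch a listing page, or return null on failure.
function fetchListing(string $url, int $timeoutSeconds = 10): ?string
{
    // The 'http' context options only apply to http:// and https:// URLs.
    $context = stream_context_create([
        'http' => ['timeout' => $timeoutSeconds],
    ]);
    // @ suppresses the warning; failure is reported through the false return value.
    $body = @file_get_contents($url, false, $context);

    return $body === false ? null : $body;
}
```

Calling fetchListing('http://www.ibiblio.org/pub/') would then give either the page body as a string, ready for DOMDocument or preg_match_all, or null if the request failed.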

0

Source: https://habr.com/ru/post/893247/

