Simple web page scraping in PHP

To be clear from the start, I have full permission from the website's administrator to do this until they build an API.

What I want to do is grab, say, a number or some other piece of data found in a specific part of the site, even though that value may change.

For example, suppose I store the HTML in a variable via file_get_contents and want to find the place in the source where it says "<p>User status: Online.</p>". I would need to save the text between "status: " and ".</p>" in a variable, knowing only these two strings to locate it, and also knowing that there is only one place where both strings appear on the same line.

Thank you for your time.

EDIT: I seem to have forgotten the most important part. The question is how to do what I just described: given a large block of text, how can I find what lies between one piece of text and another, and save it in a variable?

1 answer

There are several ways to scrape websites: one is to use CSS selectors and another is to use XPath, both of which select elements from the DOM.

Since I can't see the full HTML of the page, it is difficult for me to say which method is best for you. There is another option that may be frowned upon, but it may work in this case.

You can use regex (regular expressions) to find the characters. I'm not the best at regular expressions, but here is some sample code showing how this might work:
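As a rough illustration of the XPath route, here is a minimal sketch using PHP's bundled DOM extension. The markup is a hypothetical stand-in for the real page (which would come from file_get_contents), and the XPath expression assumes the status always sits in a <p> whose text starts with "User status: ".

```php
<?php
// Hypothetical sample markup; in practice this would come from file_get_contents().
$html = '<html><body><p>Some User</p><p>User status: Online.</p></body></html>';

// Collect libxml warnings internally, since real-world HTML is rarely perfectly valid.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Select every <p> whose text starts with the known prefix (an assumption about the page).
$nodes = $xpath->query('//p[starts-with(normalize-space(.), "User status: ")]');

$status = null;
if ($nodes !== false && $nodes->length > 0) {
    $text = trim($nodes->item(0)->textContent);
    // Drop the prefix, then strip a trailing period if there is one.
    $status = rtrim(substr($text, strlen('User status: ')), '.');
}

echo $status, PHP_EOL;
```

For the sample markup this prints Online. The advantage over plain string matching is that it keeps working if attributes or whitespace around the <p> tag change.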

    <?php
    $subject = "<html><body><p>Some User</p><p>User status: Online.</p></body></html>";
    $pattern = '/User status: (.*)\<\/p\>/';
    preg_match($pattern, $subject, $matches);
    print_r($matches);
    ?>

Output Example:

    Array
    (
        [0] => User status: Online.</p>
        [1] => Online.
    )

Basically, the regular expression above searches for the string "User status: " and then captures all the characters (.*) up to the closing paragraph tag (escaped).

Here is a pattern that returns only "Online" without the period. I wasn't sure whether all statuses end with a period, but this is what it would look like:
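To generalize this to any pair of markers, as the question asks, something like the sketch below might work. The function name textBetween is hypothetical, and the code assumes PHP 7.1 or later and that both markers appear on the same line (the "." in the pattern does not match newlines by default). It uses the lazy (.*?) rather than (.*), so the match stops at the first occurrence of the end marker instead of the last.

```php
<?php
/**
 * Hypothetical helper: returns the text between $start and $end,
 * or null when the two markers are not found in that order.
 */
function textBetween(string $haystack, string $start, string $end): ?string
{
    // preg_quote() escapes regex metacharacters in the markers, including the "/" delimiter.
    $pattern = '/' . preg_quote($start, '/') . '(.*?)' . preg_quote($end, '/') . '/';

    // (.*?) is lazy: it stops at the first occurrence of $end rather than the last.
    if (preg_match($pattern, $haystack, $matches) === 1) {
        return $matches[1];
    }

    return null;
}

// Hypothetical sample markup; in practice this would come from file_get_contents().
$html = '<html><body><p>Some User</p><p>User status: Online.</p><p>Footer</p></body></html>';

$status = textBetween($html, 'status: ', '.</p>');
echo $status, PHP_EOL;
```

With the sample markup this prints Online. Returning null for a missing match lets the caller tell "not found" apart from an empty status.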

 '/User status: (.*)\.\<\/p\>/' 
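If it turns out that some statuses do not end with a period, one possible variation is to make the period optional and the capture group lazy. This is only a sketch; the two sample strings are made up for illustration.

```php
<?php
// Lazy capture group followed by an optional literal period before the closing tag.
$pattern = '/User status: (.*?)\.?<\/p>/';

// Hypothetical sample inputs: one with a trailing period, one without.
$samples = [
    '<p>User status: Online.</p>',
    '<p>User status: Away</p>',
];

$statuses = [];
foreach ($samples as $sample) {
    if (preg_match($pattern, $sample, $matches) === 1) {
        $statuses[] = $matches[1];
    }
}

print_r($statuses);
```

Here $statuses ends up holding Online and Away, both without a trailing period.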

Source: https://habr.com/ru/post/1272244/