Is PHP's DOMDocument faster, or is preg_match_all faster?

Question

I am unsure which of the two is faster to process.

Is DOMDocument or preg_match_all (combined with cURL) faster at parsing HTML pages? Also, does DOMDocument leave a trace on the remote server the way cURL does? For example, with cURL we set a user agent, which the server can use to identify who is accessing it, but DOMDocument appears to have nothing comparable.

+4

dom php

mathew Jan 03 '11 at 10:21

3 answers

Andy Lester · Answer 1 · 2011-01-03T22:30:20+0000

Does it matter which is faster if you give the wrong results?

Matching with regular expressions to get one bit of data from a document will be faster than parsing the entire HTML document. But regular expressions cannot parse HTML correctly in all cases.
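To make the correctness point concrete, here is a minimal sketch that extracts link targets both ways. The sample markup and the function names `linksByRegex` and `linksByDom` are made up for illustration, and the regex is deliberately naive: it assumes `href` comes first and is double-quoted.

```php
<?php
// Hypothetical sample markup: the second link uses single quotes and a
// different attribute order, the third sits inside an HTML comment.
$html = <<<'HTML'
<html><body>
<a href="/one">One</a>
<a class='x' href='/two'>Two</a>
<!-- <a href="/old">Old</a> -->
</body></html>
HTML;

// Naive regex: assumes href is the first attribute and is double-quoted.
function linksByRegex(string $html): array
{
    preg_match_all('/<a href="([^"]*)"/i', $html, $matches);
    return $matches[1];
}

// DOM: parse the document, then read href from every <a> element.
function linksByDom(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $links[] = $a->getAttribute('href');
    }
    return $links;
}

print_r(linksByRegex($html));
print_r(linksByDom($html));
```

With this input the regex misses the single-quoted link and picks up the commented-out one, while the DOM version returns the two real links. A more elaborate pattern could handle these particular cases, but each new variation in the markup needs another special case.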

See http://htmlparsing.com/regexes.html, which I started in response to this common question. (And for the rest of you reading this, I could use the help. The source is on GitHub, and I need examples for different languages.)

Gordon · Answer 2 · 2011-01-03T22:37:43+0000

Regular expressions are likely to be faster, but they are also likely the worst choice. Unless you have benchmarked and profiled your application and found nothing else to optimize, you should use a proper existing parser.
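If you do want numbers before deciding, a rough micro-benchmark along these lines is one way to get them. It is only a sketch: the synthetic page, the helper names (`makePage`, `countLinksRegex`, `countLinksDom`, `averageMs`) and the run count are assumptions, it needs PHP 7.4 or later for `hrtime()` and arrow functions, and it measures parsing of an in-memory string only, not the network fetch.

```php
<?php
// Build a synthetic page with $n links (hypothetical test data).
function makePage(int $n): string
{
    $rows = '';
    for ($i = 0; $i < $n; $i++) {
        $rows .= "<p><a href=\"/item/$i\">Item $i</a></p>\n";
    }
    return "<html><body>\n$rows</body></html>";
}

// Count links with a regular expression.
function countLinksRegex(string $html): int
{
    return (int) preg_match_all('/<a\s[^>]*href="([^"]*)"/i', $html, $m);
}

// Count links by parsing the whole document with DOMDocument.
function countLinksDom(string $html): int
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $doc->getElementsByTagName('a')->length;
}

// Run a callable $runs times and return the average time in milliseconds.
function averageMs(callable $fn, int $runs = 20): float
{
    $start = hrtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / $runs / 1e6;
}

$page = makePage(2000);
printf("regex: %.3f ms\n", averageMs(fn() => countLinksRegex($page)));
printf("dom:   %.3f ms\n", averageMs(fn() => countLinksDom($page)));
```

The absolute figures depend on the machine and the PHP build, so compare them against the time your application spends fetching the page before treating the parser as the bottleneck.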

While regular expressions can be used to match HTML, it takes a lot of effort to build a reliable parser out of them. PHP offers a number of native extensions for working with XML (and HTML) reliably, and there are also several third-party libraries. See my answer to "Best HTML Parsing Techniques".

As for sending a user agent, this is also possible with DOM. You have to create a custom stream context and attach it to the underlying libxml functions. You can provide any of the available HTTP stream context options this way. See my answer to "DOMDocument::validate() problem" for an example of how to provide a custom user agent.
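The linked answer is not reproduced here, but the approach described above can be sketched roughly as follows. The function names `userAgentContext` and `loadHtmlWithUserAgent` and the user agent string `ExampleBot/1.0` are hypothetical; `stream_context_create()` and `libxml_set_streams_context()` are the standard PHP functions for building a context and handing it to libxml.

```php
<?php
// Build an HTTP stream context that carries a custom User-Agent header.
function userAgentContext(string $userAgent)
{
    return stream_context_create([
        'http' => ['user_agent' => $userAgent],
    ]);
}

// Load an HTML document (URL or local path) through DOMDocument,
// with the custom context applied to libxml's stream access.
function loadHtmlWithUserAgent(string $url, string $userAgent): DOMDocument
{
    // Subsequent libxml load operations will use this context.
    libxml_set_streams_context(userAgentContext($userAgent));

    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTMLFile($url);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $doc;
}
```

A call such as `loadHtmlWithUserAgent('http://example.com/', 'ExampleBot/1.0')` would then fetch the page with that user agent. Other HTTP context options, such as `timeout` or `header`, can be added to the same array.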

The surrican · Answer 3 · 2011-01-03T22:27:49+0000

The DOM functions themselves have nothing to do with fetching HTML.

However, there are load functions that you can use to fetch HTTP resources directly.

They show the same behavior as file_get_contents called without context parameters.

Regarding the other part of your question: the preg functions are faster. However, they are not intended for this use, and you will probably soon regret using them for this purpose.

If you parse HTML with regular expressions, you are either completely nuts or you just don't get the concept of HTML.
