Keeping source offsets when parsing HTML with the DOM?
I want to change <img src=""> attributes in not-too-mangled HTML (WordPress posts). I know I could take the easy road and use regular expressions, but I'm afraid that people in fluffy blue suits will come for me in my dreams.
If I use a DOM parser to read the HTML and modify the <img> tags, I'm afraid I won't be able to restore the post exactly as it was (with only my modification), because the DOM parser will probably do too much cleanup and may delete important data. A SAX parser probably can't handle invalid XML, so that won't work either.
So, is there a middle way: a DOM parser that knows where each element started in the source, so I can do a string replacement or something similar from there? I know that some nodes in the DOM tree won't exist in the source document (<b>Some <i>bizarre</b> formatting</i> will probably cause that), but does that mean it is always impossible? I see that PHP 5.3 has a DOMNode::getLineNo() method, but I am on 5.2.x.
If PHP's DOM output is too cleaned up for you, you could try the string-based SimpleHTMLDOM.
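If you would rather stay with the built-in extension, here is a rough sketch of the "middle way" from the question: let DOMDocument find the <img> elements, but never serialize it, and patch the original string instead. The function name rewrite_img_src, the $map parameter and the example URLs are my own placeholders, not part of any library. It assumes ASCII URLs, no literal ">" inside the <img> tag's attributes, and that the old URL does not appear elsewhere in the same tag. It avoids closures and other 5.3+ syntax, so it should also run on 5.2.x.

```php
<?php
// Hypothetical helper: change <img src> values while leaving the rest of
// the markup byte-for-byte untouched. $map is array(old URL => new URL).
function rewrite_img_src($html, array $map)
{
    if ($html === '') {
        return $html; // loadHTML() complains about empty input
    }

    // Let DOMDocument (libxml) cope with the tag soup; we only read from it.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $cursor = 0; // byte offset in $html up to which tags are already handled
    foreach ($doc->getElementsByTagName('img') as $img) {
        $old = $img->getAttribute('src'); // entity-decoded by the parser
        if ($old === '' || !isset($map[$old])) {
            continue;
        }
        $new = htmlspecialchars($map[$old]);

        // Scan forward for the next literal "<img ...>" containing the value.
        $pos = $cursor;
        while (($start = stripos($html, '<img', $pos)) !== false) {
            $end = strpos($html, '>', $start);
            if ($end === false) {
                break;
            }
            $tag = substr($html, $start, $end - $start + 1);

            // The source may hold the value entity-encoded or raw.
            $needle = null;
            foreach (array(htmlspecialchars($old), $old) as $candidate) {
                if (strpos($tag, $candidate) !== false) {
                    $needle = $candidate;
                    break;
                }
            }

            if ($needle !== null) {
                $patched = str_replace($needle, $new, $tag);
                $html = substr_replace($html, $patched, $start, strlen($tag));
                $cursor = $start + strlen($patched);
                break;
            }
            $pos = $end + 1; // not this tag (e.g. inside a comment); keep looking
        }
    }
    return $html;
}
```

Calling rewrite_img_src($post, array('http://old.example/a.png' => 'http://new.example/a.png')) should change only that one attribute value; misnested tags, odd quoting and whitespace elsewhere in the post are never re-serialized, so they stay as they were.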