How to get the "text" of an html page? (Webbrowser - Delphi)

Question

I am using TWebBrowser to get the source of HTML pages. The page source contains text mixed with HTML tags, e.g.:

FONT></P><P align=center><FONT color=#ccffcc size=3>**Hello There , This is a text in our html page** </FONT></P><P align=center> </P>

The HTML tags are arbitrary, and we cannot predict them. Is there a way to extract only the text and separate it from the HTML tags?

+3

html browser text delphi

Kermia Sep 08 '10 at 9:41

5 answers

You should look at the Delphi DOM HTML parser.

+2

irishbuzz Sep 08 '10 at 9:43

, everychar **. , (, < >. DOM- . p >

+1

Svisstack Sep 08 '10 at 9:45

: .

HTML - , ( , - , , ). , .

HTML, :

HTML string, HTML.

HTML , , Indy (. ).

HTML , .

TWebBrowser, RRuz, Internet Explorer.
Windows , Internet Explorer ...

-

+1

Jeroen Wiert Pluimers Sep 08 '10 at 12:29

With the Delphi HTML Component Library, retrieving only the text from an HTML document is simple. The THtDocument.InnerText property returns the formatted text without tags.

0

Alexander Sviridenkov Apr 15 '15 at 21:05

Rruz · Accepted Answer · 2010-09-08T10:16:00+0000

You can use an instance of TWebBrowser to parse the HTML code and extract the plain text from it.

See this sample:

uses
  MSHTML,
  SHDocVw,
  ActiveX,
  Variants; // needed for VarArrayCreate

function GetPlainText(const Html: string): string;
var
  DummyWebBrowser: TWebBrowser;
  Document: IHTMLDocument2;
  DummyVar: Variant;
begin
  Result := '';
  DummyWebBrowser := TWebBrowser.Create(nil);
  try
    // Open a blank page so the browser creates an IHTMLDocument2 instance
    DummyWebBrowser.Navigate('about:blank');
    Document := DummyWebBrowser.Document as IHTMLDocument2;
    if Assigned(Document) then // check that a document is available
    begin
      // Document.Write expects a safe array of variants,
      // so wrap the HTML in a one-element variant array
      DummyVar := VarArrayCreate([0, 0], varVariant);
      DummyVar[0] := Html;
      Document.Write(PSafeArray(TVarData(DummyVar).VArray)); // load the HTML into the document
      Document.Close;
      // Read the text of the body without any tags
      Result := (Document.body as IHTMLBodyElement).createTextRange.text;
    end;
  finally
    DummyWebBrowser.Free;
  end;
end;
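
As a rough usage sketch, the function above could be called with the HTML fragment from the question. The unit name HtmlTextUtils and the procedure ShowPlainTextDemo are hypothetical names introduced only for this illustration, and the sketch assumes it runs inside a VCL application (for example from a button's OnClick handler), since TWebBrowser wraps a COM control.

```delphi
unit PlainTextDemo;

interface

// Hypothetical helper for illustration; call it from a VCL application,
// for example from a button's OnClick handler.
procedure ShowPlainTextDemo;

implementation

uses
  SysUtils, // Trim
  Dialogs,  // ShowMessage
  HtmlTextUtils; // assumed unit name: wherever GetPlainText is declared

procedure ShowPlainTextDemo;
const
  // HTML similar to the fragment shown in the question
  SampleHtml =
    '<P align=center><FONT color=#ccffcc size=3>' +
    'Hello There , This is a text in our html page' +
    '</FONT></P><P align=center> </P>';
var
  PlainText: string;
begin
  // Trim is applied because the extracted text may carry
  // leading or trailing whitespace from block elements
  PlainText := Trim(GetPlainText(SampleHtml));
  if PlainText = '' then
    ShowMessage('No text could be extracted.')
  else
    ShowMessage(PlainText);
end;

end.
```

The empty-string check matters because GetPlainText returns '' when no document could be obtained from the browser instance, so callers should probably treat an empty result as "nothing extracted" rather than as an error.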
