How is a web browser search performed?

In a Java desktop application, I want to search for and highlight phrases in HTML files the way web browsers do. HTML tags (everything between < and >) should be ignored, but some tags, unlike <b>, must not be ignored. For example, when searching for each table, the text ...each <b>table</b> has name... should be highlighted, but ...has each</p><p> Table is... should not be, because the <p> tag interrupts the text.
Web browsers implement this somehow. How can I find out how they do it, or is there source code for it somewhere online? I tried Google, but to no avail :(

+3
4 answers

Browsers do not search inside the actual HTML file; they search the rendered output of that HTML.

Get a suitable HTML renderer and take its output as text. Then search that text with an ordinary string search algorithm.

The example in your question produces a newline character in the rendered output, so a normal string search will behave as you expect.

+2

As Faisal says, browsers search only the rendered content. To do the same, you need to remove the HTML tags before doing the actual search:

This code can help you: http://www.dotnetperls.com/remove-html-tags

, /, script , .
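A minimal Java sketch of the tag-stripping idea, not how any browser actually does it. The class name `HtmlTextSearch`, its method names, and the list of tags treated as breaking are assumptions chosen for illustration. Breaking tags become newlines, all other tags are dropped, and a case-insensitive search runs on the result:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

public class HtmlTextSearch {
    // Assumption: these tags interrupt the text; extend the list as needed.
    private static final Pattern BREAKING_TAG = Pattern.compile(
            "(?i)</?(p|br|div|h[1-6]|li|ul|ol|tr|td|th|table)\\b[^>]*>");
    // Any remaining tag (such as <b> or <i>) is simply removed.
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]*>");

    // Turns HTML into plain text where breaking tags become newlines.
    static String toSearchableText(String html) {
        String text = BREAKING_TAG.matcher(html).replaceAll("\n");
        text = ANY_TAG.matcher(text).replaceAll("");
        // Collapse runs of spaces and tabs, but keep the newlines.
        return text.replaceAll("[ \\t]+", " ");
    }

    // Returns the start offsets of every case-insensitive match.
    // Offsets refer to the stripped text, not to the original HTML.
    static List<Integer> findAll(String html, String phrase) {
        String haystack = toSearchableText(html).toLowerCase(Locale.ROOT);
        String needle = phrase.toLowerCase(Locale.ROOT);
        List<Integer> hits = new ArrayList<>();
        if (needle.isEmpty()) {
            return hits;
        }
        for (int i = haystack.indexOf(needle); i >= 0; i = haystack.indexOf(needle, i + 1)) {
            hits.add(i);
        }
        return hits;
    }
}
```

With the examples from the question, `findAll("...each <b>table</b> has name...", "each table")` finds one match, while `findAll("...has each</p><p> Table is...", "each table")` finds none. The sketch does not decode entities such as `&amp;`, and a regex is not a full HTML parser, so it will misread comments and script blocks.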

+1

It seems pretty simple.

1) Find the last word in the line.
2) Look at the last word.
3) Determine whether the last word is a break (<p>, <br />, <div>).
4) If it is a break, continue.
5) Otherwise, evaluate the previous word against the search query.

I do not know whether browsers do it this way, but this approach should work.

0

Try using the javax.swing.text.html package in Java.
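One way that package could be used, as a rough sketch only: the parser in `javax.swing.text.html.parser` reports text and tags through an `HTMLEditorKit.ParserCallback`, and `HTML.Tag.breaksFlow()` says whether a tag interrupts the text. The class name `SwingHtmlText` and its methods are hypothetical, and the exact whitespace the Swing parser produces may differ from a real browser.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class SwingHtmlText {
    // Collects the text of the document; flow-breaking tags become newlines.
    static String extractText(String html) {
        StringBuilder out = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                out.append(data);
            }

            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                breakIfNeeded(t);
            }

            @Override
            public void handleEndTag(HTML.Tag t, int pos) {
                breakIfNeeded(t);
            }

            @Override
            public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                breakIfNeeded(t); // covers <br>, for example
            }

            private void breakIfNeeded(HTML.Tag t) {
                if (t.breaksFlow()) {
                    out.append('\n');
                }
            }
        };
        try {
            // true = ignore any charset directive inside the document
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    // Case-insensitive check whether the phrase occurs in the extracted text.
    static boolean containsPhrase(String html, String phrase) {
        return extractText(html).toLowerCase(Locale.ROOT)
                .contains(phrase.toLowerCase(Locale.ROOT));
    }
}
```

For example, `containsPhrase("<p>each <b>table</b> has name</p>", "each table")` should be true, while `containsPhrase("<p>has each</p><p> Table is</p>", "each table")` should be false because the paragraph boundary becomes a newline. To highlight matches on screen, the same package's `HTMLEditorKit` can load the page into a `JEditorPane`, whose `Highlighter` marks ranges of the document text.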

0

Source: https://habr.com/ru/post/1764679/

