How can I query a database field but ignore the HTML markup?

We have a field that contains HTML markup for formatting on a website, but we need to search only the text that is displayed on screen, not things like CSS, tag names, property names, and so on.

Is there a way to ignore the markup in an SQL query or stored procedure? If so, will it cause performance problems later?

I assume there is some way to use the angle brackets to parse out the searchable text.

+3
5 answers

. , .

@Nissan: HTML IMO. , . , , (, ) unencoded < . , , HTML.

, img elements 'ALT. title s. , " , ". .

, HTML , , . DOM - , BeautifulSoup HTML - nodeValue .

, , OP - , , . , HTML, HTML , .
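The answer above mentions img ALT text, title attributes, and DOM-style parsers such as BeautifulSoup. As a rough illustration of extracting displayed text with a real parser rather than by matching angle brackets, here is a minimal sketch using Python's standard-library `html.parser` (chosen here only to avoid a dependency; it is not BeautifulSoup). The class and function names are hypothetical, and treating `alt` and `title` values as searchable text is an assumption.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text a reader would see; skip <script> and <style> bodies."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
            return
        # Assumption: alt and title attribute values count as searchable text.
        for name, value in attrs:
            if name in ("alt", "title") and value:
                self.parts.append(value)

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Text nodes arrive here with entities such as &lt; already decoded.
        if self._skip_depth == 0:
            self.parts.append(data)


def visible_text(html):
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse the whitespace runs left behind where tags used to be.
    return " ".join(" ".join(parser.parts).split())


sample = (
    '<p style="color:red">Price &lt; 5 '
    '<img src="a.png" alt="red shoe"> today</p>'
    "<style>p{margin:0}</style>"
)
print(visible_text(sample))  # Price < 5 red shoe today
```

Running this kind of extraction in application code when the row is written, and storing the result next to the HTML, keeps the parsing out of the query itself.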

+4

; , .

:

  • . UDF . , ( ), .

  • . HTML - . , 95% UDF, 100%.

  • . - HTML- , # , , .

...

, , :

  • .

  • , , HTML , .

  • .

, , , , .

+3

If you use a function to strip the HTML in the WHERE clause, for example:

WHERE dbo.anyRemoveHtml(yourColumn)='your search text'

The index will not be used and you will be scanning the table. This may not be a problem while the application has little data, but the SELECT will get slower and slower as more data is added to the table.

Note: dbo.anyRemoveHtml is just a made-up name standing for whatever function you choose to remove the HTML; it does not actually exist.
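To make the index point concrete, here is a small sketch that uses SQLite (through Python's `sqlite3` module) as a stand-in for the real database. The table `pages`, the columns `body_html` and `body_text`, and the `strip_html` function are all hypothetical, and the regex stripper is deliberately naive. It compares the query plan for a function wrapped around the column with the plan for a pre-stripped, indexed column.

```python
import re
import sqlite3


def strip_html(html):
    # Naive stand-in for the made-up dbo.anyRemoveHtml; a real HTML parser
    # copes better with messy markup and stray "<" characters.
    return " ".join(re.sub(r"<[^>]*>", " ", html).split())


conn = sqlite3.connect(":memory:")
conn.create_function("strip_html", 1, strip_html)
conn.execute(
    "CREATE TABLE pages (id INTEGER PRIMARY KEY, body_html TEXT, body_text TEXT)"
)

# Store the stripped text next to the markup at write time.
for html in ["<p>Hello <b>world</b></p>", "<div>your search text</div>"]:
    conn.execute(
        "INSERT INTO pages (body_html, body_text) VALUES (?, ?)",
        (html, strip_html(html)),
    )
conn.execute("CREATE INDEX idx_pages_body_text ON pages (body_text)")


def plan(sql, params):
    # The fourth column of EXPLAIN QUERY PLAN holds the readable detail.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    return " | ".join(row[3] for row in rows)


slow_sql = "SELECT id FROM pages WHERE strip_html(body_html) = ?"
fast_sql = "SELECT id FROM pages WHERE body_text = ?"
term = ("your search text",)

slow_plan = plan(slow_sql, term)  # function on the column: full scan
fast_plan = plan(fast_sql, term)  # plain column: the index is usable
print(slow_plan)
print(fast_plan)
```

The exact plan wording differs between SQLite versions, and other engines report plans differently, but the pattern is the same: the function-wrapped predicate cannot use the index, while the stored plain-text column can. The cost moves to write time, since `body_text` has to be refreshed whenever `body_html` changes.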

0

Source: https://habr.com/ru/post/1732394/

