JavaScript - regex optimization / replace

I have a script that strips unwanted HTML tags and escapes quotes to "improve" security, mainly to block the script tag, onload injection, and so on. It is used to "texturize" content obtained from innerHTML.

However, it roughly triples my execution time (it runs in a loop). I would like to know if there is a better way, or a better regex, for this:

    function safe_content( text ) {
        text = text.replace( /<script[^>]*>.*?<\/script>/gi, '' );
        text = text.replace( /(<p[^>]*>|<\/p>)/g, '' );
        text = text.replace( /'/g, '&#8217;' ).replace( /&#039;/g, '&#8217;' ).replace( /[\u2019]/g, '&#8217;' );
        text = text.replace( /"/g, '&#8221;' ).replace( /&#034;/g, '&#8221;' ).replace( /&quot;/g, '&#8221;' ).replace( /[\u201D]/g, '&#8221;' );
        text = text.replace( /([\w]+)=&#[\d]+;(.+?)&#[\d]+;/g, '$1="$2"' );
        return text.trim();
    };

EDIT: here's the fiddle: https://fiddle.jshell.net/srnoe3s4/1/. The fiddle apparently does not like script tags inside a JavaScript string, so I did not include one.

1 answer

I will only address performance and the naive security checks, since writing a sanitizer is not something you can do on the back of a napkin. If you want to save time, do not call replace() several times when the replacement value is the same. Merge the patterns with alternation instead, which leads to the following:

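    function safe_content( text ) {
        // script blocks and <p> tags are both replaced by '', so one pass covers them
        text = text.replace( /<script[^>]*>.*?<\/script>|(<\/?p[^>]*>)/gi, '' );
        // every single-quote variant becomes &#8217;
        text = text.replace( /'|&#039;|[\u2019]/g, '&#8217;' );
        // every double-quote variant becomes &#8221;
        text = text.replace( /"|&#034;|&quot;|[\u201D]/g, '&#8221;' );
        // put plain quotes back around attribute values
        text = text.replace( /([\w]+)=&#[\d]+;(.+?)&#[\d]+;/g, '$1="$2"' );
        return text.trim();
    };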
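The same idea can be pushed one step further. As a minimal sketch, the two quote replacements can be folded into a single pass with a lookup table and a replacer function. The names `QUOTE_MAP`, `QUOTE_RE` and `normalizeQuotes` are hypothetical and not part of the original code, and whether this is actually faster depends on the engine and the input, so measure before adopting it.

```javascript
// Hypothetical lookup table: every quote variant mapped to its replacement.
const QUOTE_MAP = {
  "'": '&#8217;',
  '&#039;': '&#8217;',
  '\u2019': '&#8217;',
  '"': '&#8221;',
  '&#034;': '&#8221;',
  '&quot;': '&#8221;',
  '\u201D': '&#8221;'
};

// Built once, outside the function, instead of on every call.
const QUOTE_RE = /'|&#039;|\u2019|"|&#034;|&quot;|\u201D/g;

// Single scan of the string; the replacer picks the entity for each match.
function normalizeQuotes(text) {
  return text.replace(QUOTE_RE, (match) => QUOTE_MAP[match]);
}
```

Calling `normalizeQuotes(text)` would then stand in for the second and third `replace()` lines above; the attribute-restoring step still has to run afterwards.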

If you take into account dan1111's comment about strange string input that breaks this implementation, you can wrap each replacement in while(/foo/.test(input)) to avoid the problem:

    function safe_content( text ) {
        // repeat each replacement until its pattern no longer matches
        while ( /<script[^>]*>.*?<\/script>|(<\/?p[^>]*>)/gi.test( text ) )
            text = text.replace( /<script[^>]*>.*?<\/script>|(<\/?p[^>]*>)/gi, '' );
        while ( /'|&#039;|[\u2019]/g.test( text ) )
            text = text.replace( /'|&#039;|[\u2019]/g, '&#8217;' );
        while ( /"|&#034;|&quot;|[\u201D]/g.test( text ) )
            text = text.replace( /"|&#034;|&quot;|[\u201D]/g, '&#8221;' );
        while ( /([\w]+)=&#[\d]+;(.+?)&#[\d]+;/g.test( text ) )
            text = text.replace( /([\w]+)=&#[\d]+;(.+?)&#[\d]+;/g, '$1="$2"' );
        return text.trim();
    };

With typical input this will not be much slower than the previous code. But if the input is of the kind described in dan1111's comment, it may be slower. See perf demo
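The comment itself is not reproduced here. Assuming it refers to nested input, where removing an inner tag leaves a new script tag behind, the sketch below shows why a single pass is not enough. The names `stripOnce`, `stripUntilStable` and `nested` are hypothetical. The loop compares the string before and after each pass rather than calling test() first, which is one way to avoid running the pattern twice per iteration.

```javascript
// One pass: remove every script block that is visible right now.
function stripOnce(text) {
  return text.replace(/<script[^>]*>.*?<\/script>/gi, '');
}

// Repeat the pass until the string stops changing.
function stripUntilStable(text) {
  let previous;
  do {
    previous = text;
    text = stripOnce(text);
  } while (text !== previous);
  return text;
}

// Assumed example of "strange" input: script tags hidden inside script tags.
const nested = '<scr<script></script>ipt>alert(1)</scr<script></script>ipt>';

console.log(stripOnce(nested));        // <script>alert(1)</script>
console.log(stripUntilStable(nested)); // (empty string)
```

This only illustrates the looping technique. It does not make the function a real sanitizer.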


Source: https://habr.com/ru/post/1267010/

