In case you want to use a regular expression in .NET to strip HTML tags, the one below seems to work very well on the source code of this page. It is better than some of the other answers here because it looks for actual HTML tags instead of blindly deleting everything between < and >. Back in the BBS days we typed <grin> a lot instead of :), so removing <grin> is not an option. :)
This solution only removes the tags. It does not remove their contents in cases where that might matter, such as a script tag: the script's text will still appear, but it will not be executed because the script tag itself is removed. Removing the contents of an HTML element is much harder, and it practically requires the HTML fragment to be well formed...
Also note the RegexOptions.Singleline option. It is important for any block of HTML: it makes . match newline characters as well, and there is nothing wrong with starting an HTML tag on one line and finishing it on another (for example, when its attributes are spread over several lines).
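Both caveats can be checked with a small sketch (top-level C# statements, .NET 6 or later assumed). It runs the answer's pattern, again with a shortened tag list as an assumption for brevity, on a tag whose attributes are split across two lines, with and without RegexOptions.Singleline, and on a script element.

```csharp
using System;
using System.Text.RegularExpressions;

// The answer's pattern with a shortened tag list (assumption, for brevity).
const string TagPattern = @"</{0,1}(!DOCTYPE|a|b|br|div|p|script|span){1}(\s*/{0,1}>|\s+.*?/{0,1}>)";

var withSingleline = new Regex(TagPattern, RegexOptions.Singleline);
var withoutSingleline = new Regex(TagPattern); // "." does not match '\n' here

// The attributes of this opening tag are split over two lines.
string multiLine = "<div class=\"note\"\n     id=\"intro\">Hi</div>";

string singlelineResult = withSingleline.Replace(multiLine, "");  // both tags removed
string defaultResult = withoutSingleline.Replace(multiLine, "");  // only </div> removed; the opening tag stays

// Only the <script> tags are removed; the script body stays in the output as plain text.
string scriptResult = withSingleline.Replace("<script>alert(1)</script>", "");

Console.WriteLine(singlelineResult); // Hi
Console.WriteLine(defaultResult);
Console.WriteLine(scriptResult);     // alert(1)
```

Without Singleline, the lazy `.*?` cannot cross the newline inside the opening `<div ...>`, so that tag is left in the output.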
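```csharp
using System.Text.RegularExpressions;

public static class HtmlTagStripper
{
    // Matches opening, closing and self-closing tags for the listed element names (plus <!DOCTYPE ...>).
    private const string TagPattern = @"</{0,1}(!DOCTYPE|a|abbr|acronym|address|applet|area|article|aside|audio|b|base|basefont|bdi|bdo|big|blockquote|body|br|button|canvas|caption|center|cite|code|col|colgroup|datalist|dd|del|details|dfn|dialog|dir|div|dl|dt|em|embed|fieldset|figcaption|figure|font|footer|form|frame|frameset|h1|h2|h3|h4|h5|h6|head|header|hr|html|i|iframe|img|input|ins|kbd|keygen|label|legend|li|link|main|map|mark|menu|menuitem|meta|meter|nav|noframes|noscript|object|ol|optgroup|option|output|p|param|pre|progress|q|rp|rt|ruby|s|samp|script|section|select|small|source|span|strike|strong|style|sub|summary|sup|table|tbody|td|textarea|tfoot|th|thead|time|title|tr|track|tt|u|ul|var|video|wbr){1}(\s*/{0,1}>|\s+.*?/{0,1}>)";

    // Singleline makes "." match newlines too, so a tag split across lines is still removed.
    // Add RegexOptions.IgnoreCase if the input may contain upper-case tags such as <P>.
    private static readonly Regex TagRegex = new Regex(TagPattern, RegexOptions.Singleline);

    // Removes the tags but keeps the text between them: "<p>Hello, World</p>" -> "Hello, World".
    public static string StripTags(string html)
    {
        return TagRegex.Replace(html, string.Empty);
    }
}
```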
I am not saying this is the best answer. It is just one option, and it works perfectly for me.