How to encode an ampersand if it is not already encoded?

I need a C# method that encodes ampersands only if they are not already encoded and are not part of another encoded expression.

For example:

- "tom & jill" should become "tom &amp; jill"
- "tom &amp; jill" should remain "tom &amp; jill"
- "tom &euro; jill" should remain "tom &euro; jill"
- "tom <&> jill" should become "tom <&amp;> jill"
- "tom &quot;&&quot; jill" should become "tom &quot;&amp;&quot; jill"
3 answers

This should do a pretty good job:

    text = Regex.Replace(text, @"
        # Match & that is not part of an HTML entity.
        &                  # Match literal &.
        (?!                # But only if it is NOT...
          \w+;             # an alphanumeric entity,
        | \#[0-9]+;        # or a decimal entity,
        | \#x[0-9A-F]+;    # or a hexadecimal entity.
        )                  # End negative lookahead.",
        "&amp;",
        RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
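To try the pattern above against the question's examples, here is a minimal sketch that wraps it in a reusable, precompiled Regex. The class and method names (AmpersandEncoder, EncodeLoneAmpersands) are made up for illustration; the pattern is the one from this answer. Because IgnorePatternWhitespace treats # as the start of a comment, the # in the numeric-entity alternatives has to stay escaped as \#, and the pattern has to stay on multiple lines.

```csharp
using System.Text.RegularExpressions;

public static class AmpersandEncoder
{
    // Same pattern as the answer: an & is replaced only when it is NOT followed
    // by a named (\w+;), decimal (#123;) or hexadecimal (#x1F;) entity body.
    private static readonly Regex LoneAmpersand = new Regex(@"
        &                  # Match literal &.
        (?!                # But only if it is NOT...
          \w+;             # an alphanumeric entity,
        | \#[0-9]+;        # or a decimal entity,
        | \#x[0-9A-F]+;    # or a hexadecimal entity.
        )                  # End negative lookahead.",
        RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace | RegexOptions.Compiled);

    // Hypothetical helper name; replaces every lone & with &amp;.
    public static string EncodeLoneAmpersands(string text) =>
        LoneAmpersand.Replace(text, "&amp;");
}
```

With this pattern all five cases from the question produce the expected output, and numeric references such as &#38; and &#x26; are left alone as well.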

What you really want to do is decode the string first and then encode it again. Do not try to fix the encoded string.

An encoding is only worth its salt if it can be decoded easily, so reuse that logic: it makes your life easier and your software less error-prone.

Now, if you are not sure whether the string is encoded or not, the problem is most likely not the string itself but the ecosystem that produced it. Where did you get it? Who handled it before it reached you? Do you trust them?

If you really must resort to writing a magic-fix-weird-data function, consider building a table of encodings and their corresponding characters:

    &amp;  -> &
    &euro; -> €
    &lt;   -> <
    // etc.

Then first decode every encoding you find according to the table, and then re-encode the entire string. Of course, you might come up with more efficient methods that work without decoding first. But a year from now you will not be sane. And this is your career, right? You need to stay right in the head! You will lose your mind if you try to be too clever, and you will lose your job when you go crazy. Sad things happen to people who let maintaining their hacks destroy their minds...
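A minimal sketch of that table-driven decode-then-encode approach, with no library involved. The table contents and the TableTranscoder name are assumptions for illustration: the table lists only &amp;, &quot; and &euro;, and deliberately leaves out < and > so the question's "<&>" case keeps its brackets. Decoding runs as a single regex pass, so an already-escaped "&amp;euro;" is unwrapped once rather than twice.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public static class TableTranscoder
{
    // Hypothetical table: entity name -> character. '<' and '>' are left out on purpose.
    private static readonly Dictionary<string, char> EntityToChar = new Dictionary<string, char>
    {
        ["amp"] = '&',
        ["quot"] = '"',
        ["euro"] = '€',
    };

    // The same table inverted: character -> "&name;".
    private static readonly Dictionary<char, string> CharToEntity =
        EntityToChar.ToDictionary(kv => kv.Value, kv => "&" + kv.Key + ";");

    private static readonly Regex NamedEntity = new Regex(@"&(\w+);");

    // Step 1: decode every entity the table knows, in one pass; unknown ones are kept verbatim.
    public static string Decode(string text) =>
        NamedEntity.Replace(text, m =>
            EntityToChar.TryGetValue(m.Groups[1].Value, out char c) ? c.ToString() : m.Value);

    // Step 2: re-encode the whole string with the same table.
    public static string Encode(string text)
    {
        var result = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (CharToEntity.TryGetValue(c, out string entity))
                result.Append(entity);
            else
                result.Append(c);
        }
        return result.ToString();
    }

    public static string Normalize(string text) => Encode(Decode(text));
}
```

The weak spot is the table itself: an entity it does not know (say &nbsp;) is not decoded in step 1 and then has its & escaped in step 2, ending up as &amp;nbsp;. So the table has to cover everything your data can contain.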

EDIT: Using the .NET library will surely save you from insanity:

I just tested it, and it seems to have no problem decoding strings that contain bare ampersands. So go ahead:

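    using System.Web; // HttpUtility

    string magic(string encodedOrNot)
    {
        // Decode whatever entities are already present, then encode everything again.
        var decoded = HttpUtility.HtmlDecode(encodedOrNot);
        return HttpUtility.HtmlEncode(decoded);
    }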
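To see what this round trip actually produces, here is a small sketch that runs the question's inputs through the same two calls. MagicFix is a hypothetical wrapper name; HttpUtility lives in the System.Web namespace (System.Web.dll on .NET Framework and, as far as I know, part of the shared framework on .NET Core 2.0 and later).

```csharp
using System.Web; // HttpUtility

public static class MagicFix
{
    // Same logic as magic(): library decode followed by library encode.
    public static string Magic(string encodedOrNot)
    {
        string decoded = HttpUtility.HtmlDecode(encodedOrNot);
        return HttpUtility.HtmlEncode(decoded);
    }
}
```

The plain & cases come out right, but two results differ from what the question asks for: "tom <&> jill" becomes "tom &lt;&amp;&gt; jill" because HtmlEncode also escapes angle brackets, and "tom &euro; jill" comes back with a literal € because HtmlEncode does not emit named entities such as &euro;. The angle-bracket problem is what EDIT #2 below is about.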

EDIT #2: It turns out that the HttpUtility.HtmlDecode decoder will work for your purpose, but the HtmlEncode encoder will not, because you do not want angle brackets (<, >) to be encoded. But writing an encoder yourself is very simple:

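    using System.Collections.Generic;
    using System.Text;

    // Encodes only the characters present in encodingTable; everything else,
    // including < and >, is copied through unchanged.
    static string Encode(string decoded, IDictionary<char, string> encodingTable)
    {
        var result = new StringBuilder(decoded.Length);
        foreach (char character in decoded)
        {
            if (encodingTable.TryGetValue(character, out string entity))
                result.Append(entity);
            else
                result.Append(character);
        }
        return result.ToString();
    }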
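Putting EDIT #2 together, a minimal sketch: decode with HttpUtility.HtmlDecode, then re-encode with a hand-written table. The table contents and the DecodeThenEncode/Fix names are assumptions for illustration; the table holds &, " and € because those are the characters the question's examples expect to see as entities, and it leaves out < and >.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Web; // HttpUtility

public static class DecodeThenEncode
{
    // Hypothetical encoding table: only these characters are turned into entities.
    // '<' and '>' are intentionally missing, so they pass through unchanged.
    private static readonly Dictionary<char, string> EncodingTable = new Dictionary<char, string>
    {
        ['&'] = "&amp;",
        ['"'] = "&quot;",
        ['€'] = "&euro;",
    };

    // The table-driven encoder: look each character up, copy it if absent.
    public static string Encode(string decoded)
    {
        var result = new StringBuilder(decoded.Length);
        foreach (char character in decoded)
        {
            if (EncodingTable.TryGetValue(character, out string entity))
                result.Append(entity);
            else
                result.Append(character);
        }
        return result.ToString();
    }

    // Library decoder first, then the custom encoder.
    public static string Fix(string encodedOrNot) =>
        Encode(HttpUtility.HtmlDecode(encodedOrNot));
}
```

With this table all five of the question's examples produce the expected output. Any character that HtmlDecode produces but the table lacks (for example the non-breaking space from &nbsp;) stays as a literal character afterwards.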

With a regex, this can be done using a negative lookahead:

 &(?![^& ]+;) 
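In C#, this shorter pattern can be used the same way; a minimal sketch (the class and method names are hypothetical). The lookahead treats any run of characters other than & and space, followed by ;, as an entity. That is looser than the IgnorePatternWhitespace pattern in the first answer: it handles all of the question's examples, but it also leaves the & in something like "a=1&b=2;" untouched.

```csharp
using System.Text.RegularExpressions;

public static class SimpleAmpersandEncoder
{
    // & that is NOT followed by "one or more chars that are neither & nor space, then ;".
    private static readonly Regex LoneAmpersand = new Regex(@"&(?![^& ]+;)");

    public static string Encode(string text) => LoneAmpersand.Replace(text, "&amp;");
}
```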



Source: https://habr.com/ru/post/899024/

