What you really want to do is decode the string first and then encode it again. Do not try to fix the encoded string.
Any encoding is only worth its salt if it can be decoded easily, so reuse that logic to make your life easier and your software less error-prone.
Now, if you are not sure whether the string is encoded or not, the problem is most likely not the string itself, but the ecosystem that produced the string. Where did you get it from? Who did it pass through before it got to you? Do you trust it?
If you really have to resort to writing a magic-fix-weird-data function, then consider building a table of "encodings" and their corresponding characters:
    &amp;  -> &
    &euro; -> €
    &lt;   -> <
Then first decode every encoding you find according to the table, and afterwards re-encode the entire string. Of course, you might find more efficient methods that fumble around without decoding first. But next year you will not be sure that you encoded things correctly. And this is your career, right? You need to stay sane! You will lose your mind if you try to be too clever. And you will lose your job when you go mad. Sad things happen to people who let maintaining their hacks destroy their minds...
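A minimal sketch of the table-driven decoding step, assuming a small hand-built table. The class name `EntityTable`, the pattern, and the three entries are hypothetical and only for illustration. The replacement is done in a single pass so that already-decoded text is never decoded a second time:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class EntityTable
{
    // Hypothetical table; extend it with whatever entities your data contains.
    static readonly Dictionary<string, string> Decodings = new Dictionary<string, string>
    {
        ["&amp;"] = "&",
        ["&euro;"] = "€",
        ["&lt;"] = "<",
    };

    // Matches named entities such as "&amp;"; numeric entities are not handled here.
    static readonly Regex EntityPattern = new Regex("&[A-Za-z]+;");

    // Each match is replaced exactly once. Entities missing from the table are left as-is.
    public static string Decode(string text) =>
        EntityPattern.Replace(text, m =>
            Decodings.TryGetValue(m.Value, out var ch) ? ch : m.Value);
}
```

Chaining `string.Replace` calls instead would risk decoding twice: `&amp;lt;` would first become `&lt;` and then `<`. The single-pass version stops at `&lt;`.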
EDIT: Using the .NET library will surely save you from insanity:
I just tested it, and it seems to have no problem decoding strings that contain bare ampersands. So go ahead:
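    // Requires: using System.Web;
    string magic(string encodedOrNot)
    {
        // Decoding leaves unencoded input untouched, so the result is the same either way.
        var decoded = HttpUtility.HtmlDecode(encodedOrNot);
        return HttpUtility.HtmlEncode(decoded);
    }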
EDIT #2: It turns out that the HttpUtility.HtmlDecode decoder will work for your purpose, but the encoder will not, because you do not want angle brackets (<, >) to be encoded. But writing an encoder is very simple:
    define encoder(string decoded):
        result is a string-builder
        for character in decoded:
            if character in encoding-table:
                result.append(encoding-table[character])
            else:
                result.append(character)
        return result as string
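The pseudocode above could look roughly like this in C#. This is only a sketch: the class name `SelectiveEncoder`, the method names, and the table contents are assumptions, and the table deliberately has no entries for `<` and `>`. `HttpUtility.HtmlDecode` comes from `System.Web`, as in the earlier snippet.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Web;

static class SelectiveEncoder
{
    // Hypothetical table: only these characters get encoded.
    // '<' and '>' are intentionally absent, so they pass through unchanged.
    static readonly Dictionary<char, string> Encodings = new Dictionary<char, string>
    {
        ['&'] = "&amp;",
        ['€'] = "&euro;",
    };

    public static string Encode(string decoded)
    {
        var result = new StringBuilder(decoded.Length);
        foreach (char c in decoded)
        {
            if (Encodings.TryGetValue(c, out var entity))
                result.Append(entity);
            else
                result.Append(c);
        }
        return result.ToString();
    }

    // Decode first, then re-encode, so input that is already encoded
    // and input that is not both end up in the same form.
    public static string Normalize(string encodedOrNot) =>
        Encode(HttpUtility.HtmlDecode(encodedOrNot));
}
```

With this table, `SelectiveEncoder.Normalize("a &amp; b & c")` would give `a &amp; b &amp; c`, and any angle brackets in the input come out as literal `<` and `>`.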