I have a legacy program that generates a log file when it runs. Now I need to parse this log file.
But the file format is very strange. When I open it in vi, it looks like a Unicode file, but it does not start with FFFE. If I open it in Notepad, save it, and open it again, I find that Notepad has added FFFE at the start. After that I can use the command "type log.txt > log1.txt" to convert the whole file to ANSI format, and then I can use /TDD/ in Perl to find the lines I need.
But I cannot process the original file directly.
Any comments or ideas would be greatly appreciated.
0000000: 5400 4400 4400 3e00 2000 4c00 6f00 6100 TDD>. .Loa
After Notepad saves it:
0000000: fffe 5400 4400 4400 3e00 2000 4c00 6f00 ..TDD>. .Lo
open STDIN, "< log.txt";
while(<>) {
if (/TDD/) {
I read this thread, which was very useful but still did not solve my problem: "How to open a Unicode file using Perl?"
I cannot add an answer, so I am editing my post.
Thanks, Michael. I tried your script but got the following error. My Perl version is 5.1 and the OS is Windows 2008.
* ascii
* ascii-ctrl
* iso-8859-1
* null
* utf-8-strict
* utf8
UTF-16:Unrecognised BOM 5400 at test.pl line 12.
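For context on that message: Encode's plain "UTF-16" decoder expects a byte order mark at the start of the data, and 5400 is simply the first two bytes of the file (the letter "T" in little-endian order), as the first hex dump shows. A minimal sketch of choosing the encoding name from the first two bytes follows. The helper name guess_utf16_encoding is made up for illustration, and the no-BOM branches assume the file starts with an ASCII character.

```perl
use strict;
use warnings;

# Hypothetical helper: pick an Encode encoding name from the first two bytes.
# Assumption: when there is no BOM, the first character is plain ASCII, so
# one of the two bytes is NUL and tells us the byte order.
sub guess_utf16_encoding {
    my ($first_two) = @_;
    return 'UTF-16'   if $first_two eq "\xFF\xFE" || $first_two eq "\xFE\xFF"; # BOM present
    return 'UTF-16LE' if $first_two =~ /\A[^\0]\0\z/;   # e.g. 54 00 = "T", little-endian
    return 'UTF-16BE' if $first_two =~ /\A\0[^\0]\z/;   # e.g. 00 54 = "T", big-endian
    return;                                             # cannot tell
}

# Run only when a file name is given on the command line.
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    read $fh, my $head, 2;
    close $fh;
    my $enc = guess_utf16_encoding(defined $head ? $head : '');
    print defined $enc ? "$enc\n" : "unknown\n";
}
```

For the original log (54 00 ...) this picks UTF-16LE, and for the Notepad-saved copy (ff fe ...) it picks UTF-16.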
Update
I tried UTF-16LE with the command:
perl.exe open.pl utf-16le utf-16 <my log file>.txt
but I still got an error like
UTF-16LE:Partial character at open.pl line 18, <$fh> line 1824.
I also tried utf-16be and got the same error.
If I use utf-16, I get the error
UTF-16:Unrecognised BOM 5400 at open.pl line 18.
Line 18 of open.pl is "print while <$fh>;"
Any ideas?
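One way to sidestep the decoder errors above is not to decode at all: read the file as raw bytes, drop every NUL byte, and match on what is left. This is only a rough sketch, roughly what the "type" trick achieves. It assumes the interesting text is plain ASCII (any non-ASCII character would be corrupted), and the helper name grep_nul_stripped is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines of $path that match $pattern after
# removing all NUL bytes. Assumption: the log text is ASCII-only, so each
# UTF-16LE character is one ASCII byte followed by a NUL byte.
sub grep_nul_stripped {
    my ($path, $pattern) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    local $/;                      # slurp the whole file as bytes
    my $bytes = <$fh>;
    close $fh;
    $bytes =~ s/\A\xFF\xFE//;      # drop a UTF-16LE BOM if Notepad added one
    $bytes =~ tr/\0//d;            # delete every NUL byte
    return grep { /$pattern/ } split /\r?\n/, $bytes;
}

# Run only when a file name is given on the command line.
if (@ARGV) {
    print "$_\n" for grep_nul_stripped($ARGV[0], qr/TDD/);
}
```

Because it never pairs bytes into 16-bit units, this approach does not care whether a stray single-byte segment shifts the alignment partway through the file.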
Updated: 11/11/2011. Thanks, guys, for your help. I solved the problem. I found that the data in the log file is not UTF-16. So I had to write a .NET project in Visual Studio. It reads the log file as UTF-16 and writes it to a new file as UTF-8. Then I used a Perl script to parse the new file and create the result data. Now it works.
So, if any of you know how to use Perl to read a file with a lot of garbage data, please tell me. Many thanks.
E.g., a sample of the garbage data:
tests.cpp:34) ΰ¨εδδγΈ δ°ζΌζζζ€ζΈζ ζζζ΄ζζΌηζβΈζζ°
Opened in a hex viewer:
0000070: a88d e590 80e4 9080 e490 80e3 b880 e280 ................
0000080: 80e4 b080 e6bc 80e6 8480 e690 80e6 a480 ................
0000090: e6b8 80e6 9c80 e280 80e6 8c80 e68c 80e6 ................
00000a0: b480 e68c 80e6 bc80 e788 80e6 9480 e2b8 ................
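Looking at those bytes, the three-byte groups are valid UTF-8 for code points such as U+5400, U+4400, U+4400, U+3E00, which are the bytes of "T", "D", "D", ">" with the two halves swapped. That suggests, though nothing here confirms it, that these spans were UTF-16 text decoded one byte out of step before being written as UTF-8. A minimal sketch that swaps the halves back follows. The helper name swap_utf16_bytes is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode UTF-8 bytes, then swap the high and low byte of
# every code point in the range 0x100..0xFFFF. Characters up to 0xFF (such as
# the plain ASCII "tests.cpp:34)" prefix) are left untouched.
sub swap_utf16_bytes {
    my ($utf8_bytes) = @_;
    my $chars = decode('UTF-8', $utf8_bytes);
    my $out = '';
    for my $ch (split //, $chars) {
        my $cp = ord $ch;
        $cp = (($cp & 0xFF) << 8) | ($cp >> 8) if $cp > 0xFF && $cp <= 0xFFFF;
        $out .= chr $cp;
    }
    return $out;
}

# Bytes copied from the hex dump above, starting at the first e5 90 80 group.
my $sample = pack 'H*',
    'e59080e49080e49080e3b880e28080e4b080e6bc80e68480e69080e6a480e6b880e69c80';
print swap_utf16_bytes($sample), "\n";   # prints "TDD> Loading"
```

If that reading is right, the file probably mixes single-byte text with UTF-16LE text, and an odd number of single-byte characters shifts everything after it by one byte.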