Batch File Encoding

I would like to handle file names containing non-ASCII characters such as the French é.

Everything works fine in the shell:

C:\somedir\>ren -hélice hélice 

But if I put this line in a .bat file, I get the following result:

 C:\somedir\>ren -hÚlice hÚlice 

See? é has been replaced by Ú.

The same is true for command output. If I dir some directory in the shell, the output will be fine. If I redirect this output to a file, some characters are converted.

So, how can I tell cmd.exe that what appears as é in my batch file really is é, not Ú or a comma?

So when running the .bat file there is no way to give a hint about the code page in which it was written?

+43
windows cmd encoding batch-file
Sep 15 '09 at 15:11
5 answers

You have to save the batch file in the OEM encoding. How to do this depends on your text editor. Which encoding that is also varies; for Western cultures it is usually CP850.
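To make the byte-level effect of "saving in the OEM encoding" concrete, here is a minimal Python sketch. It assumes the editor wrote the file as Windows-1252 and the console uses CP850; both are assumptions, so check your own console with chcp. The function name transcode_batch is hypothetical, introduced only for illustration.

```python
def transcode_batch(raw: bytes, source: str = "cp1252", target: str = "cp850") -> bytes:
    """Re-encode batch file bytes from the editor's code page to the console's.

    Assumption: `source` is what the editor used and `target` is the console
    (OEM) code page. Neither value is detected automatically.
    """
    # Decode with the encoding the editor used, then encode for the console.
    return raw.decode(source).encode(target)


if __name__ == "__main__":
    original = "ren -h\u00e9lice h\u00e9lice".encode("cp1252")
    converted = transcode_batch(original)
    # The letter e-acute is byte 0xE9 in Windows-1252 but byte 0x82 in CP850.
    print(original.hex(" "))
    print(converted.hex(" "))
```

Writing the converted bytes back to the .bat file is what an editor does when it saves as "OEM 850"; characters that do not exist in the target code page would raise an error in this sketch.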

Batch files and encodings are really two things that do not get along particularly well. You will notice that Unicode is also impossible to use there, unfortunately (although environment variables handle it fine).

Alternatively, you can configure the console to use a different code page:

 chcp 1252 

should do the trick. At least it worked for me here.

When you redirect output, for example from dir, the same rules apply: the code page of the console window is used. You can use the /u switch of cmd.exe to force Unicode output for redirection, which produces files in UTF-16.
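If you then need to read such a redirected file from another program, a sketch along these lines may help. It assumes the output is UTF-16 little-endian and usually has no byte order mark; that is an assumption worth verifying on your system. The function name decode_cmd_unicode_output is hypothetical.

```python
import codecs


def decode_cmd_unicode_output(raw: bytes) -> str:
    """Decode bytes redirected from a `cmd /u` session as UTF-16 little-endian.

    Assumption: no byte order mark is written. If one is present anyway,
    it is stripped so it does not end up in the decoded text.
    """
    if raw.startswith(codecs.BOM_UTF16_LE):
        raw = raw[len(codecs.BOM_UTF16_LE):]
    return raw.decode("utf-16-le")


if __name__ == "__main__":
    # Stand-in for the content of a redirected file.
    sample = "h\u00e9lice\r\n".encode("utf-16-le")
    print(ascii(decode_cmd_unicode_output(sample)))
```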

Regarding the encodings and code pages in cmd.exe as a whole, also see this question:

  • What encoding/code page is cmd.exe using?

EDIT: As for your edit: no, cmd always assumes that the batch file is written in the console's default code page. However, you can easily put a chcp at the start of the batch file:

 chcp 1252>NUL
 ren -hélice hélice

To make this more robust when the batch file is run directly from the command line, you may want to remember the old code page and restore it afterwards:

 @echo off
 for /f "tokens=2 delims=:." %%x in ('chcp') do set cp=%%x
 chcp 1252>nul
 ren -hélice hélice
 chcp %cp%>nul
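The for /f line above splits the output of chcp on ':' and '.' and keeps the second token. The same parsing can be sketched in Python to show what ends up in cp. The sample strings are assumptions about typical chcp output (some localized versions seem to end the line with a period, which is presumably why '.' is among the delimiters). The function name parse_chcp_output is hypothetical.

```python
import re


def parse_chcp_output(line: str) -> str:
    """Return the code page number from a line such as 'Active code page: 850'.

    Mirrors `for /f "tokens=2 delims=:."`: split on ':' and '.', treat runs of
    delimiters as one, and keep the second token. Raises IndexError if the
    line has fewer than two tokens.
    """
    tokens = [token for token in re.split(r"[:.]+", line) if token]
    # The batch version keeps the leading space; it is stripped here for clarity.
    return tokens[1].strip()


if __name__ == "__main__":
    print(parse_chcp_output("Active code page: 850"))
```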
+58
Sep 15 '09 at 15:15

I created the following block, which I put at the beginning of my batch files:

 set Filename=%0
 IF "%Filename:~-8%" == "-850.bat" GOTO CONVERT_CODEPAGE_END
 rem Converting code page from 1252 to 850.
 rem My editors use 1252, my batch uses 850.
 rem We create a converted -850.bat file, and then launch it.
 set File850=%~n0-850.bat
 PowerShell.exe -Command "get-content %0 | out-file -encoding oem -filepath %File850%"
 call %File850%
 del %File850%
 EXIT /b 0
 :CONVERT_CODEPAGE_END
+1
Sep 30 '13 at 14:26

I had problems with this, and here is the solution I found. Find the decimal value of the character you need in the current code page.

For example, I am on code page 437 (chcp tells you), and I need a degree sign. http://en.wikipedia.org/wiki/Code_page_437 tells me that the degree sign is number 248.

Then find the Unicode character with the same number.

The Unicode character at 248 (U+00F8) is ø.

If you put that Unicode character in your batch script, the console will display it as the character you want.

So my batch file

 echo ø

prints

 ° 
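Why this trick works can be checked with a few lines of Python. The sketch assumes the editor saves the batch file in Windows-1252 (or Latin-1), where U+00F8 is stored as the single byte 248; that assumption is not stated in the answer. The function name console_glyph is hypothetical.

```python
def console_glyph(char: str, file_encoding: str = "cp1252",
                  console_encoding: str = "cp437") -> str:
    """Return what the console shows for a character saved in file_encoding.

    Assumption: the editor writes the file in `file_encoding` and cmd.exe
    reads the same bytes using `console_encoding`.
    """
    return char.encode(file_encoding).decode(console_encoding)


if __name__ == "__main__":
    glyph = console_glyph("\u00f8")  # U+00F8, stored as byte 248
    print(f"U+{ord(glyph):04X}")     # code point of the character shown by the console
```

If the editor saves as UTF-8 instead, U+00F8 becomes two bytes and the console would show two unrelated characters, so the trick depends on a single-byte file encoding.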
+1
Jun 24 '14 at 13:53

I had Polish characters (for example ą, ę, ź, ż) in my R code, and there was a problem when I ran the R script from a .bat file: in the output .Rout file these characters were replaced by symbols such as %, &, #, and the script did not run to the end.

My solution:

  • Save R script with encoding: File> Save with encoding> CP1250
  • Run the .bat file

This worked for me, but if you still have problems, try other encodings.

0
Oct 18 '17 at 8:57

I care about three concepts:

  • Output console encoding

  • Internal command line encoding (the one changed with chcp)

  • .bat text encoding

The simplest scenario for me: I keep the first two in the same encoding, say CP850, and I store my .bat in that encoding too (in Notepad++, menu Encoding → Character sets → Western European → OEM 850).

But suppose someone hands me a .bat in a different encoding, say CP1252 (in Notepad++, menu Encoding → Character sets → Western European → Windows-1252).

Then I would change the internal encoding of the command line using chcp 1252.

This changes the encoding it uses to talk to other processes, but not the input device or the output console.

Thus, my command line instance will effectively send characters in 1252 through its STDOUT file descriptor, but garbled text appears when the console decodes them as 850 (é shows as Ú).
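The mismatch described above comes down to one byte being read under different code pages. A minimal Python sketch, with the hypothetical helper name interpretations and an arbitrarily chosen set of code pages, lists what byte 0xE9 (é in Windows-1252) means in each:

```python
import unicodedata


def interpretations(byte_value: int,
                    code_pages=("cp1252", "cp850", "cp437")) -> dict:
    """Map each code page to the character it assigns to one byte value."""
    raw = bytes([byte_value])
    return {code_page: raw.decode(code_page) for code_page in code_pages}


if __name__ == "__main__":
    # Print code points and names rather than the glyphs themselves, so the
    # output does not depend on the encoding of the terminal running this.
    for code_page, char in interpretations(0xE9).items():
        print(code_page, f"U+{ord(char):04X}", unicodedata.name(char))
```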

Then I modify the file as follows:

 @echo off
 perl -e "use Encode qw/encode decode/;" -e "print encode('cp850', decode('cp1252', \"ren -hélice hélice\n\"));"
 ren -hélice hélice

First, I turn echo off, so commands are not printed unless I do so explicitly with echo ... or perl -e "print ..."

Then I put this template every time I need to output something

 perl -e "use Encode qw/encode decode/;" -e "print encode('cp850', decode('cp1252', \"ren -hélice hélice\n\"));"

In place of ren -hélice hélice, I substitute the actual text I want to show.

I would also need to substitute my own console encoding for cp850 and the other side's encoding for cp1252.

Just below it, I put the desired command.

I have split the problematic line into an output half and an actual command half.

  • The first half makes sure that "é" is displayed as "é" by transcoding it. This is necessary for every output statement, since the console and the file use different encodings.

  • The second half is the real command (silenced by @echo off). Knowing that chcp and the .bat text use the same encoding is enough to ensure that the characters are interpreted correctly.

-1
Nov 25 '14 at 17:36