Why does PowerShell concatenation convert UTF8 to UTF16?

Question

I am running the following PowerShell script to combine a number of output files into a single CSV file. The input files are named whidataXX.htm (where XX is a two-digit serial number), and the number of files created varies from run to run.

    $metadataPath = "\\ServerPath\foo"

    function concatenateMetadata {
        $cFile = $metadataPath + "whiconcat.csv"
        Clear-Content $cFile
        $metadataFiles = gci $metadataPath
        $iterations = $metadataFiles.Count
        for ($i=0;$i -le $iterations-1;$i++) {
            $iFile = "whidata"+$i+".htm"
            $FileExists = (Test-Path $metadataPath$iFile -PathType Leaf)
            if (!($FileExists)) {
                break
            } elseif ($FileExists) {
                Write-Host "Adding " $metadataPath$iFile
                Get-Content $metadataPath$iFile | Out-File $cFile -append
                Write-Host "to" $cfile
            }
        }
    }

The whidataXX.htm files are encoded in UTF-8, but my output file is encoded in UTF-16. When I view the file in Notepad, it looks correct, but when I view it in a hex editor, the byte 00 appears between each character, and when I pull the file into a Java program for processing, the console output shows extra spaces between characters.

First, is this normal for PowerShell, or is there something in the source files that could cause this?

Second, how can I fix this encoding problem in the code above?

+6

powershell data-conversion utf-8 utf-16

dwwilson66 Oct 15 '13 at 18:22

2 answers

First, the fact that you get 2 bytes per character indicates that a fixed-width 16-bit encoding is in use. Strictly speaking, that fixed-width form is called UCS-2; for characters in the Basic Multilingual Plane it is byte-for-byte identical to UTF-16. This article explains that file redirection in PowerShell produces output in that encoding: http://www.kongsli.net/nblog/2012/04/20/powershell-gotchas-redirect-to-file-encodes-in-unicode/ . The same article also contains a fix.
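
To see where the 00 bytes in the hex editor come from, here is a minimal sketch that encodes the same two-character string both ways. The sample string is an arbitrary choice for illustration; `[System.Text.Encoding]::Unicode` is the .NET name for little-endian UTF-16.

```powershell
# Arbitrary sample text, chosen only for illustration.
$text = 'AB'

# Encode the same string as UTF-8 and as UTF-16 LE (what .NET calls "Unicode").
$utf8  = [System.Text.Encoding]::UTF8.GetBytes($text)
$utf16 = [System.Text.Encoding]::Unicode.GetBytes($text)

# Render each byte array as space-separated hex.
$utf8Hex  = ($utf8  | ForEach-Object { $_.ToString('X2') }) -join ' '
$utf16Hex = ($utf16 | ForEach-Object { $_.ToString('X2') }) -join ' '

$utf8Hex    # 41 42
$utf16Hex   # 41 00 42 00
```

Each ASCII character takes one byte in UTF-8 but two in UTF-16, with the second byte being 00. A program that reads those bytes as a single-byte encoding will typically show the 00 bytes as gaps between characters.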

+2

Tarik Oct 15 '13 at 18:41

mjolinor · Accepted Answer · 2013-10-15T18:29:15+0000

The Out-* cmdlets (for example, Out-File) format the data, and in Windows PowerShell their default output encoding is Unicode (UTF-16 LE).

You can add the -Encoding parameter to Out-File:

 Get-Content $metadataPath$iFile | Out-File $cFile -Encoding UTF8 -append

Or switch to Add-Content, which does not reformat the data:

 Get-Content $metadataPath$iFile | Add-Content $cFile
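
One way to check the result without a hex editor is to look for 00 bytes in the output file. The sketch below is not the asker's script: it uses a hypothetical scratch directory with two made-up input files, and it names the encoding explicitly on both the read and the write (an assumption on my part, so that nothing depends on the defaults).

```powershell
# Hypothetical scratch directory and file contents, used only for this demo.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null

# Write two small UTF-8 input files (no byte order mark).
$utf8NoBom = New-Object System.Text.UTF8Encoding($false)
[System.IO.File]::WriteAllText((Join-Path $dir 'whidata0.htm'), "a,b`r`n", $utf8NoBom)
[System.IO.File]::WriteAllText((Join-Path $dir 'whidata1.htm'), "c,d`r`n", $utf8NoBom)

$outFile = Join-Path $dir 'whiconcat.csv'

# Append each input file, stating the encoding for both reading and writing.
foreach ($name in 'whidata0.htm', 'whidata1.htm') {
    Get-Content -Path (Join-Path $dir $name) -Encoding UTF8 |
        Add-Content -Path $outFile -Encoding UTF8
}

# UTF-16 output of ASCII text would contain 0x00 bytes; UTF-8 output does not.
$bytes = [System.IO.File]::ReadAllBytes($outFile)
$hasNulBytes = $bytes -contains 0
$hasNulBytes   # False
```

If `$hasNulBytes` comes back True for plain ASCII data, the file was most likely written as UTF-16.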
