Using the DotNetZip library to unzip files with non-ASCII characters

Question


I am trying to unzip a file using the DotNetZip Library.

The file contains folders and files with Danish characters (æøåÆØÅ).

Total Commander, 7-Zip and Windows' built-in zip all extract the files correctly, but the DotNetZip library mangles the Danish characters.

Example: File_æøåÆØÅ.txt becomes File_æ¢åÆ¥Å.txt

Instead of ø the name contains ¢, and instead of Ø it contains ¥.

Code:

 using (var zipFile = ZipFile.Read(@"File_æøåÆØÅ.zip"))
 {
     zipFile.ExtractAll(@"File_æøåÆØÅ", ExtractExistingFileAction.OverwriteSilently);
 }

I use the default encoding (culture "da-DK"); I have also tried other encodings such as UTF-8.

How can I unzip a file whose file names contain Danish characters?

+4

c# zip

Morten lyhr Jan 11 '11 at 6:33


5 answers

To process this zip file, I explicitly specify the Danish code page when reading the zip:

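 // "da-DK" is a culture name, not an encoding name, and GetEncoding("da-DK") throws.
 // Look up the OEM code page of the Danish culture instead.
 var culture = System.Globalization.CultureInfo.GetCultureInfo("da-DK");
 var encoding = System.Text.Encoding.GetEncoding(culture.TextInfo.OEMCodePage);
 using (var zipFile = ZipFile.Read(@"File_æøåÆØÅ.zip", encoding))
 {
     zipFile.ExtractAll(@"File_æøåÆØÅ", ExtractExistingFileAction.OverwriteSilently);
 }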

The reason you need to do this explicitly:
The zip specification allows two text encodings for file names and comments in a zip file: IBM437 and UTF-8. When one of these encodings is used, the zip file's metadata explicitly indicates which one. DotNetZip, or any other library, can safely use the encoding specified in the zip file.
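As a rough illustration of how that metadata can be inspected: in the zip format, bit 11 (0x0800) of an entry's general-purpose bit flag marks its name and comment as UTF-8. The sketch below reads only the first local file header of a stream. The class and method names are hypothetical, and it assumes the archive starts with a local file header (no prepended data such as a self-extractor stub).

```csharp
using System;
using System.IO;

static class ZipNameEncodingProbe
{
    // Hypothetical helper: returns true when the first entry's general-purpose
    // flag has bit 11 set, i.e. its file name is declared as UTF-8.
    public static bool FirstEntryDeclaresUtf8(Stream zip)
    {
        // Layout of a local file header: signature (4 bytes),
        // version needed (2 bytes), general-purpose bit flag (2 bytes).
        var header = new byte[8];
        int read = 0;
        while (read < header.Length)
        {
            int n = zip.Read(header, read, header.Length - read);
            if (n == 0)
                throw new InvalidDataException("Stream is too short for a local file header.");
            read += n;
        }

        // The signature is "PK" followed by 0x03 0x04.
        if (header[0] != 0x50 || header[1] != 0x4B || header[2] != 0x03 || header[3] != 0x04)
            throw new InvalidDataException("Stream does not start with a local file header.");

        int flags = header[6] | (header[7] << 8); // stored little-endian
        return (flags & 0x0800) != 0;
    }
}
```

If this returns false, the name is either genuine IBM437 or was written in some other code page that the file does not record, which is the situation described next.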

There is no way in a zip file to specify an encoding other than those two; the zip specification simply does not provide one. Some zip libraries and tools nevertheless create zip files whose names use other text encodings, such as the Danish code page, CP950, or something else. Strictly speaking, such files do not comply with the specification, but the tools create them anyway. Zip files like this are not uncommon.

In such cases, some libraries and tools assume that the encoding used in the zip file is the same as the default encoding on the local machine. This is neither safe nor guaranteed to work, but the assumption does hold in one narrow case: when the zip file was created by a non-compliant library or tool on the same computer. If you create a zip file with the default (non-compliant) text encoding and then send it from Stockholm to Shanghai, the "assume the default encoding" strategy will fail when the file is read.

DotNetZip does not make this assumption. When a zip file uses a non-compliant text encoding, the file contains no information about which encoding that is, so DotNetZip falls back to the default encoding, IBM437, to read it. DNZ has no way of knowing that this is "wrong." If you want to override this behavior, you need to use the ZipFile.Read() overload that accepts an encoding.
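The IBM437 fallback explains the exact garbling in the question. A minimal sketch using only System.Text: it encodes a name in one code page and decodes the same bytes in another. Code page 850 is an assumption about what the archiving tool used (it is the usual OEM code page on Danish Windows), and the class and method names are hypothetical. The static constructor registers the code-page provider, which modern .NET needs for these encodings; on the classic .NET Framework that line can be dropped.

```csharp
using System;
using System.Text;

static class MojibakeDemo
{
    static MojibakeDemo()
    {
        // Needed on .NET Core / .NET 5+ so that code pages such as 437 and 850 are available.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Encodes 'name' with the writer's code page, then decodes the raw bytes
    // with the reader's code page, as a zip reader guessing the encoding would.
    public static string WriteAsThenReadAs(string name, int writerCodePage, int readerCodePage)
    {
        byte[] raw = Encoding.GetEncoding(writerCodePage).GetBytes(name);
        return Encoding.GetEncoding(readerCodePage).GetString(raw);
    }
}
```

Calling MojibakeDemo.WriteAsThenReadAs("File_æøåÆØÅ.txt", 850, 437) returns File_æ¢åÆ¥Å.txt, the name reported in the question: ø and Ø sit at bytes 0x9B and 0x9D in code page 850, and IBM437 maps those bytes to ¢ and ¥. Passing 850 for both arguments returns the original name.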

All of this is described in the DotNetZip documentation, in particular under ZipFile.ProvisionalAlternateEncoding.

+3

Cheeso Jan 11 '11 at 13:36


I used a FileStream for reading, and as far as I remember, it worked (DotNetZip v1.9). Code to read:

 using (FileStream fs = File.OpenRead(filePath))
 {
     ZipFile zf = ZipFile.Read(fs);
     ICollection<ZipEntry> entries = zf.Entries;
     foreach (ZipEntry entry in entries)
     {
         string path = entry.FileName;
         // ...
     }
 }

And to create a zip archive: ZipFile zip = new ZipFile(Encoding.UTF8);

+2

Robertas Jan 11 '11 at 6:49


First of all, overriding the default DotNetZip encoding with

 zip.AlternateEncodingUsage = ZipOption.Always;

is dangerous, because it always overrides the encoding, even the one the zip file actually declares. I myself used

 zip.AlternateEncoding = System.Text.Encoding.UTF8;
 zip.AlternateEncodingUsage = ZipOption.AsNecessary;

so that UTF-8 is used only when necessary.

As for the code page discussion: I patched DotNetZip itself (my local copy) and changed the default code page from "ibm437" to "ibm861".

I used a zip tool and Windows' built-in zipping to create test archives with the special character "ø" in the file name. Based on the test results, Windows and 7-Zip use "ibm861" by default, not "ibm437" as most documentation states.

The fix can be applied by searching for the string "ibm437" in the DotNetZip source and replacing it with "ibm861".

Here is where I found some mention of the code page: http://www.nudoq.org/#!/Packages/DotNetZip/Ionic.Zip/ZipInputStream/P/ProvisionalAlternateEncoding

+1

TarmoPikaro Aug 26 '15 at 7:20


I had a problem with unpacking. The zip file that my application has to read contains special Eastern European characters, such as šđčćž. WinRAR and 7-Zip unpacked it correctly, but with the DNZ library (IonicZip 1.9.1.8) I got μ instead of š.

I tried 15 different applications before I finally found out that this zip file uses ibm852. This sample code then worked for me:

 ZipFile zf = new ZipFile(path, System.Text.Encoding.GetEncoding("ibm852"));
 zf.ExtractAll(loc, ExtractExistingFileAction.OverwriteSilently);

Setting the AlternateEncoding property, as in the following snippet, did not help me:

 using (ZipFile zz = ZipFile.Read(path))
 {
     zz.AlternateEncodingUsage = ZipOption.Always;
     zz.AlternateEncoding = System.Text.Encoding.GetEncoding("ibm852");
     zz.ExtractAll(loc, ExtractExistingFileAction.OverwriteSilently);
 }

I did not have time to investigate why. Perhaps the encoding has to be set when calling the constructor, because I did not find an encoding parameter on the Read method.

0

davor Dec 6 '15 at 20:43


Marc Gravell · Accepted Answer · 2011-01-11T06:41:34+0000

It just sounds like a bug in "DotNetZip" - have you tried SharpZipLib or ZipPackage (in BCL)? Encoding usually refers to the contents of the file, not the name; therefore, this should not be a factor.

You should report this (with an example) to the author.
