GZipStream with StreamReader.ReadLine reads only the first line

I have a gzip file containing a text file that needs to be cleaned. I would like to read the gzipped file line by line and write the cleaned content to a gzip output file in a single pass, for example:

    void ExtractAndFix(string inputPath, string outputPath)
    {
        StringBuilder sbLine = new StringBuilder();
        using (GZipStream gzInput = new GZipStream(new FileStream(inputPath, FileMode.Open), System.IO.Compression.CompressionMode.Decompress))
        {
            using (StreamReader reader = new StreamReader(gzInput, Encoding.UTF8))
            {
                using (GZipOutputStream gzipWriter = new GZipOutputStream(new FileStream(outputPath, FileMode.Create)))
                {
                    string line = null;
                    while ((line = reader.ReadLine()) != null)
                    {
                        sbLine.Clear();
                        sbLine.Append(line.Replace("\t", " "));
                        sbLine.Append("\r\n");
                        byte[] bytes = Encoding.UTF8.GetBytes(sbLine.ToString());
                        gzipWriter.Write(bytes, 0, bytes.Length);
                    }
                }
            }
        }
    }

But for some reason, the call line = reader.ReadLine() in the while loop reads only one line and then returns null (the reader reports end of stream). I tried this with C#'s built-in compression library as well as with the ICSharpCode package, and I get the same behavior. I understand that I could always extract the complete file, clean it, and then compress it again, but I do not want to waste resources, hard disk space, etc. Note: these are large files (several GB compressed), so anything based on a MemoryStream would not be a good solution. Has anyone come across something strange like this before? Thanks.

1 answer

After pulling my hair out, I seem to have found the problem. What made it harder to track down was that some GZip files worked fine while others showed the behavior above. For example, an archive I created myself with GZip worked fine, but some archives created by other sources did not.

In short, the .NET GZip library is garbage; do not use it. Also, the ICSharpCode library I was using was a couple of years old. I'm not sure whether it was built on top of the base .NET code or not, but that older version (0.85.4) showed the same behavior. When I upgraded to the latest version (0.86.0), I was able to read the full file as expected.
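For reference, here is a minimal sketch of the same single-pass cleanup with both the read and the write side going through ICSharpCode.SharpZipLib (0.86.0 or later, as described above). The class name GzipCleaner and the use of a StreamWriter in place of the StringBuilder are my own choices for illustration, not part of the original code. I have not confirmed why the older libraries stopped early; one common cause of this symptom is an archive made of several concatenated gzip members, but treat that as an assumption.

```csharp
using System.IO;
using System.Text;
using ICSharpCode.SharpZipLib.GZip;

// Hypothetical helper class; the name is for illustration only.
static class GzipCleaner
{
    // Streams the gzipped input line by line, replaces tabs with spaces,
    // and writes the result straight into a new gzip file, so the full
    // decompressed content never sits in memory or on disk.
    public static void ExtractAndFix(string inputPath, string outputPath)
    {
        using (var gzInput = new GZipInputStream(File.OpenRead(inputPath)))
        using (var reader = new StreamReader(gzInput, Encoding.UTF8))
        using (var gzOutput = new GZipOutputStream(File.Create(outputPath)))
        // UTF8Encoding(false) avoids emitting a byte order mark, matching
        // the original code, which wrote raw UTF-8 bytes.
        using (var writer = new StreamWriter(gzOutput, new UTF8Encoding(false)))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                writer.Write(line.Replace("\t", " "));
                writer.Write("\r\n");
            }
        }
    }
}
```

Calling GzipCleaner.ExtractAndFix("in.txt.gz", "out.txt.gz") should then process the archive in one pass; the file names here are placeholders.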

Hope this helps someone else with the same issue.


Source: https://habr.com/ru/post/1202898/

