I have a gzip file containing a text file that needs to be cleaned. I would like to read the gzipped file line by line and write the cleaned content to a gzip output file in a single pass, for example:
    void ExtractAndFix(string inputPath, string outputPath)
    {
        StringBuilder sbLine = new StringBuilder();
        using (GZipStream gzInput = new GZipStream(new FileStream(inputPath, FileMode.Open), System.IO.Compression.CompressionMode.Decompress))
        {
            using (StreamReader reader = new StreamReader(gzInput, Encoding.UTF8))
            {
                using (GZipOutputStream gzipWriter = new GZipOutputStream(new FileStream(outputPath, FileMode.Create)))
                {
                    string line = null;
                    while ((line = reader.ReadLine()) != null)
                    {
                        sbLine.Clear();
                        sbLine.Append(line.Replace("\t", " "));
                        sbLine.Append("\r\n");
                        byte[] bytes = Encoding.UTF8.GetBytes(sbLine.ToString());
                        gzipWriter.Write(bytes, 0, bytes.Length);
                    }
                }
            }
        }
    }
But for some reason, calling line = reader.ReadLine() in the while loop reads only ONE line and then returns null (the reader's EndOfStream is true). I tried this with the native C# compression library (System.IO.Compression) as well as with the ICSharpCode package, and I get the same behavior. I understand that I could always extract the complete file, clean it, and then compress it again, but I do not want to waste resources, hard disk space, etc. Note: these are large files (several GB compressed), so anything involving a MemoryStream would not be a good solution. Has anyone come across something strange like this before? Thanks.
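To narrow down where the data stops, a minimal diagnostic sketch could bypass StreamReader and count the bytes that GZipStream itself returns before it reports end of stream. The names GzipDiagnostics and CountDecompressedBytes are hypothetical, introduced only for illustration, and the 80 KB buffer size is an arbitrary choice:

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class GzipDiagnostics
{
    // Hypothetical helper (not part of the original code): returns how many
    // decompressed bytes GZipStream yields before Read() returns 0.
    public static long CountDecompressedBytes(string inputPath)
    {
        long total = 0;
        byte[] buffer = new byte[81920]; // fixed-size buffer, so memory use stays constant

        using (FileStream file = new FileStream(inputPath, FileMode.Open, FileAccess.Read))
        using (GZipStream gz = new GZipStream(file, CompressionMode.Decompress))
        {
            int read;
            // Read raw decompressed bytes; no text decoding or line splitting involved.
            while ((read = gz.Read(buffer, 0, buffer.Length)) > 0)
            {
                total += read;
            }
        }
        return total;
    }
}
```

If the count is roughly the length of the first line, the decompressor itself is ending early rather than ReadLine(). One possible cause, which is only an assumption here, is an input file made of several concatenated gzip members. If the count matches the expected uncompressed size, the problem is more likely in the reading loop.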