Is it possible to access compressed data before decompression in HttpClient?

I'm working on a Google Cloud Storage .NET client library. There are three features (between .NET, my client library and the Storage service) that combine in a nasty way:

  • When downloading files (objects, in Google Cloud Storage terminology), the server includes a hash of the stored data. My client code then validates that hash against the data it has downloaded.

  • A separate feature of Google Cloud Storage is that the user can set the Content-Encoding of the object, and that is included as a header when downloading, provided the request contains a matching Accept-Encoding. (For the moment, let's ignore the behavior when the request doesn't include this...)

  • HttpClientHandler can decompress gzip (or deflate) content automatically and transparently.

When all three of these are combined, we get into trouble. Here's a short but complete program demonstrating this, but without using my client library (and fetching a publicly accessible file):

    using System;
    using System.Linq;
    using System.Net;
    using System.Net.Http;
    using System.Security.Cryptography;
    using System.Text;
    using System.Threading.Tasks;

    class Program
    {
        static async Task Main()
        {
            string url = "https://www.googleapis.com/download/storage/v1/b/"
                + "storage-library-test-bucket/o/gzipped-text.txt?alt=media";
            var handler = new HttpClientHandler
            {
                AutomaticDecompression = DecompressionMethods.GZip
            };
            var client = new HttpClient(handler);
            var response = await client.GetAsync(url);
            byte[] content = await response.Content.ReadAsByteArrayAsync();
            string text = Encoding.UTF8.GetString(content);
            Console.WriteLine($"Content: {text}");
            var hashHeader = response.Headers.GetValues("X-Goog-Hash").FirstOrDefault();
            Console.WriteLine($"Hash header: {hashHeader}");
            using (var md5 = MD5.Create())
            {
                var md5Hash = md5.ComputeHash(content);
                var md5HashBase64 = Convert.ToBase64String(md5Hash);
                Console.WriteLine($"MD5 of content: {md5HashBase64}");
            }
        }
    }

.NET Core Project File:

    <Project Sdk="Microsoft.NET.Sdk">
      <PropertyGroup>
        <OutputType>Exe</OutputType>
        <TargetFramework>netcoreapp2.0</TargetFramework>
        <LangVersion>7.1</LangVersion>
      </PropertyGroup>
    </Project>

Output:

    Content: hello world
    Hash header: crc32c=T1s5RQ==,md5=xhF4M6pNFRDQnvaRRNVnkA==
    MD5 of content: XrY7u+Ae7tCTyyK7j1rNww==

As you can see, the MD5 of the content doesn't match the MD5 part of the X-Goog-Hash header. (In my client library I use the crc32c hash, but that shows the same behavior.)
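
The mismatch can be reproduced without the network: the X-Goog-Hash value describes the bytes as stored (here, a gzip stream), whereas the program above hashes what it gets after HttpClientHandler has decompressed them. In fact, XrY7u+Ae7tCTyyK7j1rNww== is simply the MD5 of the 11 bytes of "hello world". The sketch below uses hypothetical names (HashMismatchDemo, Gzip, Md5Base64); the gzip bytes it produces are not necessarily identical to the stored object, so only the fact that the two hashes differ is meaningful.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helpers for illustration; not part of any client library.
public static class HashMismatchDemo
{
    // Compresses data in memory with GZipStream.
    public static byte[] Gzip(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(data, 0, data.Length);
            }
            // ToArray still works after GZipStream has closed the MemoryStream.
            return output.ToArray();
        }
    }

    public static string Md5Base64(byte[] data)
    {
        using (var md5 = MD5.Create())
        {
            return Convert.ToBase64String(md5.ComputeHash(data));
        }
    }

    public static void Show()
    {
        byte[] plain = Encoding.UTF8.GetBytes("hello world");
        byte[] stored = Gzip(plain);
        // A service-side hash corresponds to 'stored'; the program above hashed 'plain'.
        Console.WriteLine($"MD5 of gzip bytes:  {Md5Base64(stored)}");
        Console.WriteLine($"MD5 of plain bytes: {Md5Base64(plain)}");
    }
}
```

Calling HashMismatchDemo.Show() prints two different values; the second one is the "MD5 of content" value from the output above.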

This isn't a bug in HttpClientHandler; it's expected, but it's a pain when I want to validate the hash. Basically, I need to get at the content both before and after decompression, and I can't find any way of doing that.

To clarify my requirements a bit: I know how to prevent the decompression in HttpClient and decompress afterwards when reading from the stream, but I need to do this without changing any of the code that uses the resulting HttpResponseMessage from the HttpClient. (There's a lot of code that deals with responses, and I only want to make the change in one central place.)

I have a plan, which I've prototyped and which works as far as I've found so far, but it's a bit ugly. It involves creating a three-layer handler:

  • An HttpClientHandler with automatic decompression disabled.
  • A new handler that replaces the content stream with a new Stream subclass, which delegates to the original content stream but hashes the data as it's read.
  • A decompression-only handler, based on the Microsoft DecompressionHandler code.
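
The middle layer of that plan (a Stream that delegates to the original content stream and hashes data as it is read) might look roughly like the following sketch. HashingReadStream and GetHash are hypothetical names; the sketch uses IncrementalHash and assumes callers either read to the end or call GetHash, which drains whatever is left so that the hash always covers the whole body.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical read-only wrapper: passes bytes through unchanged while hashing them.
public sealed class HashingReadStream : Stream
{
    private readonly Stream inner;
    private readonly IncrementalHash hash;
    private byte[] result; // set once the inner stream reports end-of-stream

    public HashingReadStream(Stream inner, HashAlgorithmName algorithm)
    {
        this.inner = inner ?? throw new ArgumentNullException(nameof(inner));
        hash = IncrementalHash.CreateHash(algorithm);
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int n = inner.Read(buffer, offset, count);
        Observe(buffer, offset, count, n);
        return n;
    }

    public override async Task<int> ReadAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken)
    {
        int n = await inner.ReadAsync(buffer, offset, count, cancellationToken).ConfigureAwait(false);
        Observe(buffer, offset, count, n);
        return n;
    }

    private void Observe(byte[] buffer, int offset, int count, int n)
    {
        if (n > 0) hash.AppendData(buffer, offset, n);
        else if (count > 0 && result == null) result = hash.GetHashAndReset(); // end of stream
    }

    // Reads (and discards) any remaining bytes, then returns the hash of the whole stream.
    public byte[] GetHash()
    {
        var scratch = new byte[8192];
        while (result == null && Read(scratch, 0, scratch.Length) > 0) { }
        return result;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            inner.Dispose();
            hash.Dispose();
        }
        base.Dispose(disposing);
    }
}
```

In the three-layer design, a handler would wrap the raw response stream in something like this before the decompression layer reads from it, and validation code would compare GetHash() with the value from X-Goog-Hash.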

While this works, it has disadvantages:

  • Open source licensing: checking exactly what I need to do to create a new file in my repo based on MIT-licensed Microsoft code
  • Effectively forking the MS code, which means I should probably check regularly for any bugs found in it
  • The Microsoft code uses internal members of the assembly, so it doesn't port over as cleanly as it might

If Microsoft made DecompressionHandler public, that would help a lot, but that's likely to happen on a longer time frame than I need.

What I'm looking for is an alternative approach if possible: something I've missed that lets me get at the content before decompression. I don't want to reinvent HttpClient; the response is often chunked, for example, and I don't want to get into that side of things. It's a pretty specific interception point that I'm looking for.

Nov 16 '17 at 7:53
3 answers

Looking at what @Michael did gave me the hint I was missing. After getting the compressed content, you can use CryptoStream, GZipStream and StreamReader to read the response without loading it into memory any more than necessary. CryptoStream hashes the compressed content as it is decompressed and read. Replace the StreamReader with a FileStream and you can write the data to a file with minimal memory usage :)

    using System;
    using System.IO;
    using System.IO.Compression;
    using System.Linq;
    using System.Net;
    using System.Net.Http;
    using System.Security.Cryptography;
    using System.Text;
    using System.Threading.Tasks;

    class Program
    {
        static async Task Main()
        {
            string url = "https://www.googleapis.com/download/storage/v1/b/"
                + "storage-library-test-bucket/o/gzipped-text.txt?alt=media";
            // Keep the body compressed on the client side...
            var handler = new HttpClientHandler
            {
                AutomaticDecompression = DecompressionMethods.None
            };
            var client = new HttpClient(handler);
            // ...but still ask the server for the gzip-encoded object.
            client.DefaultRequestHeaders.Add("Accept-Encoding", "gzip");
            var response = await client.GetAsync(url);
            var hashHeader = response.Headers.GetValues("X-Goog-Hash").FirstOrDefault();
            Console.WriteLine($"Hash header: {hashHeader}");
            string text = null;
            using (var md5 = MD5.Create())
            {
                // CryptoStream hashes the compressed bytes as GZipStream pulls them through.
                using (var cryptoStream = new CryptoStream(await response.Content.ReadAsStreamAsync(), md5, CryptoStreamMode.Read))
                {
                    using (var gzipStream = new GZipStream(cryptoStream, CompressionMode.Decompress))
                    {
                        using (var streamReader = new StreamReader(gzipStream, Encoding.UTF8))
                        {
                            text = streamReader.ReadToEnd();
                        }
                    }
                    Console.WriteLine($"Content: {text}");
                    var md5HashBase64 = Convert.ToBase64String(md5.Hash);
                    Console.WriteLine($"MD5 of content: {md5HashBase64}");
                }
            }
        }
    }

Output:

    Hash header: crc32c=T1s5RQ==,md5=xhF4M6pNFRDQnvaRRNVnkA==
    Content: hello world
    MD5 of content: xhF4M6pNFRDQnvaRRNVnkA==
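
Following the suggestion to swap the StreamReader for a FileStream, here is a sketch of a helper (GzipDownload.DecompressAndHash is a hypothetical name) that writes the decompressed bytes to any destination stream while hashing the compressed ones. As a precaution it drains the CryptoStream explicitly after decompression, so the hash is finalized and covers every compressed byte even if GZipStream stops reading before end-of-stream. Note that it also disposes the compressed input stream.

```csharp
using System.IO;
using System.IO.Compression;
using System.Security.Cryptography;

public static class GzipDownload
{
    // Decompresses 'compressed' into 'destination' (e.g. a FileStream) and returns the
    // MD5 of the compressed bytes, which is what the md5 entry of X-Goog-Hash matched above.
    // Disposes 'compressed' when done.
    public static byte[] DecompressAndHash(Stream compressed, Stream destination)
    {
        using (var md5 = MD5.Create())
        using (var crypto = new CryptoStream(compressed, md5, CryptoStreamMode.Read))
        using (var gzip = new GZipStream(crypto, CompressionMode.Decompress, leaveOpen: true))
        {
            gzip.CopyTo(destination);
            // Read any compressed bytes GZipStream didn't request; CryptoStream
            // finalizes the hash when its input reports end-of-stream.
            var scratch = new byte[4096];
            while (crypto.Read(scratch, 0, scratch.Length) > 0) { }
            return md5.Hash;
        }
    }
}
```

With a real download, compressed would be the stream from response.Content.ReadAsStreamAsync() (with automatic decompression off and Accept-Encoding: gzip, as above) and destination a FileStream.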

V2 answer

After reading John's answer and the updated answer, I have the following version. Much the same idea, but I moved the streaming into a special HttpContent that I inject. Not exactly pretty, but the idea is there.

    using System;
    using System.IO;
    using System.IO.Compression;
    using System.Linq;
    using System.Net;
    using System.Net.Http;
    using System.Security.Cryptography;
    using System.Text;
    using System.Threading;
    using System.Threading.Tasks;

    class Program
    {
        static async Task Main()
        {
            string url = "https://www.googleapis.com/download/storage/v1/b/"
                + "storage-library-test-bucket/o/gzipped-text.txt?alt=media";
            var handler = new HttpClientHandler
            {
                AutomaticDecompression = DecompressionMethods.None
            };
            var client = new HttpClient(new Intercepter(handler));
            client.DefaultRequestHeaders.Add("Accept-Encoding", "gzip");
            var response = await client.GetAsync(url);
            var hashHeader = response.Headers.GetValues("X-Goog-Hash").FirstOrDefault();
            Console.WriteLine($"Hash header: {hashHeader}");
            HttpContent content1 = response.Content;
            byte[] content = await content1.ReadAsByteArrayAsync();
            string text = Encoding.UTF8.GetString(content);
            Console.WriteLine($"Content: {text}");
            var md5Hash = ((HashingContent)content1).Hash;
            var md5HashBase64 = Convert.ToBase64String(md5Hash);
            Console.WriteLine($"MD5 of content: {md5HashBase64}");
        }

        // Swaps the raw (compressed) content for a HashingContent wrapper in one central place.
        public class Intercepter : DelegatingHandler
        {
            public Intercepter(HttpMessageHandler innerHandler) : base(innerHandler) { }

            protected override async Task<HttpResponseMessage> SendAsync(
                HttpRequestMessage request, CancellationToken cancellationToken)
            {
                var response = await base.SendAsync(request, cancellationToken);
                response.Content = new HashingContent(await response.Content.ReadAsStreamAsync());
                return response;
            }
        }

        // Decompresses on read while hashing the compressed bytes underneath.
        public sealed class HashingContent : HttpContent
        {
            private readonly StreamContent streamContent;
            private readonly MD5 mD5;
            private readonly CryptoStream cryptoStream;
            private readonly GZipStream gZipStream;

            public HashingContent(Stream content)
            {
                mD5 = MD5.Create();
                cryptoStream = new CryptoStream(content, mD5, CryptoStreamMode.Read);
                gZipStream = new GZipStream(cryptoStream, CompressionMode.Decompress);
                streamContent = new StreamContent(gZipStream);
            }

            protected override Task SerializeToStreamAsync(Stream stream, TransportContext context) =>
                streamContent.CopyToAsync(stream, context);

            protected override bool TryComputeLength(out long length)
            {
                length = 0;
                return false;
            }

            protected override Task<Stream> CreateContentReadStreamAsync() =>
                streamContent.ReadAsStreamAsync();

            protected override void Dispose(bool disposing)
            {
                try
                {
                    if (disposing)
                    {
                        streamContent.Dispose();
                        gZipStream.Dispose();
                        cryptoStream.Dispose();
                        mD5.Dispose();
                    }
                }
                finally
                {
                    base.Dispose(disposing);
                }
            }

            // Only available once the compressed stream has been read to its end.
            public byte[] Hash => mD5.Hash;
        }
    }
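
One limitation of the V2 handler is that it wraps every response in a GZipStream, whatever its Content-Encoding. A possible variation, sketched below with hypothetical names (GzipHashingHandler, GzipHashedContent, CompressedMd5), only rewraps responses whose Content-Encoding is exactly gzip, carries over the other content headers except Content-Encoding and Content-Length, and records the hash only after draining the compressed stream. It assumes the inner HttpClientHandler has AutomaticDecompression = None.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Security.Cryptography;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical handler, meant to sit above an HttpClientHandler with
// AutomaticDecompression = None so that gzip bodies arrive here still compressed.
public sealed class GzipHashingHandler : DelegatingHandler
{
    public GzipHashingHandler(HttpMessageHandler innerHandler) : base(innerHandler) { }

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        // Ask for gzip if the caller didn't specify an Accept-Encoding.
        if (request.Headers.AcceptEncoding.Count == 0)
            request.Headers.AcceptEncoding.Add(new StringWithQualityHeaderValue("gzip"));
        var response = await base.SendAsync(request, cancellationToken).ConfigureAwait(false);
        var original = response.Content;
        var encodings = original?.Headers.ContentEncoding;
        if (encodings == null || encodings.Count != 1 ||
            !string.Equals(encodings.First(), "gzip", StringComparison.OrdinalIgnoreCase))
        {
            return response; // not a plain gzip body: pass it through untouched
        }
        var replacement = new GzipHashedContent(
            await original.ReadAsStreamAsync().ConfigureAwait(false));
        foreach (var header in original.Headers)
        {
            // Content-Encoding and Content-Length describe the compressed form, which callers no longer see.
            if (!string.Equals(header.Key, "Content-Encoding", StringComparison.OrdinalIgnoreCase) &&
                !string.Equals(header.Key, "Content-Length", StringComparison.OrdinalIgnoreCase))
            {
                replacement.Headers.TryAddWithoutValidation(header.Key, header.Value);
            }
        }
        response.Content = replacement;
        return response;
    }
}

// Hypothetical content type: exposes the decompressed body and records the MD5 of the compressed bytes.
public sealed class GzipHashedContent : HttpContent
{
    private readonly MD5 md5 = MD5.Create();
    private readonly CryptoStream crypto;
    private readonly GZipStream gzip;
    private byte[] compressedMd5;

    public GzipHashedContent(Stream compressed)
    {
        crypto = new CryptoStream(compressed, md5, CryptoStreamMode.Read);
        gzip = new GZipStream(crypto, CompressionMode.Decompress, leaveOpen: true);
    }

    // Available once the body has been read (ReadAsByteArrayAsync, CopyToAsync, ...).
    public byte[] CompressedMd5 =>
        compressedMd5 ?? throw new InvalidOperationException("The content has not been read yet.");

    protected override async Task SerializeToStreamAsync(Stream stream, TransportContext context)
    {
        await gzip.CopyToAsync(stream).ConfigureAwait(false);
        // Drain anything GZipStream didn't request, so CryptoStream reaches end-of-stream and finalizes the hash.
        var scratch = new byte[4096];
        while (await crypto.ReadAsync(scratch, 0, scratch.Length).ConfigureAwait(false) > 0) { }
        compressedMd5 = md5.Hash;
    }

    protected override bool TryComputeLength(out long length)
    {
        length = 0;
        return false; // the decompressed length isn't known up front
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            gzip.Dispose();
            crypto.Dispose(); // also disposes the compressed stream
            md5.Dispose();
        }
        base.Dispose(disposing);
    }
}
```

Callers keep reading response.Content as before; wherever validation happens, the content can be cast to GzipHashedContent and CompressedMd5 compared with the md5 entry of X-Goog-Hash. Because CreateContentReadStreamAsync isn't overridden here, ReadAsStreamAsync buffers the whole body in this sketch; the override in the V2 code is what keeps it streaming.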
Nov 21 '17 at 18:24

I managed to get a hash that matches the header by:

  • creating a custom handler that inherits from HttpClientHandler
  • overriding SendAsync
  • reading the response as bytes from base.SendAsync
  • compressing it with GZipStream
  • hashing the gzipped bytes with MD5 and converting to base64 (using your code)

As you said, though, this doesn't really meet the "before decompression" requirement.

The idea would be to make this if statement work the way you'd like: https://github.com/dotnet/corefx/blob/master/src/System.Net.Http.WinHttpHandler/src/System/Net/Http/WinHttpResponseParser.cs#L80-L91

With this, the hash matches:

    using System;
    using System.IO;
    using System.IO.Compression;
    using System.Linq;
    using System.Net;
    using System.Net.Http;
    using System.Security.Cryptography;
    using System.Text;
    using System.Threading;
    using System.Threading.Tasks;

    class Program
    {
        const string url = "https://www.googleapis.com/download/storage/v1/b/storage-library-test-bucket/o/gzipped-text.txt?alt=media";

        static async Task Main()
        {
            //await HashResponseContent(CreateHandler(DecompressionMethods.None));
            //await HashResponseContent(CreateHandler(DecompressionMethods.GZip));
            await HashResponseContent(new MyHandler());
            Console.ReadLine();
        }

        private static HttpClientHandler CreateHandler(DecompressionMethods decompressionMethods)
        {
            return new HttpClientHandler { AutomaticDecompression = decompressionMethods };
        }

        public static async Task HashResponseContent(HttpClientHandler handler)
        {
            //Console.WriteLine($"Using AutomaticDecompression : '{handler.AutomaticDecompression}'");
            //Console.WriteLine($"Using SupportsAutomaticDecompression : '{handler.SupportsAutomaticDecompression}'");
            //Console.WriteLine($"Using Properties : '{string.Join('\n', handler.Properties.Keys.ToArray())}'");
            var client = new HttpClient(handler);
            var response = await client.GetAsync(url);
            byte[] content = await response.Content.ReadAsByteArrayAsync();
            string text = Encoding.UTF8.GetString(content);
            Console.WriteLine($"Content: {text}");
            var hashHeader = response.Headers.GetValues("X-Goog-Hash").FirstOrDefault();
            Console.WriteLine($"Hash header: {hashHeader}");
            Console.WriteLine($"MD5 of content: {byteArrayToMd5(content)}");
            Console.WriteLine($"=====================================================================");
        }

        public static string byteArrayToMd5(byte[] content)
        {
            using (var md5 = MD5.Create())
            {
                var md5Hash = md5.ComputeHash(content);
                return Convert.ToBase64String(md5Hash);
            }
        }

        public static byte[] Compress(byte[] contentToGzip)
        {
            using (MemoryStream resultStream = new MemoryStream())
            {
                using (MemoryStream contentStreamToGzip = new MemoryStream(contentToGzip))
                {
                    using (GZipStream compressionStream = new GZipStream(resultStream, CompressionMode.Compress))
                    {
                        contentStreamToGzip.CopyTo(compressionStream);
                    }
                }
                return resultStream.ToArray();
            }
        }
    }

    public class MyHandler : HttpClientHandler
    {
        protected override async Task<HttpResponseMessage> SendAsync(
            HttpRequestMessage request, CancellationToken cancellationToken)
        {
            var response = await base.SendAsync(request, cancellationToken);
            // Buffers the body; the caller can still read it afterwards.
            var responseContent = await response.Content.ReadAsByteArrayAsync().ConfigureAwait(false);
            // Re-gzip the body and hash the result, to compare with the md5 in X-Goog-Hash.
            var compressedResponse = Program.Compress(responseContent);
            var compressedResponseMd5 = Program.byteArrayToMd5(compressedResponse);
            Console.WriteLine($"recompressed response to md5 : {compressedResponseMd5}");
            return response;
        }
    }
Nov 16 '17 at 16:14

How about disabling automatic decompression, adding the Accept-Encoding header manually, and then decompressing after verifying the hash?

    using System;
    using System.Diagnostics;
    using System.IO;
    using System.IO.Compression;
    using System.Linq;
    using System.Net;
    using System.Net.Http;
    using System.Security.Cryptography;
    using System.Text;
    using System.Threading.Tasks;

    class Program
    {
        static Task Main() => Test2();

        private static async Task Test2()
        {
            var url = @"https://www.googleapis.com/download/storage/v1/b/storage-library-test-bucket/o/gzipped-text.txt?alt=media";
            var handler = new HttpClientHandler { AutomaticDecompression = DecompressionMethods.None };
            var client = new HttpClient(handler);
            client.DefaultRequestHeaders.Add("Accept-Encoding", "gzip");
            var response = await client.GetAsync(url);
            // Still-compressed bytes, as stored.
            var raw = await response.Content.ReadAsByteArrayAsync();
            var hashHeader = response.Headers.GetValues("X-Goog-Hash").FirstOrDefault();
            Debug.WriteLine($"Hash header: {hashHeader}");
            bool match = false;
            using (var md5 = MD5.Create())
            {
                var md5Hash = md5.ComputeHash(raw);
                var md5HashBase64 = Convert.ToBase64String(md5Hash);
                // Relies on the md5 entry being last in the header value.
                match = hashHeader.EndsWith(md5HashBase64);
                Debug.WriteLine($"MD5 of content: {md5HashBase64}");
            }
            if (match)
            {
                // Decompress only after the hash has been verified.
                using (var memInput = new MemoryStream(raw))
                using (var gz = new GZipStream(memInput, CompressionMode.Decompress))
                using (var memOutput = new MemoryStream())
                {
                    gz.CopyTo(memOutput);
                    var text = Encoding.UTF8.GetString(memOutput.ToArray());
                    Console.WriteLine($"Content: {text}");
                }
            }
        }
    }
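
Checking with hashHeader.EndsWith(...) depends on md5 being the last entry in the header. A slightly more defensive sketch (GoogHashHeader.Parse is a hypothetical helper) splits the value into algorithm/digest pairs. It accepts several header values in case the service sends crc32c and md5 as separate X-Goog-Hash headers, which response.Headers.GetValues would then return individually.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: turns X-Goog-Hash values such as
// "crc32c=T1s5RQ==,md5=xhF4M6pNFRDQnvaRRNVnkA==" into an algorithm -> base64 digest map.
public static class GoogHashHeader
{
    public static Dictionary<string, string> Parse(IEnumerable<string> headerValues)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (var value in headerValues)
        {
            foreach (var part in value.Split(','))
            {
                var entry = part.Trim();
                // Split at the first '=' only: base64 digests may end with '=' padding.
                int eq = entry.IndexOf('=');
                if (eq <= 0) continue;
                result[entry.Substring(0, eq)] = entry.Substring(eq + 1);
            }
        }
        return result;
    }
}
```

In Test2, the check could then become GoogHashHeader.Parse(response.Headers.GetValues("X-Goog-Hash")).TryGetValue("md5", out var expected) && expected == md5HashBase64.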
Nov 18 '17 at 10:19


