I'm working on the Google Cloud Storage .NET client library. There are three features (between .NET, my client library and the Storage service) that combine in a nasty way:
- When downloading files (objects in Google Cloud Storage terminology), the server includes a hash of the stored data. My client code then validates that hash against the data it has downloaded.
- A separate feature of Google Cloud Storage is that the user can set the Content-Encoding of the object, and that is included as a header when downloading, when the request contains a matching Accept-Encoding. (For the moment, let's ignore the behavior when the request doesn't include this...)
- HttpClientHandler can decompress gzip (or deflate) content automatically and transparently.
When all three of these are combined, we get into trouble. Here's a short but complete program demonstrating that, but without using my client library (and hitting a publicly accessible file):
using System;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Security.Cryptography;
using System.Text;
using System.Threading.Tasks;

class Program
{
    static async Task Main()
    {
        string url = "https://www.googleapis.com/download/storage/v1/b/"
            + "storage-library-test-bucket/o/gzipped-text.txt?alt=media";
        var handler = new HttpClientHandler
        {
            AutomaticDecompression = DecompressionMethods.GZip
        };
        var client = new HttpClient(handler);

        var response = await client.GetAsync(url);
        byte[] content = await response.Content.ReadAsByteArrayAsync();
        string text = Encoding.UTF8.GetString(content);
        Console.WriteLine($"Content: {text}");

        var hashHeader = response.Headers.GetValues("X-Goog-Hash").FirstOrDefault();
        Console.WriteLine($"Hash header: {hashHeader}");

        using (var md5 = MD5.Create())
        {
            var md5Hash = md5.ComputeHash(content);
            var md5HashBase64 = Convert.ToBase64String(md5Hash);
            Console.WriteLine($"MD5 of content: {md5HashBase64}");
        }
    }
}
.NET Core Project File:
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>netcoreapp2.0</TargetFramework>
    <LangVersion>7.1</LangVersion>
  </PropertyGroup>
</Project>
Output:
Content: hello world
Hash header: crc32c=T1s5RQ==,md5=xhF4M6pNFRDQnvaRRNVnkA==
MD5 of content: XrY7u+Ae7tCTyyK7j1rNww==
As you can see, the MD5 of the content isn't the same as the MD5 part of the X-Goog-Hash header. (In my client library I'm using the crc32c hash, but that shows the same behavior.)
This isn't a bug in HttpClientHandler - it's expected, but a pain when I want to validate the hash. Basically, I need to get at the content both before and after decompression, and I can't find any way of doing that.
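To make the mismatch concrete, here is a small local sketch with no network access. The class and method names (`HashMismatchDemo`, `Gzip`, `Md5Base64`) are made up for illustration. It gzips "hello world" in memory and hashes both forms. The hash of the locally gzipped bytes will not match the server's md5 value, because gzip output depends on the compressor and its header fields, but it does show that the two hashes are over different byte sequences.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper, for illustration only.
static class HashMismatchDemo
{
    // Gzip-compresses the given bytes in memory.
    public static byte[] Gzip(byte[] plain)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(plain, 0, plain.Length);
            }
            // ToArray is still valid after the MemoryStream has been closed.
            return output.ToArray();
        }
    }

    // Base64-encoded MD5, the same form as the md5 part of X-Goog-Hash.
    public static string Md5Base64(byte[] data)
    {
        using (var md5 = MD5.Create())
        {
            return Convert.ToBase64String(md5.ComputeHash(data));
        }
    }

    public static void Run()
    {
        byte[] plain = Encoding.UTF8.GetBytes("hello world");
        byte[] stored = Gzip(plain); // stands in for the bytes the service stores
        Console.WriteLine($"MD5 of gzipped bytes:      {Md5Base64(stored)}");
        Console.WriteLine($"MD5 of decompressed bytes: {Md5Base64(plain)}");
    }
}
```

The second line printed by `HashMismatchDemo.Run()` is the same `XrY7u+Ae7tCTyyK7j1rNww==` value as in the program output above, which is what the client ends up hashing once decompression has happened transparently.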
To clarify my requirements somewhat: I know how to prevent the decompression in HttpClient and instead decompress afterwards when reading from the stream, but I need to be able to do this without changing any of the code that uses the resulting HttpResponseMessage from the HttpClient. (There's a lot of code that deals with responses, and I want to make the change in only one central place.)
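For reference, the approach being ruled out looks roughly like the sketch below: switch automatic decompression off, keep the raw bytes for hashing, and decompress by hand. `ManualDecompression` and its members are hypothetical names, and the sketch handles gzip only. It works, but every caller has to go through `DownloadAsync` instead of consuming the HttpResponseMessage directly, which is exactly the change I want to avoid.

```csharp
using System.IO;
using System.IO.Compression;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

// Hypothetical helper, for illustration only.
static class ManualDecompression
{
    public static HttpClient CreateClient()
    {
        var handler = new HttpClientHandler { AutomaticDecompression = DecompressionMethods.None };
        var client = new HttpClient(handler);
        // Request gzip explicitly, since the handler is no longer managing compression.
        client.DefaultRequestHeaders.AcceptEncoding.Add(new StringWithQualityHeaderValue("gzip"));
        return client;
    }

    // Decompresses the raw body if it was gzip-encoded; otherwise returns it unchanged.
    public static byte[] Decode(byte[] raw, bool isGzip)
    {
        if (!isGzip)
        {
            return raw;
        }
        using (var input = new GZipStream(new MemoryStream(raw), CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            input.CopyTo(output);
            return output.ToArray();
        }
    }

    // Returns the raw bytes (to validate against the hash) and the decoded bytes (to use).
    public static async Task<(byte[] Raw, byte[] Decoded)> DownloadAsync(HttpClient client, string url)
    {
        using (var response = await client.GetAsync(url).ConfigureAwait(false))
        {
            byte[] raw = await response.Content.ReadAsByteArrayAsync().ConfigureAwait(false);
            bool isGzip = response.Content.Headers.ContentEncoding.Contains("gzip");
            return (raw, Decode(raw, isGzip));
        }
    }
}
```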
I have a plan, which I've prototyped and which works as far as I've found so far, but it's a bit ugly. It involves creating a three-layer handler:
- HttpClientHandler with automatic decompression disabled.
- A new handler which replaces the content stream with a new Stream subclass which delegates to the original content stream, but hashes the data as it's read.
- A decompression-only handler, based on the Microsoft DecompressionHandler code.
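The three layers can be sketched as below. This is not the actual prototype, just a minimal illustration under several assumptions: every type name and the `StreamKey` property key are hypothetical; MD5 stands in for whichever hash is validated; the hashing stream is handed to callers through `HttpRequestMessage.Properties` (flagged obsolete on newer .NET versions); and the third layer is a simplified gzip-only stand-in, not Microsoft's DecompressionHandler.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Security.Cryptography;
using System.Threading;
using System.Threading.Tasks;

// Forwards reads to the original content stream and hashes the raw bytes as they pass through.
public sealed class HashingStream : Stream
{
    private readonly Stream _inner;
    private readonly IncrementalHash _hash = IncrementalHash.CreateHash(HashAlgorithmName.MD5);
    private byte[] _final;
    public HashingStream(Stream inner) { _inner = inner; }

    // Call once, after the stream has been read to the end; the result is cached.
    public byte[] GetHash() => _final ?? (_final = _hash.GetHashAndReset());

    public override int Read(byte[] buffer, int offset, int count)
    {
        int read = _inner.Read(buffer, offset, count);
        _hash.AppendData(buffer, offset, read);
        return read;
    }

    public override async Task<int> ReadAsync(byte[] buffer, int offset, int count, CancellationToken token)
    {
        int read = await _inner.ReadAsync(buffer, offset, count, token).ConfigureAwait(false);
        _hash.AppendData(buffer, offset, read);
        return read;
    }

    protected override void Dispose(bool disposing)
    {
        // Capture the hash before releasing it, so GetHash still works after disposal.
        if (disposing) { GetHash(); _hash.Dispose(); _inner.Dispose(); }
        base.Dispose(disposing);
    }
    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position { get => throw new NotSupportedException(); set => throw new NotSupportedException(); }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}

// Layer 2: swaps in a HashingStream and exposes it through the request properties.
public sealed class HashingHandler : DelegatingHandler
{
    public const string StreamKey = "RawContentHashingStream"; // hypothetical key name

    protected override async Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken token)
    {
        var response = await base.SendAsync(request, token).ConfigureAwait(false);
        if (response.Content == null) return response;
        var hashing = new HashingStream(await response.Content.ReadAsStreamAsync().ConfigureAwait(false));
        response.Content = Rewrap(response.Content, hashing, decoded: false);
        request.Properties[StreamKey] = hashing;
        return response;
    }

    // Builds replacement content over 'stream', copying the original content headers.
    internal static HttpContent Rewrap(HttpContent original, Stream stream, bool decoded)
    {
        var content = new StreamContent(stream);
        foreach (var header in original.Headers)
            content.Headers.TryAddWithoutValidation(header.Key, header.Value);
        if (decoded) { content.Headers.ContentEncoding.Clear(); content.Headers.ContentLength = null; }
        return content;
    }
}

// Layer 3: gzip-only stand-in for a handler based on Microsoft's DecompressionHandler.
public sealed class GzipDecompressionHandler : DelegatingHandler
{
    protected override async Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken token)
    {
        request.Headers.AcceptEncoding.Add(new StringWithQualityHeaderValue("gzip"));
        var response = await base.SendAsync(request, token).ConfigureAwait(false);
        if (response.Content == null || !response.Content.Headers.ContentEncoding.Contains("gzip")) return response;
        var raw = await response.Content.ReadAsStreamAsync().ConfigureAwait(false);
        response.Content = HashingHandler.Rewrap(response.Content, new GZipStream(raw, CompressionMode.Decompress), decoded: true);
        return response;
    }
}

public static class LayeredClient
{
    // Layer 1 (innermost) is HttpClientHandler with automatic decompression switched off.
    public static HttpClient Create() => new HttpClient(new GzipDecompressionHandler
    {
        InnerHandler = new HashingHandler
        {
            InnerHandler = new HttpClientHandler { AutomaticDecompression = DecompressionMethods.None }
        }
    });
}
```

Code consuming the response still sees decompressed content. Once that content has been read to the end, the hash of the raw bytes can be fetched from `response.RequestMessage.Properties[HashingHandler.StreamKey]` and compared with the X-Goog-Hash header. The sketch does not handle deflate, case variations in Content-Encoding, or content that is never fully read.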
While this works, it has disadvantages:
- Open source licensing: checking exactly what I need to do in order to create a new file in my repo based on the MIT-licensed Microsoft code
- Effectively forking the MS code, which means I should probably then check regularly to see whether any bugs have been found in it
- The Microsoft code uses internal members of the assembly, so it doesn't port as cleanly as it might
If Microsoft made DecompressionHandler public, that would help a lot - but that's likely to be on a longer timeframe than I need.
What I'm looking for is an alternative approach if possible - something I've missed that lets me get at the content before decompression. I don't want to reinvent HttpClient - the response is often chunked, for example, and I don't want to have to get into that side of things. It's a pretty specific interception point that I'm looking for.