I am trying to read from a REST API that returns gzip-encoded responses. More precisely, I am trying to read from the StackExchange API.
I already found the question "Automatically decode GZIP in TRESTResponse?", but for some reason its answer does not solve my problem.
Test setup
In XE5, I added a TRESTClient, a TRESTRequest and a TRESTResponse with the relevant properties shown below. I set the client's BaseURL, the request's Resource and Params, and I set the request's AcceptEncoding to gzip, deflate, which should make it decode gzipped responses automatically.
    object RESTClient1: TRESTClient
      BaseURL = 'https://api.stackexchange.com/2.2'
    end
    object RESTRequest1: TRESTRequest
      AcceptEncoding = 'gzip, deflate'
      Client = RESTClient1
      Params = <
        item
          Kind = pkURLSEGMENT
          name = 'id'
          Options = [poAutoCreated]
          Value = '511529'
        end
        item
          name = 'site'
          Value = 'stackoverflow'
        end>
      Resource = 'users/{id}'
      Response = RESTResponse1
    end
    object RESTResponse1: TRESTResponse
    end
This results in a URL:
https://api.stackexchange.com/2.2/users/511529?site=stackoverflow
I execute the request like this, with two message boxes to show the URL and the result of the request:
    ShowMessage(RESTRequest1.GetFullRequestURL());
    RESTRequest1.Execute; // Actual call
    ShowMessage(RESTResponse1.Content);
If I open this URL in a browser, I get the correct result, which is a JSON object with some of my user data.
Problem
However, in Delphi I do not get a JSON response. Instead, I get a bunch of bytes that look like a garbled gzip response. I tried to decompress it using TIdCompressorZlib.DecompressGZipStream(), but that fails with ZLib Error (-3). When I inspect the bytes of the response myself, I see that it starts with #1F #3F #08. This is especially strange since a gzip header should start with #1F #8B #08, so #8B has been converted to #3F, which is a question mark.
It seems to me that the RESTClient tried to decode the gzip stream as if it were a UTF-8 response, and replaced invalid sequences (#8B by itself is not a valid UTF-8 sequence) with a question mark.
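The manual header inspection described above can be sketched as a small console program. This is only an illustration: LooksLikeGzip and LooksLikeMangledGzip are hypothetical helper names, not part of the REST or Indy libraries, and the byte arrays in the main block are hard-coded samples standing in for RESTResponse1.RawBytes.

```delphi
program GzipHeaderCheck;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: True when Data starts with the gzip magic bytes
// 1F 8B followed by 08 (the deflate compression method).
function LooksLikeGzip(const Data: TBytes): Boolean;
begin
  Result := (Length(Data) >= 3) and
            (Data[0] = $1F) and (Data[1] = $8B) and (Data[2] = $08);
end;

// Hypothetical helper: True for the damaged pattern seen in this question,
// where the second magic byte 8B has been replaced by '?' (3F).
function LooksLikeMangledGzip(const Data: TBytes): Boolean;
begin
  Result := (Length(Data) >= 3) and
            (Data[0] = $1F) and (Data[1] = $3F) and (Data[2] = $08);
end;

begin
  // Hard-coded sample headers; in the real application the input would be
  // the response bytes, for example RESTResponse1.RawBytes.
  Writeln(LooksLikeGzip(TBytes.Create($1F, $8B, $08)));
  Writeln(LooksLikeMangledGzip(TBytes.Create($1F, $3F, $08)));
end.
```

If the second check matches, the stream was already damaged before any decompression was attempted, so no gzip tool can be expected to recover it.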
Attempts (surface)
I experimented quite a lot, for example:
- Used RESTResponse.RawBytes and tried to decode that. I noticed that the bytes in this byte array are already invalid. The comments in the TRESTResponse source taught me that "RawBytes" is already decoded, so that makes sense.
- Saved RESTResponse.RawBytes to a file and tried to decompress it using 7zip and several online gzip decompressors. Of course, they all failed, since even the gzip header is incorrect.
- Assigned the value "gzip, deflate" to TRESTClient.AcceptEncoding, TRESTRequest.AcceptEncoding and combinations of the two. Also tried adding it to the pre-filled Accept property of each of these components.
- Switched from an authenticated request to an anonymous one. I had the whole OAuth part working, but including it would make the question too complicated. The anonymous API call I use in this question has the same problem.
Unfortunately, none of this works, and I still get a garbled response.
Attempts (digging in VCL)
In the end, I went a little deeper and dived into TRestRequest.Execute. I will not embed all the code here, but eventually it executes the request by calling:
    FClient.HTTPClient.Get(LURL, LResponseStream);
FClient is the TRESTClient associated with the request, and LResponseStream is a TMemoryStream. I added LResponseStream.SaveToFile('...') to the debugger watches so that it would save this raw result, et voilà, it gave me a valid gz file that I could decompress to get my JSON.
Bug workaround?
But then, a couple of lines down, I see this piece of code:
    if FClient.HTTPClient.Response.CharSet > '' then
    begin
      LResponseStream.Position := 0;
      S := FClient.HTTPClient.ReadStringAsCharset(LResponseStream, FClient.HTTPClient.Response.CharSet);
      LResponseStream.Free;
      LResponseStream := TStringStream.Create(S);
    end;
According to the comment above this block, this is done because the contents of the memory stream are "NOT encoded according to the possible presence of an encoding parameter or content type", which the author of this VCL code considers a bug in Indy.
So basically, what happens here is that the raw response is treated as a string and converted to the "correct" encoding. FClient.HTTPClient.Response.CharSet is "UTF-8", which is indeed the encoding of the JSON, but unfortunately this conversion should only be performed after the stream has been decompressed, and that has not happened yet at this point. So this workaround is itself a bug. ;)
I tried to dig deeper, but I could not find the place where the decompression is supposed to take place. The actual request is made by an IIPHTTP instance, which lives in IPPeerAPI.dcu, for which I have no source.
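The ordering described above (decompress first, decode the charset afterwards) can be sketched as a small console program that reads a saved raw response, such as the file written by SaveToFile. This is a sketch under assumptions, not the REST library's own code: GUnzipToString is a hypothetical helper name, the file name is taken from the command line, the payload is assumed to be UTF-8 JSON, and the System.ZLib unit of your Delphi version is assumed to offer the TZDecompressionStream constructor overload with a WindowBits argument (in zlib, 15 + 16 selects gzip framing).

```delphi
program GUnzipSketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.ZLib;

// Hypothetical helper: decompresses a gzip stream and only then decodes
// the resulting bytes as UTF-8 text.
function GUnzipToString(Source: TStream): string;
var
  Decompressor: TZDecompressionStream;
  Output: TBytesStream;
  Buffer: array[0..4095] of Byte;
  Count: Integer;
begin
  Output := TBytesStream.Create;
  try
    // Assumption: this overload exists in your System.ZLib.
    // WindowBits 15 + 16 asks zlib to expect a gzip header and trailer.
    Decompressor := TZDecompressionStream.Create(Source, 15 + 16);
    try
      // Step 1: inflate the raw bytes chunk by chunk.
      repeat
        Count := Decompressor.Read(Buffer, SizeOf(Buffer));
        if Count > 0 then
          Output.WriteBuffer(Buffer, Count);
      until Count = 0;
    finally
      Decompressor.Free;
    end;
    // Step 2: interpret the decompressed bytes as UTF-8.
    // Output.Bytes can be larger than the data written, so pass the size.
    Result := TEncoding.UTF8.GetString(Output.Bytes, 0, Integer(Output.Size));
  finally
    Output.Free;
  end;
end;

var
  Input: TFileStream;
begin
  if ParamCount < 1 then
  begin
    Writeln('Usage: GUnzipSketch <raw-response-file>');
    Exit;
  end;
  Input := TFileStream.Create(ParamStr(1), fmOpenRead or fmShareDenyWrite);
  try
    Writeln(GUnzipToString(Input));
  finally
    Input.Free;
  end;
end.
```

This only helps when the bytes are still intact. Once the charset conversion has replaced #8B with a question mark, decompression can be expected to fail, as it did in the attempts above.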
So...
So my question is twofold:
- Why is this happening? TRESTClient should automatically decode the gzip stream when you set AcceptEncoding to "gzip, deflate". What setting did I miss? Or is this not yet supported in XE5?
- How can I prevent the incorrect translation of the gzip stream? I don't mind decoding the response myself as long as it works, although ideally the REST components should do this automatically.
My setup: VCL Forms application, Windows 8.1, Delphi XE5 Professional Update 2.
Update
- Workaround found (see my answer)
- Bug report filed in Quality Central as RSP-9855
- It is supposedly fixed in Delphi 10.1 (Berlin), but I have yet to verify this.