TRESTClient / TRestRequest incorrectly decodes gzip response

I am trying to read from a REST API that returns gzip-encoded responses. More precisely, I am trying to read from the StackExchange API.

I already found the question "Automatically decode GZIP in TRESTResponse?", but for some reason its answer does not solve my problem.

Test setup

In XE5, I added a TRESTClient, a TRESTRequest, and a TRESTResponse with the relevant properties shown below. I set the client's BaseURL, the resource, and the request parameters, and I set the request's AcceptEncoding to gzip, deflate, which should make it decode gzipped responses automatically.

    object RESTClient1: TRESTClient
      BaseURL = 'https://api.stackexchange.com/2.2'
    end
    object RESTRequest1: TRESTRequest
      AcceptEncoding = 'gzip, deflate'
      Client = RESTClient1
      Params = <
        item
          Kind = pkURLSEGMENT
          name = 'id'
          Options = [poAutoCreated]
          Value = '511529'
        end
        item
          name = 'site'
          Value = 'stackoverflow'
        end>
      Resource = 'users/{id}'
      Response = RESTResponse1
    end
    object RESTResponse1: TRESTResponse
    end

This results in a URL:

https://api.stackexchange.com/2.2/users/511529?site=stackoverflow

I execute the request like this, with two message boxes that show the URL and the result of the request:

    ShowMessage(RESTRequest1.GetFullRequestURL());
    RESTRequest1.Execute; // Actual call
    ShowMessage(RESTResponse1.Content);

If I open this URL in a browser, I get the correct result, which is a JSON object with some of my user data.

Problem

However, in Delphi I do not get a JSON response. Instead, I get a bunch of bytes that look like a garbled gzip response. I tried to decompress it using TIdCompressorZLib.DecompressGZipStream(), but that fails with ZLib Error (-3). When I inspect the bytes of the response myself, I see that it starts with #1F#3F#08. This is especially strange since a gzip header should start with #1F#8B#08, so #8B has been converted to #3F, which is a question mark.

It seems to me that the RESTClient tried to decode the gzip stream as if it were a UTF-8 response, and replaced the invalid sequences (#8B on its own is not a valid UTF-8 character) with a question mark.
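A quick way to tell an intact gzip stream from a garbled one is to look at its first three bytes. The following is only a minimal sketch of that check; the function name LooksLikeGzip is made up for this example and is not part of Indy or the REST components.

```pascal
program GzipMagicCheck;

{$APPTYPE CONSOLE}

// Hypothetical helper: returns True when Data starts with the gzip
// magic bytes $1F $8B followed by $08 (the "deflate" method byte).
function LooksLikeGzip(const Data: array of Byte): Boolean;
begin
  Result := (Length(Data) >= 3) and
            (Data[0] = $1F) and (Data[1] = $8B) and (Data[2] = $08);
end;

begin
  // Header of a valid gzip stream.
  WriteLn(LooksLikeGzip([$1F, $8B, $08])); // prints TRUE
  // Header as seen in the garbled response: $8B was replaced by '?'.
  WriteLn(LooksLikeGzip([$1F, $3F, $08])); // prints FALSE
end.
```

Running the same check on the first bytes of RawBytes and on the stream saved before any charset conversion shows at which point the header gets damaged.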

Attempts (surface)

I experimented quite a lot. For example, I:

  • Used RESTResponse.RawBytes and tried to decode that. I noticed that the bytes in this byte array are already invalid. The comments in the TRESTResponse source taught me that "RawBytes" is already decoded, so that makes sense.
  • Saved RESTResponse.RawBytes to a file and tried to decompress it using 7zip and several online gzip decompressors. Of course, they all failed, as even the gzip header is incorrect.
  • Assigned the value "gzip, deflate" to TRESTClient.AcceptEncoding, TRESTResponse.AcceptEncoding, and combinations of the two. I also tried adding it to the pre-filled Accept property of each of these components.
  • Switched from an authenticated to an unauthenticated request. I had the whole OAuth part working, but I figured that would make the question too complicated. However, the anonymous API call I use in this question has the same problem.

Unfortunately, it still does not work, and I still get a garbled response.

Attempts (digging in VCL)

Eventually, I dug a little deeper and stepped into TRESTRequest.Execute. I will not embed all the code here, but in the end it executes the request by calling

 FClient.HTTPClient.Get(LURL, LResponseStream); 

FClient is the TRESTClient linked to the request, and LResponseStream is a TMemoryStream. I added LResponseStream.SaveToFile('...') to the debugger watches so that it would save this raw result, et voilà: it gave me a valid gz file that I could decompress to get my JSON.

Bypass error?

But then, a couple of lines down, I see this piece of code:

    if FClient.HTTPClient.Response.CharSet > '' then
    begin
      LResponseStream.Position := 0;
      S := FClient.HTTPClient.ReadStringAsCharset(LResponseStream, FClient.HTTPClient.Response.CharSet);
      LResponseStream.Free;
      LResponseStream := TStringStream.Create(S);
    end;

According to the comment above this block, this is done because the contents of the memory stream are "NOT encoded according to the possible presence of an encoding parameter or content type", which the author of this VCL code considers a bug in Indy.

So basically, what happens here is that the raw response is treated as a string and converted to the "right" encoding. FClient.HTTPClient.Response.CharSet is "UTF-8", which is indeed the encoding of the JSON, but unfortunately this conversion should only be performed after decompressing the stream, and that has not happened yet at this point. So this bug fix could itself be considered a bug. ;)
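To illustrate the order that is needed (decompress first, decode the charset second), here is a small self-contained sketch. It does not use the REST components at all. It assumes the System.ZLib overloads that take a WindowBits argument, where 15 + 16 selects the gzip wrapper, and the helper names GzipCompress and GzipDecompress as well as the sample JSON are made up for this example.

```pascal
program DecompressThenDecode;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.ZLib;

const
  // 15 = maximum window size, +16 tells zlib to use the gzip wrapper.
  GzipWindowBits = 15 + 16;

// Hypothetical helper, only here to produce gzip bytes to test with.
function GzipCompress(const Data: TBytes): TBytes;
var
  Dest: TBytesStream;
  Z: TZCompressionStream;
begin
  Dest := TBytesStream.Create;
  try
    Z := TZCompressionStream.Create(Dest, zcDefault, GzipWindowBits);
    try
      if Length(Data) > 0 then
        Z.WriteBuffer(Data[0], Length(Data));
    finally
      Z.Free; // freeing flushes the remaining compressed data into Dest
    end;
    Result := Copy(Dest.Bytes, 0, Integer(Dest.Size));
  finally
    Dest.Free;
  end;
end;

// Hypothetical helper: turns raw gzip bytes back into the original bytes.
function GzipDecompress(const Data: TBytes): TBytes;
var
  Src, Dest: TBytesStream;
  Z: TZDecompressionStream;
  Buffer: array[0..4095] of Byte;
  Count: Integer;
begin
  Src := TBytesStream.Create(Data);
  try
    Dest := TBytesStream.Create;
    try
      Z := TZDecompressionStream.Create(Src, GzipWindowBits);
      try
        // Read until the decompressor reports no more data.
        repeat
          Count := Z.Read(Buffer, SizeOf(Buffer));
          if Count > 0 then
            Dest.WriteBuffer(Buffer, Count);
        until Count = 0;
      finally
        Z.Free;
      end;
      Result := Copy(Dest.Bytes, 0, Integer(Dest.Size));
    finally
      Dest.Free;
    end;
  finally
    Src.Free;
  end;
end;

var
  Json, Decoded: string;
  Wire: TBytes;
begin
  Json := '{"display_name":"GolezTrol"}';
  // What the server sends: UTF-8 text, then gzip on top of it.
  Wire := GzipCompress(TEncoding.UTF8.GetBytes(Json));
  // Correct order on the client: undo gzip first, then decode UTF-8.
  Decoded := TEncoding.UTF8.GetString(GzipDecompress(Wire));
  WriteLn(Decoded = Json);
end.
```

Applying the UTF-8 decoding to Wire directly, before decompression, is the step that destroys bytes such as #8B.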

I tried to dig deeper, but I could not find the place where this decompression is supposed to take place. The actual request is made by an IIPHTTP instance, which lives in IPPeerAPI.dcu, for which I have no source.

So...

So my question is twofold:

  • Why is this happening? TRESTClient should automatically decode the gzip stream when you set AcceptEncoding to "gzip, deflate". Which setting did I miss? Or is this not yet supported in XE5?
  • How can I prevent this incorrect translation of the gzip stream? I don't mind decoding the response myself, as long as it works, although ideally the REST components should do this automatically.

My setup: VCL Forms application, Windows 8.1, Delphi XE5 Professional Update 2.

Update

  • Workaround found (see my answer)
  • Bug report filed as RSP-9855 in Quality Central
  • It is supposedly fixed in Delphi 10.1 (Berlin), but I have yet to verify this.
2 answers

Remy Lebeau's answer to this question, as well as his comment on the answer to the question "Automatically decode GZIP in TRESTResponse?", put me on the right track.

As he said, setting AcceptEncoding is not enough, because the TIdHTTP that performs the actual request has no decompressor attached, so it cannot decompress the gzip response. Based on the sparse resources I found, I had assumed that setting AcceptEncoding would make the response get decompressed automatically, but that assumption was wrong.

However, leaving AcceptEncoding empty does not work in this case either, since the StackExchange API always compresses its responses, regardless of whether you indicate that you accept gzip or not.

Thus, the combination of a) a response that is always compressed, b) an HTTP client that cannot decompress it, and c) a TRESTRequest object that, reasonably, assumes the response has already been properly decompressed leads to this situation.

I see only two solutions. The first is to drop TRESTClient altogether and perform the request with a plain TIdHTTP. That would be a pity, since my goal was to explore the new REST components to see how they can make life easier.

So the other solution is to assign a compressor to the TIdHTTP that is used internally.

I managed to get this working, although unfortunately it defeats much of the abstraction that the REST components are trying to provide. This is the code that solves it:

    var
      Http: TIdCustomHTTP;
    begin
      // Get the TIdHTTP that performs the request.
      Http := (RESTRequest1      // The TRESTRequest object
                .Client          // The TRESTClient
                .HTTPClient      // A TRESTHTTP object that wraps HTTP communication
                .Peer            // An IIPHTTP interface which is obtained through PeerFactory.CreatePeer
                .GetObject       // A method to get the object instance of the interface
                as TIdCustomHTTP // The object instance, which is a TIdCustomHTTP.
              );

      // Attach a gzip decompressor to it.
      Http.Compressor := TIdCompressorZLib.Create(Http);

After that, I can use the RESTRequest1 component to successfully receive a JSON response (at least as text).
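For reuse, the same workaround could be wrapped in a small helper unit. This is only a sketch: the unit name RestGzipHelper and the procedure EnsureGzipSupport are made up for this example, and it relies on the same Client.HTTPClient.Peer.GetObject chain as the snippet above, so it only applies while the REST client is backed by Indy, as it is in this XE5 setup.

```pascal
unit RestGzipHelper; // hypothetical unit name

interface

uses
  REST.Client;

// Hypothetical helper: makes sure the Indy HTTP client behind Request
// has a decompressor attached. Calling it more than once is harmless.
procedure EnsureGzipSupport(Request: TRESTRequest);

implementation

uses
  IdHTTP, IdCompressorZLib;

procedure EnsureGzipSupport(Request: TRESTRequest);
var
  Http: TIdCustomHTTP;
begin
  // Same chain as in the snippet above: request -> client -> HTTP wrapper
  // -> peer interface -> underlying Indy object.
  Http := Request.Client.HTTPClient.Peer.GetObject as TIdCustomHTTP;

  // Only attach a decompressor when none is present yet. The compressor
  // is owned by Http, so it is freed together with it.
  if Http.Compressor = nil then
    Http.Compressor := TIdCompressorZLib.Create(Http);
end;

end.
```

With this in place, calling EnsureGzipSupport(RESTRequest1) once before RESTRequest1.Execute should have the same effect as the inline code.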


AcceptEncoding = 'gzip, deflate'

This is the root of your problem. You are manually telling the server that a gzip-encoded response is allowed, but as far as I can see in the REST source code, the underlying TIdHTTP object that TRESTClient uses internally does not have a gzip decompressor assigned to it (and even if it did, setting AcceptEncoding manually would still be wrong, because TIdHTTP sets its own Accept-Encoding header when a decompressor is assigned). I commented on this in the other question that you linked to. As a result, TIdHTTP ends up returning the raw gzip bytes without decoding them, and TRESTClient then converts them as-is into a charset-decoded UnicodeString (since you are reading the Content property). This is why you see garbled bytes.

You need to get rid of the AcceptEncoding assignment.

Why is this happening?

Because TRESTClient does not assign a gzip decompressor to its internal TIdHTTP object, but you are tricking the server into thinking that it did.

should automatically decode the gzip stream when you set AcceptEncoding to "gzip, deflate"

No, because no decompressor is assigned.

Update: I would most likely just drop TRESTClient and use TIdHTTP directly. The following works for me when I try it:

    var
      HTTP: TIdHTTP;
      JSON: string;
    begin
      HTTP := TIdHTTP.Create;
      try
        HTTP.Compressor := TIdCompressorZLib.Create(HTTP);

        // starting with SVN rev 5224, the TIdHTTP.IOHandler property no longer
        // needs to be explicitly set in order to request HTTPS urls. TIdHTTP
        // now creates a default SSLIOHandler internally if needed. But if you
        // are using an older release, you will have to assign the IOHandler...
        //
        // HTTP.IOHandler := TIdSSLIOHandlerSocketOpenSSL.Create(HTTP);
        //
        JSON := HTTP.Get('https://api.stackexchange.com/2.2/users/511529?site=stackoverflow');
      finally
        HTTP.Free;
      end;
      ShowMessage(JSON);
    end;

Output:

 {"items":[{"badge_counts":{"bronze":96,"silver":53,"gold":4},"account_id":240984,"is_employee":false,"last_modified_date":1419235802,"last_access_date":1419293282,"reputation_change_year":15259,"reputation_change_quarter":2983,"reputation_change_month":1301,"reputation_change_week":123,"reputation_change_day":0,"reputation":61014,"creation_date":1290042241,"user_type":"registered","user_id":511529,"accept_rate":100,"location":"Netherlands","website_url":"http://www.eftepedia.nl","link":"https://stackoverflow.com/users/511529/goleztrol","display_name":"GolezTrol","profile_image":"https://www.gravatar.com/avatar/b07c67edfcc5d1496365503712de5c2a?s=128&d=identicon&r=PG"}],"has_more":false,"quota_max":300,"quota_remaining":295} 

Source: https://habr.com/ru/post/1209715/