How do I handle different requests that map to the same answer?

I am designing a web service. The request is idempotent, so I chose the GET method. The response is relatively expensive to calculate and not small, so I want to get caching (at the protocol level) right. (Do not worry about memoization on my side, I already have that covered; my question here is also about the Web as a whole.)

There is only one mandatory parameter and a number of optional parameters that take default values when missing. For example, the following two requests map to the same representation of the response. (If this is a dumb way to design this interface, suggest something better.)

 GET /service?mandatory_parameter=some_data HTTP/1.1
 GET /service?mandatory_parameter=some_data;optional_parameter=default1;another_optional_parameter=default2;yet_another_optional_parameter=default3 HTTP/1.1

However, I think clients do not know this and will treat the two as separate requests, and so will caches. What should I do so as not to break the golden rule of caching?

  • Define a canonical form, document it (for example, all parameters are mandatory and must be sorted in a certain order), and return a client error if a request does not use that form?
  • Redirect to the canonical request form instead of returning an error?
  • Or is it enough not to care what the request looks like and just respond with the same ETag for identical responses?
+4
4 answers

First, do not use semicolons as a delimiter in the query string. You should use ? to start the query string and & to delimit key/value pairs. RFC 3986 does not explicitly say that you must use & , but the vast majority of existing code uses this delimiter because of the application/x-www-form-urlencoded use case.
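As a small illustration of the conventional delimiters, here is a sketch using Python's standard urllib.parse module (Python is chosen only for illustration; the parameter names are the ones from the question). urlencode joins the pairs with & :

```python
from urllib.parse import urlencode, parse_qsl

# Parameter names and values are taken from the question.
params = [
    ("mandatory_parameter", "some_data"),
    ("optional_parameter", "default1"),
]

# urlencode joins key/value pairs with "&", the form-urlencoded convention.
query = urlencode(params)

# "?" introduces the query string.
url = "/service?" + query

# Parsing the "&"-delimited string recovers the original pairs.
round_trip = parse_qsl(query)
```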

Secondly, you are right that different parameters in the query string produce a different URI, which caches therefore treat as a different resource. Assuming you want optimal caching performance: if you know that an optional parameter has been specified whose inclusion is unnecessary and does not affect the representation that will be returned (that is, an optional parameter given with its default value), you should redirect to the canonical form that omits the parameter. For example, if you have http://example.com:80/ , you can normalize it to http://example.com/ , since 80 is the default port for HTTP. You can do the same for request parameters, since you control the URI space. If parameters (optional or otherwise) appear in a different order than the canonical one, you should redirect as well. A 301 redirect is preferable if you know that the relationship between the URIs will be stable; otherwise use a 302 or 307 redirect as appropriate. I would recommend defining your canonical form the same way OAuth does: sort the parameters alphabetically, first by key, then by value. Other normalization operations will also help here. RFC 3986 has a whole section on URI normalization that will be relevant to you. This technique really only works for GET; redirecting PUT / POST / DELETE requests is generally not recommended.
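A minimal sketch of this canonicalization in Python, under some assumptions: the DEFAULTS table and the function names are hypothetical, the query string uses & as the delimiter, and the caller is responsible for actually sending the 301.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode

# Hypothetical defaults for the optional parameters named in the question.
DEFAULTS = {
    "optional_parameter": "default1",
    "another_optional_parameter": "default2",
    "yet_another_optional_parameter": "default3",
}


def canonical_query(query: str) -> str:
    """Drop parameters that carry their default value, then sort by key, then value."""
    pairs = parse_qsl(query, keep_blank_values=True)
    kept = [(k, v) for k, v in pairs if DEFAULTS.get(k) != v]
    return urlencode(sorted(kept))


def redirect_target(path: str, query: str) -> Optional[str]:
    """Return the canonical URL to redirect to, or None if the request is already canonical."""
    canonical = canonical_query(query)
    if canonical == query:
        return None
    return f"{path}?{canonical}" if canonical else path
```

A request whose query string is already canonical yields None and can be served directly; anything else yields the URL to put in the Location header of the redirect.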

Third, ETags are great, and they provide significant performance improvements when both the client and the server implement them well. Unfortunately, neither side gets this right very often. The same goes for Last-Modified. You should pursue them, because the CPU and bandwidth savings are significant when they work, but they are not enough on their own. Other headers, such as Cache-Control, are often needed as well. Section 13 of RFC 2616 is worth reading if you plan to go into this in detail.
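To show how these headers fit together, here is a rough Python sketch. The function name, the choice of SHA-256 for the ETag, and the "public" directive are all assumptions made for illustration; a real service would pick values that suit its data.

```python
import hashlib
from email.utils import formatdate


def caching_headers(body: bytes, last_modified_ts: float, max_age: int) -> dict:
    """Build validator (ETag, Last-Modified) and freshness (Cache-Control) headers."""
    # A strong ETag is a quoted opaque string; hashing the body is one simple option.
    etag = '"' + hashlib.sha256(body).hexdigest() + '"'
    return {
        "ETag": etag,
        # HTTP dates are expressed in GMT.
        "Last-Modified": formatdate(last_modified_ts, usegmt=True),
        # "public" is an assumption: it allows shared caches to store the response.
        "Cache-Control": f"public, max-age={max_age}",
    }
```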

Finally, a word of warning: there is a catch with these redirects that you need to be aware of. Clients trying to access your resources may often be redirected elsewhere. This creates overhead, which pays off overall only if clients make subsequent requests against the same resource and keep state so as to avoid repeated redirects. Unless you provide a public client implementation that takes advantage of your caching optimizations, you may never benefit from these tweaks.

+4

I would choose option (2) from your list, and I would make the request RESTful rather than RPC-style.

That is, in this case, make all the parameters part of the request path:

 /services/mandatory_parameter/some_data/optional_parameter/default1/another_optional_parameter/default2/yet_another_optional_parameter/default3

If not all optional parameters are specified, return a 301 (Moved Permanently) redirect to the fully qualified name of the resource with the default values filled in. This will (or should) be cached by clients and web caches alike, and even if the request does reach your backend, a 301 is very cheap for you to serve.

At this point, you have one canonical form for the URI, and caching will work as usual / as expected.
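A rough Python sketch of this path-based scheme, with several assumptions: the parameter order, the default values, the /services prefix handling and the function names are hypothetical, and the values are assumed to be URL-safe already.

```python
# Hypothetical canonical order and defaults, mirroring the example path above.
PARAM_ORDER = [
    "mandatory_parameter",
    "optional_parameter",
    "another_optional_parameter",
    "yet_another_optional_parameter",
]
DEFAULTS = {
    "optional_parameter": "default1",
    "another_optional_parameter": "default2",
    "yet_another_optional_parameter": "default3",
}


def parse_path(path: str, prefix: str = "/services") -> dict:
    """Read alternating name/value segments that follow the prefix."""
    segments = path[len(prefix):].strip("/").split("/")
    return dict(zip(segments[0::2], segments[1::2]))


def canonical_path(params: dict, prefix: str = "/services") -> str:
    """Fill in defaults and emit the parameters in the canonical order."""
    merged = {**DEFAULTS, **params}
    parts = [f"{name}/{merged[name]}" for name in PARAM_ORDER]
    return prefix + "/" + "/".join(parts)


def respond(path: str):
    """Return (status, location): 400 if the mandatory part is missing,
    301 to the canonical path if the request is not canonical, else 200."""
    params = parse_path(path)
    if "mandatory_parameter" not in params:
        return 400, None
    target = canonical_path(params)
    if target != path:
        return 301, target
    return 200, None
```

Only the 200 branch would run the expensive calculation; the 301 branch needs nothing but string handling.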

This does mean that each combination of parameters will be cached separately (as a 301). That is fine, though, since non-canonical requests will have a cache policy independent of the full request, and clients that are concerned about the extra round trip can fill in all the parameters themselves.

Your option (3) will not work as you expect - each form will be cached independently, since they are different URIs.

It should also be noted that many downstream caches and other software will not cache your response at all because of the query parameters, so I suggest turning it into a “proper” resource.

+1

First, it is good that you chose GET, since other methods do not have such good cache support. As far as I know, browsers cache URIs together with their parameters, so I don't think it's a good idea to use a canonical form.
One thing you do not say is how this service will be used. If these requests are made from a browser (and it seems to me they are probably issued by a script), they will probably look the same even when made more than once. So make sure that whatever generates the URI ends up with the same URI for equal input (either drop the default options or always include them).
As for ETag, I recommend using it, though I want to clarify how it works: you receive a request, do all your "expensive calculations", and then, if the request carries an If-None-Match header with the same hash (ETag) as the computed response, you can return 304 Not Modified. So an ETag is used to avoid transmitting a response that the client already has. (Of course, you can implement server-side caching, but that is best keyed on the input parameters.)
To further improve client-side caching, you might want to set the correct caching headers in the response.
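The conditional request flow described above could look roughly like this in Python. This is a simplified sketch: the function names are hypothetical, SHA-256 is just one possible hash, and the comparison assumes If-None-Match carries a single strong ETag (real requests may list several values or use "*").

```python
import hashlib


def compute_etag(body: bytes) -> str:
    """Hash the response body into a quoted strong ETag."""
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def handle_get(request_headers: dict, build_body):
    """Return (status, headers, body) for a GET, honouring If-None-Match."""
    body = build_body()  # the expensive calculation still runs on the server
    etag = compute_etag(body)
    if request_headers.get("If-None-Match") == etag:
        # The client already has this representation: send no body.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body
```

Note that this saves bandwidth, not server CPU: the body is still computed before the ETag can be compared.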

0

I asked myself almost the same question a month ago. I will describe my answer using my own implementation as an example.

On the server side, I have a WCF service that receives requests in one of the following forms:

 GET /Service/RequestedData?param1=data1&param2=data2…
 GET /Service/RequestedData/IdOfData?param1=data1&param2=data2…
 PUT /Service/RequestedData/IdOfData // with param1=data1&param2=data2… in body
 POST /Service/RequestedData/IdOfData // with param1=data1&param2=data2… in body
 DELETE /Service/RequestedData/IdOfData

So the requests are RESTful, but the GET requests have some optional parameters. This is the part that interests you.

Because WCF supports URI templates, the prototype of a function that responds to a client’s request looks like this:

 [WebGet (UriTemplate = "RequestedData?param1={myParam1}&param2={myParam2}", ResponseFormat = WebMessageFormat.Json)]
 [OperationContract]
 MyResult GetData (string myParam1, int myParam2);

All requests, for example

 GET /Service/RequestedData?param1=&param2=data2
 GET /Service/RequestedData?param2=data2&param1=
 GET /Service/RequestedData?param2=data2

will be mapped to the same call from my WCF service. So I have one less problem.
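The same effect can be sketched outside WCF. The following Python stand-in is not how WCF does it internally; the function name is hypothetical, and treating an empty value the same as a missing one is an assumption chosen to mirror the three requests above.

```python
from urllib.parse import parse_qs


def get_data_args(query: str):
    """Map a query string to (param1, param2); missing or empty values become None."""
    # parse_qs drops blank values by default and ignores parameter order.
    parsed = parse_qs(query)
    param1 = parsed.get("param1", [None])[0]
    param2 = parsed.get("param2", [None])[0]
    return param1, param2
```

All three query strings from the example then produce the same argument tuple, so the handler behind them is invoked identically.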

Now, at the start of every method that responds to an HTTP GET , I set "Cache-Control: max-age=0" in the HTTP header. This means that the client always revalidates its cached copy with the server, so ajax requests are not silently answered from the local cache, as Internet Explorer tends to do.

Then I always calculate an ETag based on my data. The exact algorithm is the subject of a separate discussion, but the important thing is that every response to an HTTP GET request carries an ETag in the HTTP header.

Thus clients revalidate their local cache every time and send a GET request to the server. They send the ETag from their local cache in the "If-None-Match" HTTP header. The server calculates the ETag of the data that would be sent back for this GET request. If that ETag is the same as the one in the client's request, the server sends back a response with an empty body and the code "304 Not Modified", in which case the browser serves the data from its local cache.

If for some reason the same client produces a new variant of the request URL that the web browser interprets as a new URL, the browser will not find the server's earlier response in its local cache and will send the request to the server once more. Is that a real problem? The server simply sends the data again. If you have server-side caching, you can optimize a little further. In most cases the GET request URL will be generated by client-side JavaScript, so this situation will not arise.

Calculating the ETag, setting the "Cache-Control: max-age=0" and ETag headers, and setting the "304 Not Modified" code all have to be done by the WCF service, but this is very simple.

Most importantly, my ETag calculation is not as expensive as fetching all the data from the database server and computing an MD5 hash of it. I use a persisted rowversion column in every row of data in the SQL Server database. A rowversion is nothing more than a counter of changes in the database: if you change the data in a row, the rowversion in that row is increased. Therefore, if you SELECT the maximum rowversion value and it has not changed compared with previous queries, you can be sure that the data has not changed in the meantime. Beyond that, the ETag calculation only has to be sensitive to data being deleted from the table, but this is also a solved problem. You can read a little more about this in the Concurrency Sql transaction processing .
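To make the idea concrete, here is a small Python sketch that uses SQLite as a stand-in for SQL Server. Everything in it is an assumption for illustration: SQLite has no rowversion type, so a plain integer "version" column is updated by hand, the "items" table is hypothetical, and combining the maximum version with the row count is only one possible way to notice deletions, not necessarily the method used in this answer.

```python
import sqlite3


def cheap_etag(conn: sqlite3.Connection) -> str:
    """Derive an ETag from the highest version and the row count, without reading the data."""
    max_version, row_count = conn.execute(
        "SELECT COALESCE(MAX(version), 0), COUNT(*) FROM items"
    ).fetchone()
    # The row count changes when a row is deleted, even if MAX(version) does not.
    return f'"{max_version}-{row_count}"'


# Hypothetical table; "version" plays the role of rowversion.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, version INTEGER)")
conn.executemany(
    "INSERT INTO items (name, version) VALUES (?, ?)", [("a", 1), ("b", 2)]
)
etag_initial = cheap_etag(conn)

# An update bumps the version of the changed row.
conn.execute("UPDATE items SET name = 'a2', version = 3 WHERE name = 'a'")
etag_after_update = cheap_etag(conn)

# A delete leaves MAX(version) unchanged but lowers the row count.
conn.execute("DELETE FROM items WHERE name = 'b'")
etag_after_delete = cheap_etag(conn)
```

The query touches only one aggregate per table, so it stays cheap however large the rows are.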

I don’t want to offer my ETag calculation as the best choice, I just want to say that calculating ETag can be much cheaper than calculating MD5 from all the data.

In case of errors, the server throws an exception that is mapped to the HTTP status code I specify in the throw statement. As the body, WCF sends a standard JSON object {"description":"My error text"} . A custom error object is also possible (see Is WebProtocolException Included in .net 4.0? ). On the client side I use jQuery, and in the error event handler of the corresponding jQuery.ajax call the error message is decoded and displayed to the user.

So my recommendation is to use ETag together with "Cache-Control: max-age=0" for all HTTP GET requests. For all other requests, I recommend that you implement a RESTful service. For error handling, look at the most common approach supported by the software you use to implement the server and the client, and use that.

UPDATED : To clarify the URL structure, I should add the following. In my service, the main part of the URL, of the form GET /Service/RequestedData/IdOfData , identifies the requested data objects. The parameters param1=data1&param2=data2 correspond mainly to information about sorting, paging and filtering of the data. I actively use the jqGrid plugin for jQuery, and when the end user scrolls to the next page in the grid, clicks a column heading (to sort the data), or sets a filter via the search function, each of these results in different optional parameters appended to the main URL.

0

Source: https://habr.com/ru/post/1308471/