Caching in nginx where the cached entry immediately expires / must be revalidated, with `Cache-Control: public, s-maxage=0`

I would like to use an HTTP proxy (e.g. nginx) to cache large / expensive responses. These resources are identical for every authorized user, but their authentication / authorization must be verified by the backend on every request.

It seems that `Cache-Control: public, max-age=0` together with the nginx directive `proxy_cache_revalidate on;` should be a way to do this. The proxy can cache the response, but every subsequent request must perform a conditional GET against the backend to verify authorization before serving the cached resource. The backend then returns 403 if the user is unauthorized, 304 if the user is authorized and the cached resource is still current, or 200 with a fresh resource if it has expired.

In nginx, with `max-age=0` the response is not cached at all. If `max-age=1` is set, then if I wait 1 second after the initial request, nginx performs a conditional GET; but within that 1 second it serves directly from the cache, which is obviously very bad for a resource that requires authentication.

Is there a way to force nginx to cache the request, but immediately require a re-check?
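For context, a minimal sketch of the proxy setup being tested; the cache zone name, paths, upstream address, and port are placeholders, not taken from my actual config:

```nginx
# Sketch only: zone name, paths and upstream address are illustrative.
proxy_cache_path /var/cache/nginx keys_zone=app_cache:10m;

server {
    listen 4000;

    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_cache app_cache;
        # Revalidate expired entries with conditional GETs
        # (If-Modified-Since / If-None-Match):
        proxy_cache_revalidate on;
        # Expose the cache status for testing (the X-Cached header
        # seen in the transcripts below):
        add_header X-Cached $upstream_cache_status;
    }
}
```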

Note that this works correctly in Apache. Here are transcripts for both nginx and Apache, the first two with `max-age=5`, the last two with `max-age=0`:

```shell
# Apache with `Cache-Control: public, max-age=5`
$ while true; do curl -v http://localhost:4001/ 2>&1 >/dev/null | grep X-Cache; sleep 1; done
< X-Cache: MISS from 172.xxx
< X-Cache: HIT from 172.xxx
< X-Cache: HIT from 172.xxx
< X-Cache: HIT from 172.xxx
< X-Cache: HIT from 172.xxx
< X-Cache: REVALIDATE from 172.xxx
< X-Cache: HIT from 172.xxx

# nginx with `Cache-Control: public, max-age=5`
$ while true; do curl -v http://localhost:4000/ 2>&1 >/dev/null | grep X-Cache; sleep 1; done
< X-Cached: MISS
< X-Cached: HIT
< X-Cached: HIT
< X-Cached: HIT
< X-Cached: HIT
< X-Cached: HIT
< X-Cached: REVALIDATED
< X-Cached: HIT
< X-Cached: HIT

# Apache with `Cache-Control: public, max-age=0`
# THIS IS WHAT I WANT
$ while true; do curl -v http://localhost:4001/ 2>&1 >/dev/null | grep X-Cache; sleep 1; done
< X-Cache: MISS from 172.xxx
< X-Cache: REVALIDATE from 172.xxx
< X-Cache: REVALIDATE from 172.xxx
< X-Cache: REVALIDATE from 172.xxx
< X-Cache: REVALIDATE from 172.xxx
< X-Cache: REVALIDATE from 172.xxx

# nginx with `Cache-Control: public, max-age=0`
$ while true; do curl -v http://localhost:4000/ 2>&1 >/dev/null | grep X-Cache; sleep 1; done
< X-Cached: MISS
< X-Cached: MISS
< X-Cached: MISS
< X-Cached: MISS
< X-Cached: MISS
< X-Cached: MISS
```

As the first two examples show, both Apache and nginx can cache the responses, and Apache correctly caches them even with `max-age=0`, but nginx does not.

+5
3 answers

I would like to address the additional questions / problems that have come up in the discussion since my initial answer, which was simply to use X-Accel-Redirect (or X-Sendfile, if Apache compatibility is required).

The solution you consider "optimal" (without X-Accel-Redirect ) is flawed for several reasons:

  • All it takes is one request from an unauthorized user to invalidate your cache entry.

    • If every other request comes from an unauthorized user, you effectively have no cache at all.

    • Anyone can request the public URL of the resource, so your cache can be flushed at any time.

  • If the files are actually static, you are wasting extra memory, time, disk, and VM/page-cache space storing more than one copy of each file.

  • If the served content is dynamic:

    • Does authentication cost roughly the same as generating the resource? Then what do you actually gain by caching it when revalidation is always required? A constant factor of less than 2x? It may not be worth the extra complexity of caching at all, since the real-world improvement would be negligible.

    • Is generating the view vastly more expensive than authenticating? Then caching the view and serving it to tens of thousands of requests at peak time sounds like a great idea! But for that to work out, you had better not receive any requests from unauthenticated users (since even a couple of them can trigger the significant and unpredictable cost of regenerating the view).

  • What happens to the cache in various edge cases? What if the backend denies access but the developer returns the wrong status code and the response gets cached anyway? What if the next administrator tweaks a setting or two, for example proxy_cache_use_stale ? Suddenly unauthenticated users are receiving confidential information. You open up all kinds of cache-poisoning attack vectors by needlessly coupling independent parts of your application.

  • I don't think it is technically correct to return Cache-Control: public, max-age=0 for a page that requires authentication. I believe the correct directive would be must-revalidate or private rather than public .
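To illustrate that last point, a response header along these lines would be a more accurate description of the resource's cacheability (exact directives depend on whether a shared cache may ever hold the response):

```
Cache-Control: private, max-age=0, must-revalidate
```

`private` restricts storage to the user's own cache, and `must-revalidate` forbids serving the entry stale without checking with the origin first.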

nginx's lack of support for immediate revalidation with max-age=0 is by design (much like its lack of .htaccess support). Per the points above, it makes no sense to require immediate revalidation of such a resource; that approach simply does not scale, especially when you have a "ridiculous" number of requests per second to serve with minimal resources and unambiguous behavior. If you need a web server designed by committee, with backward compatibility for every kitchen-sink application and every dubious corner of every RFC, then nginx is simply not the right solution.

X-Accel-Redirect , on the other hand, really is a simple, reliable, de facto standard. It lets you cleanly separate content from access control. It is dead simple. It actually guarantees that your content will be cached, instead of leaving your cache to be wiped at will. It is the right solution worth pursuing. Trying to avoid one "extra" request per ten thousand cached servings, at the price of effectively having no cache at all when those ten thousand requests arrive, is not the way to design a scalable architecture.

+2

I think the best option is to modify your backend to use X-Accel-Redirect support.

The feature is enabled by default and is described in the documentation for proxy_ignore_headers :

"X-Accel-Redirect" internally redirects to the specified URI;

You then cache the internal resource and return it automatically to any authenticated user.

Since the redirect target must be an internal location, there is no other way to reach it (i.e. it cannot be requested directly from outside), so, per your requirements, unauthorized users will not be able to access it, yet it can still be cached just like any other location .
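A rough sketch of what such a setup might look like; the location names, cache zone, and upstream address here are illustrative, not taken from the question:

```nginx
# Sketch only: /files/, /protected/ and the upstream address are made up.
location /files/ {
    # Every request hits the backend, which checks auth and, on success,
    # replies with a header like: X-Accel-Redirect: /protected/<file>
    proxy_pass http://127.0.0.1:3000;
}

location /protected/ {
    internal;                  # unreachable directly from the outside
    proxy_pass http://127.0.0.1:3000;
    proxy_cache app_cache;     # the expensive resource itself is cached here
    proxy_cache_valid 200 8h;
}
```

The access check runs on every request via the cheap backend round trip, while the expensive content behind the internal location stays cached.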

0

If you cannot change the backend application as suggested, or if the authentication is simple, such as HTTP basic auth, an alternative approach is to authenticate in Nginx itself.

Implementing that authentication step and choosing the cache validity period is all you need to do; Nginx takes care of the rest, following the flow below.

Nginx request flow as pseudocode:

```
if (user = unauthorised) then
    Nginx declines request;
else
    if (cache = stale) then
        Nginx gets resource from backend;
        Nginx caches resource;
        Nginx serves resource;
    else
        Nginx gets resource from cache;
        Nginx serves resource;
    end if
end if
```

The con is that, depending on the type of auth you have, you may need something like the Nginx Lua module to handle the logic.
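For simple schemes, the stock ngx_http_auth_request_module can implement the flow above without Lua. A minimal sketch, where the URIs, cache zone, and the backend's `/auth-check` endpoint are assumptions for illustration:

```nginx
location /cards/ {
    # Deny the request unless the /auth subrequest returns 2xx
    # (401/403 from the subrequest are passed through to the client).
    auth_request /auth;
    proxy_pass http://127.0.0.1:3000;
    proxy_cache app_cache;
    proxy_cache_valid 200 8h;
}

location = /auth {
    internal;
    proxy_pass http://127.0.0.1:3000/auth-check;
    # auth_request only needs the status code, not a body:
    proxy_pass_request_body off;
    proxy_set_header Content-Length "";
}
```

This keeps the per-request authorization check while the content itself is served from the cache.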

EDIT

Following the further discussion and information: without fully knowing how the backend application works, but looking at the example config that user anki-code provided on GitHub and that you commented on HERE , the configuration below should avoid the problem of the backend's authentication / authorization not being executed for previously cached resources.

I assume the backend application returns HTTP 403 for unauthenticated users. I also assume you have the Nginx Lua module available, since the GitHub configuration depends on it, although I note that the part you tested does not need the module.

Config:

```nginx
server {
    listen 80;
    listen [::]:80;
    server_name 127.0.0.1;

    location / {
        proxy_pass http://127.0.0.1:3000; # Metabase here
    }

    location ~ /api/card((?!/42/|/41/)/[0-9]*/)query {
        access_by_lua_block {
            -- HEAD request to a location excluded from caching, to authenticate
            local res = ngx.location.capture(
                "/api/card/42/query",
                { method = ngx.HTTP_HEAD }
            )
            if res.status == 403 then
                return ngx.exit(ngx.HTTP_FORBIDDEN)
            else
                ngx.exec("@metabase")
            end
        }
    }

    location @metabase {
        # cache all cards data except card 42 and card 41 (they have realtime data)
        if ($http_referer !~ /dash/) { # cache only cards on dashboards
            set $no_cache 1;
        }
        proxy_no_cache $no_cache;
        proxy_cache_bypass $no_cache;
        proxy_pass http://127.0.0.1:3000;
        proxy_cache_methods POST;
        proxy_cache_valid 8h;
        proxy_ignore_headers Cache-Control Expires;
        proxy_cache cache_all;
        proxy_cache_key "$request_uri|$request_body";
        proxy_buffers 8 32k;
        proxy_buffer_size 64k;
        add_header X-MBCache $upstream_cache_status;
    }

    location ~ /api/card/\d+ {
        proxy_pass http://127.0.0.1:3000;
        if ($request_method ~ PUT) {
            # when the card was edited, reset the cache for this card
            access_by_lua 'os.execute("find /var/cache/nginx -type f -exec grep -q \\"".. ngx.var.request_uri .."/\\" {} \\\; -delete ")';
            add_header X-MBCache REMOVED;
        }
    }
}
```

With this in place, I expect a test with `curl 'http://localhost:3001/api/card/1/query'` to behave as follows:

First run (with required cookie)

  • Request hits location ~ /api/card((?!/42/|/41/)/[0-9]*/)query
  • In the Nginx access phase, the "HEAD" routine is issued on /api/card/42/query . This place is excluded from caching in the given configuration.
  • The backend application returns a non-403 response, since the user is authenticated.
  • The request is then handed to the named location block @metabase , which processes the actual request and returns the content to the user.

Second run (without the required cookie)

  • Request hits location ~ /api/card((?!/42/|/41/)/[0-9]*/)query
  • In the Nginx access phase, a HEAD subrequest is issued to the backend at /api/card/42/query .
  • The backend application returns a 403 Forbidden response because the user is not authenticated.
  • The client receives a 403 Forbidden response.

If /api/card/42/query is itself resource-intensive, you can instead create a simple card whose request is used only to perform the authorization check.

This seems like an easy way around the problem: the backend remains unchanged, and you configure your caching entirely in Nginx.

0

Source: https://habr.com/ru/post/1261604/
