How to implement caching for a web application

Question

What are the different ways to cache data in web applications built with Java and NoSQL databases? Databases provide caching too, but are they the only option, and are they always the best one?

How else can I cache user data in the application? The application holds very user-specific data, as a social network does. Are there any simple rules of thumb about what kinds of things should be cached?

Is it possible to cache data on the application server using Java?

+4

java database caching web-applications

Rajat gupta Mar 05 '11 at 19:59

2 answers

If you want a rule of thumb, here is what Michael Jackson (not that Michael Jackson) said:

The first rule of program optimization: Don't do it.
The second rule of program optimization (for experts only!): Don't do it yet.

The long-standing tradition is that you do not optimize until you have profiled, that is, until you have convincing evidence that you really need to optimize. Caching is a kind of optimization; it may be very important for your application, but until you can put the application under load and see which objects take a long time to load (from the database or anywhere else), you won't know what needs caching. It doesn't really matter how smart you are or what advice you get here - until you have profiled, you won't know what to cache.
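
A real profiler or your monitoring stack is the proper tool for this, but as a rough sketch of the idea, you could wrap suspect load paths in a small timer and compare totals under load. The class name LoadTimer and the labels passed to it are hypothetical, not part of any library:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

// Hypothetical helper: accumulates call counts and elapsed time per label,
// so you can see which kinds of loads are worth caching.
class LoadTimer {
    private final Map<String, LongAdder> totalNanos = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> calls = new ConcurrentHashMap<>();

    // Runs the work, records how long it took under the given label,
    // and returns the work's result unchanged.
    <T> T time(String label, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            totalNanos.computeIfAbsent(label, k -> new LongAdder())
                      .add(System.nanoTime() - start);
            calls.computeIfAbsent(label, k -> new LongAdder()).increment();
        }
    }

    long callCount(String label) {
        LongAdder c = calls.get(label);
        return c == null ? 0 : c.sum();
    }

    long totalNanos(String label) {
        LongAdder t = totalNanos.get(label);
        return t == null ? 0 : t.sum();
    }
}
```

You would call something like timer.time("loadProfile", () -> dao.load(id)), where dao is whatever data access object you already have, and then look at which labels dominate the totals.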

As for what you can cache, it could be anything, but I suppose you can classify it into three groups:

Things that came from the database. They are easy to cache, because at the point where you go to the database, you have the identifying information needed for a cache key (primary key, query parameters, etc.). By caching them, you save the time needed to retrieve them from the database - this includes IO, so it can be quite large.
Things that were computed in the domain model (news feeds in a social application, perhaps). They may be harder to cache, since more contextual information goes into creating them; you may need to reorganize your code to create a single point where all the required information is available, so you can apply caching there. Or you may find that such a point already exists. Caching this data saves all the database access needed to fetch the information that goes into creating it, as well as all the computation; the time spent on computation may or may not be a significant addition to the time spent on IO. Invalidating cached items of this kind is likely to be much more complex than for pure database objects.
Things that are sent to the browser: pages or page fragments. They can be cached quite easily, because in a properly designed application they are uniquely identified either by a URL or by a combination of a URL and a user. Caching them saves all the computation in your application; it can even keep requests from reaching your application at all, because it can be done by a reverse proxy sitting in front of your application server. There are two problems. Firstly, it uses a huge amount of memory: a page built from a few kilobytes of objects can be tens or hundreds of kilobytes in size (my Facebook homepage is 50 KB). This means it has to save a huge amount of computation to be more worthwhile than caching at the database or domain model level, and there is simply not much computation between the domain model and the HTML in a reasonably designed application. Secondly, invalidation is even more complicated than in the domain model, and is likely to happen prohibitively often - anything that changes the page or fragment should invalidate the cached copy.
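
For the first group, a minimal sketch of what caching by primary key might look like in plain Java. The class name ReadThroughCache is made up for illustration, and the loader function stands in for whatever database call you actually have:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical read-through cache keyed by primary key (or any other
// identifying information you have when you go to the database).
class ReadThroughCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // placeholder for the real database lookup

    ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Returns the cached value; the loader is only called on a miss.
    // If the loader returns null, nothing is cached for that key.
    V get(K key) {
        return entries.computeIfAbsent(key, loader);
    }

    // Drops an entry, for example after the underlying row was updated.
    void invalidate(K key) {
        entries.remove(key);
    }
}
```

Note that this sketch never evicts anything by itself, so it is only suitable for a small, bounded set of keys. The same shape works for the other two groups if the key carries enough context, for example a URL plus a user id for a page fragment.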

Finally, the actual mechanism: start with something simple and in-process, such as a map with a limited size and a least-recently-used eviction policy. It is simple but effective. Something that can keep data outside the application's own heap or process, such as EHCache, is more complex, but has two advantages: you can share caches between several processes (useful if you have a cluster, which you probably will at some point), and you can store data where the garbage collector will not see it, which can save some processor time (perhaps; this is too large a subject to get into here).
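
As a sketch of the simple option, the JDK's LinkedHashMap can already do this when constructed in access order with removeEldestEntry overridden. The class name LruCache and the capacity are illustrative choices:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A size-limited map with least-recently-used eviction, built on
// java.util.LinkedHashMap. Not thread-safe by itself.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        // The third argument (accessOrder = true) makes iteration order
        // follow access recency instead of insertion order.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; returning true
    // removes the least recently accessed entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

In a web application several request threads will hit the cache at once, so you would wrap it, for example with Collections.synchronizedMap(new LruCache<>(1000)), or guard access some other way.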

But I will repeat my first point: do not cache until you know what needs caching, and once you do, keep in mind the limits on what caching can gain you, and keep the caching strategy as simple as possible (but, of course, no simpler).

+21

Tom anderson Mar 05 '11 at 22:00

orangepips · Accepted Answer · 2011-03-05T21:02:03+0000

I assume that you are creating a relatively typical web application that:

has a single server used for persistence
has multiple web servers
pins authenticated users to a single web server using sticky sessions via a load balancer

Now, to answer your questions. Most persistence stores, whether a relational database or NoSQL, probably have some kind of caching built in, so that if you repeat the same simple query (for example, a fetch by primary key), the result can be served from cache. However, the more complex the query, the less likely it is that the persistence layer can cache it. In addition, if there is only one persistence server (i.e. no sharding or master/read-slave setup), it quickly becomes a bottleneck. So the application-level caching you want to do should usually happen on the web servers, to reduce the load on the database.

As for a heuristic on what to cache: items that are frequently accessed and/or expensive to generate (in terms of database or web server processing and memory). Typical candidates are the home page and any other landing pages of the site; often the best approach for these is to generate a static file and serve that. The rest depends on your application, but as a rule, the most effective strategy is to cache as close as possible to the final result, which is often the HTML that gets served. For your social network, this might be a list of recent updates, or something similar.

As for user sessions, they are certainly a good candidate for caching. Here you are likely to get a lot of mileage out of judicious use of the web server's session scope (assuming a JSP container). That data is held in memory and is a good place to keep user information that is shown on every page once the user has authenticated (for example, first and last name).

Now, the last thing to consider is cache invalidation, which is really the hard part of all this (naming things being the other hard thing in computer science). Here, using something like memcached or ehcache, which others have mentioned, is the right approach. ehcache can easily run in-process with your Java application and works well for expiring entries, with policies such as least recently used and least frequently used, and it lets you use both memory and disk for caching. What you need to think about is situations where you have to expire something from the cache ahead of that schedule because the data has changed. In that case, you need to work those dependencies into the architecture of your application so that it reads from and writes to the cache as necessary.
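
To make the two kinds of expiry concrete, here is a minimal plain-Java sketch, not ehcache's API: entries expire on a schedule (a time-to-live), and the write path can also invalidate a key early when the underlying data changes. The class name ExpiringCache and the injected clock are assumptions made for illustration and testability:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical cache with time-based expiry plus explicit invalidation.
class ExpiringCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;

        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis

    ExpiringCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    void put(K key, V value) {
        entries.put(key, new Entry<>(value, clock.getAsLong() + ttlMillis));
    }

    // Returns null on a miss or when the entry has passed its expiry time.
    V get(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (clock.getAsLong() >= entry.expiresAtMillis) {
            entries.remove(key, entry); // only remove the stale entry we saw
            return null;
        }
        return entry.value;
    }

    // Call this from the write path when the underlying data changes,
    // so readers do not see stale data until the scheduled expiry.
    void invalidate(K key) {
        entries.remove(key);
    }
}
```

The design point is that the code which updates the data has to know which cache keys depend on it. For a cached news feed, for instance, posting an update would need to invalidate the feeds of everyone who should see it.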
