Buffering db inserts in a multi-threaded program

Question


I have a system that breaks large tasks into small tasks, using about 30 threads at a time. As each individual thread completes, it stores its calculated results in a database. Instead, I want each thread to pass its results to a new persistence class that performs double buffering and persists the data while running on its own thread.

For example, after 100 threads have passed their data to the buffer, the persistence class will swap the buffers and store all 100 records in the database. This would allow the use of prepared statements and thus reduce the I/O between the program and the database.
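A minimal sketch of the double buffering described above, assuming a hypothetical sink callback that writes one batch to the database (for example with a prepared statement). Worker threads append to the active buffer; once it holds capacity records, the persister thread swaps in an empty standby buffer and writes the full one outside the lock, so workers block only while a full buffer is still waiting to be swapped out. The class and method names are illustrative, not an existing library API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Double buffering: workers fill "active" while the persister thread
// writes the previously swapped-out buffer to the database.
class DoubleBufferedPersister<T> {
    private final int capacity;            // e.g. 100 records per flush (assumed)
    private final Consumer<List<T>> sink;  // hypothetical DB write; must copy the list if it keeps it
    private List<T> active = new ArrayList<>();
    private boolean ready = false;         // true when "active" holds a full batch
    private boolean closed = false;
    private final Thread flusher;

    DoubleBufferedPersister(int capacity, Consumer<List<T>> sink) {
        this.capacity = capacity;
        this.sink = sink;
        this.flusher = new Thread(this::flushLoop, "persister");
        this.flusher.start();
    }

    // Called concurrently by the worker threads.
    public synchronized void submit(T record) throws InterruptedException {
        while (ready) wait();              // full buffer not yet swapped out
        if (closed) throw new IllegalStateException("persister is closed");
        active.add(record);
        if (active.size() >= capacity) {
            ready = true;
            notifyAll();                   // wake the persister thread
        }
    }

    // Call after all workers are done: flushes the last partial batch and stops.
    public void close() throws InterruptedException {
        synchronized (this) {
            closed = true;
            notifyAll();
        }
        flusher.join();
    }

    private void flushLoop() {
        List<T> standby = new ArrayList<>();
        while (true) {
            boolean last;
            synchronized (this) {
                while (!ready && !closed) {
                    try { wait(); } catch (InterruptedException e) { return; }
                }
                // Swap buffers: workers continue with the empty standby list.
                List<T> full = active;
                active = standby;
                standby = full;
                ready = false;
                last = closed;
                notifyAll();               // release workers blocked in submit()
            }
            if (!standby.isEmpty()) sink.accept(standby); // slow I/O outside the lock
            standby.clear();
            if (last) return;
        }
    }
}
```

Keeping the database write outside the synchronized block is the point of the second buffer: workers keep filling one list while the other is being written.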

Is there a sample or a good example of this type of multi-threaded double buffering?


java multithreading design-patterns

Winter May 11, '10 at 15:26


2 answers

To reduce synchronization overhead, use a thread-local buffer (one per computation thread) to build batches of results. Once a buffer reaches a certain number of results, put the batch on a blocking queue. Back your persistence class with an ArrayBlockingQueue, since you probably don't want memory usage to grow without bound. You can also have multiple database writer threads, each taking batches of results from the queue and storing them in the database.

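```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class WriteBehindPersister {
    static final int MAX = 100; // batch size per thread
    // One buffer per computation thread, created on first use
    private final ThreadLocal<List<Result>> internalBuffer = ThreadLocal.withInitial(ArrayList::new);
    // Bounded queue: producers block instead of exhausting memory
    static final BlockingQueue<List<Result>> persistQueue = new ArrayBlockingQueue<>(10);
    static { new WriteThread().start(); }

    // Note: partially filled buffers still need flushing at shutdown
    public void persist(Result r) throws InterruptedException {
        List<Result> localResult = internalBuffer.get();
        localResult.add(r);
        if (localResult.size() >= MAX) {
            persistQueue.put(new ArrayList<>(localResult)); // hand off a copy
            localResult.clear();
        }
    }

    // Static nested class so the static initializer can create it
    static class WriteThread extends Thread {
        public void run() {
            try {
                while (true) {
                    List<Result> batch = persistQueue.take();
                    // beginTransaction/batchInsert/endTransaction stand for your JDBC code
                    beginTransaction();
                    for (Result r : batch) batchInsert(r);
                    endTransaction();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```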

In addition, you can use an ExecutorService (instead of a single writer thread) to save several batches to the database concurrently, at the cost of using more than one database connection. Be sure to use the JDBC batch API if your driver supports it.
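A sketch of the ExecutorService variant, assuming a hypothetical BatchWriter interface so that each pool task owns its own writer (and therefore its own database connection). JdbcResultWriter shows the standard JDBC batch calls (addBatch, executeBatch, commit); the results table, its columns and the ResultRow class are made up for illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

// Hypothetical abstraction: writes one batch (e.g. through one JDBC connection).
interface BatchWriter<T> {
    void write(List<T> batch) throws Exception;
}

// Several writer tasks drain one bounded queue of batches concurrently.
class WriterPool<T> {
    private final BlockingQueue<List<T>> queue = new ArrayBlockingQueue<>(10);
    private final List<T> poison = new ArrayList<>(); // stop sentinel, compared by identity
    private final List<Future<?>> futures = new ArrayList<>();
    private final ExecutorService executor;

    WriterPool(int writers, Supplier<BatchWriter<T>> writerFactory) {
        executor = Executors.newFixedThreadPool(writers);
        for (int i = 0; i < writers; i++) {
            BatchWriter<T> writer = writerFactory.get(); // one writer (connection) per task
            futures.add(executor.submit(() -> {
                for (List<T> batch = queue.take(); batch != poison; batch = queue.take()) {
                    writer.write(batch);
                }
                return null; // Callable, so write() may throw checked exceptions
            }));
        }
    }

    public void submit(List<T> batch) throws InterruptedException {
        queue.put(batch); // blocks while 10 batches are already waiting
    }

    // Lets every writer finish the queued batches, then rethrows any writer failure.
    public void shutdown() throws InterruptedException, ExecutionException {
        for (int i = 0; i < futures.size(); i++) queue.put(poison);
        executor.shutdown();
        for (Future<?> f : futures) f.get();
    }
}

// Hypothetical result type, for illustration only.
class ResultRow {
    final long taskId;
    final double resultValue;
    ResultRow(long taskId, double resultValue) { this.taskId = taskId; this.resultValue = resultValue; }
}

class JdbcResultWriter implements BatchWriter<ResultRow> {
    private final Connection conn; // each writer task should get its own connection

    JdbcResultWriter(Connection conn) { this.conn = conn; }

    @Override
    public void write(List<ResultRow> batch) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO results (task_id, result_value) VALUES (?, ?)")) {
            for (ResultRow r : batch) {
                ps.setLong(1, r.taskId);
                ps.setDouble(2, r.resultValue);
                ps.addBatch();   // queue the row on the statement
            }
            ps.executeBatch();   // lets the driver send the rows together
            conn.commit();       // one transaction per batch
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}
```

Because each pool task runs as a Callable, a failing writer surfaces as an ExecutionException from shutdown() rather than disappearing silently.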


Justin May 11, '10 at 16:04


Steved · Accepted Answer · 2010-05-11T15:42:20+0000

I have seen this pattern called asynchronous database write, or the write-behind pattern. It is a typical pattern supported by distributed cache products (Terracotta, Coherence, GigaSpaces, ...) because you don't want your cache updates to also have to wait for the changes to be written to the underlying database.

The complexity of this pattern depends on your tolerance for lost database updates. Because of the delay between completing the work and writing the result to the database, you can lose updates to bugs, power failures, and so on (you get the picture).

I would suggest a queue of completed results waiting to be written to the database, processed in batches of 100 (using your example) OR after a certain time has elapsed. The time-based flush also takes care of result sets whose size is not a multiple of 100.
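A minimal sketch of the "100 records OR after a while" rule, assuming a shared BlockingQueue of completed results, a hypothetical flush callback, and a caller-supplied sentinel object (compared by identity) that tells the drainer to stop. The timeout is measured from the first item of the current batch, so a partial batch waits at most maxWaitMillis before it is written.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Groups completed results into batches, flushing when a batch reaches maxBatch
// items OR when maxWaitMillis have passed since the batch's first item arrived.
class BatchingDrainer<T> implements Runnable {
    private final BlockingQueue<T> queue;
    private final int maxBatch;
    private final long maxWaitNanos;
    private final Consumer<List<T>> flush; // hypothetical: writes one batch to the DB
    private final T poison;                // caller-supplied stop sentinel (identity)

    BatchingDrainer(BlockingQueue<T> queue, int maxBatch, long maxWaitMillis,
                    Consumer<List<T>> flush, T poison) {
        this.queue = queue;
        this.maxBatch = maxBatch;
        this.maxWaitNanos = TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
        this.flush = flush;
        this.poison = poison;
    }

    @Override
    public void run() {
        List<T> batch = new ArrayList<>();
        long deadline = 0;
        try {
            while (true) {
                T item;
                if (batch.isEmpty()) {
                    item = queue.take(); // nothing pending: wait as long as needed
                    deadline = System.nanoTime() + maxWaitNanos;
                } else {
                    long remaining = deadline - System.nanoTime();
                    // null means the deadline passed with a partial batch
                    item = remaining > 0 ? queue.poll(remaining, TimeUnit.NANOSECONDS) : null;
                }
                if (item == poison) { // final flush, then stop
                    if (!batch.isEmpty()) flush.accept(batch);
                    return;
                }
                if (item != null) batch.add(item);
                if (item == null || batch.size() >= maxBatch) {
                    flush.accept(batch);
                    batch = new ArrayList<>(); // flush may keep the old list
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // pending items are dropped in this sketch
        }
    }
}
```

With 250 results and maxBatch = 100, two full batches are written as soon as they fill, and the remaining 50 are written once the timeout expires instead of waiting indefinitely for a third full batch.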

If you have no persistence/durability requirements, you can do all of this in the same process. If, however, you cannot tolerate any loss, you can replace the in-VM queue with a persistent JMS queue (slower but safer).
