Thread Pool, Shared Data, Java Sync

Say I have a data object:

class ValueRef { double value; }

Where each data object is stored in the main collection:

Collection<ValueRef> masterList = ...;

I also have a set of tasks, where each task holds a local collection of data objects (each of which also appears in masterList):

class Job implements Runnable {
    Collection<ValueRef> neededValues = ...;
    public void run() {
        double sum = 0;
        for (ValueRef x : neededValues) sum += x.value;
        System.out.println(sum);
    }
}

Use case:

  • Overwrite every value: for (ValueRef x: masterList) { x.value = Math.random(); }

  • Build a job queue from some of the jobs.

  • Wake up the thread pool.

  • Wait until every job has been evaluated.
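The steps above can be sketched as a complete program (a minimal sketch: the class name UseCaseSketch, the DoubleAdder accumulator, and the two-job split are assumptions added here; the jobs report their sums into a shared adder instead of printing, so the result is easy to check):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.DoubleAdder;

class ValueRef { double value; }

class Job implements Runnable {
    final Collection<ValueRef> neededValues;
    final DoubleAdder total;                 // where each job reports its sum
    Job(Collection<ValueRef> needed, DoubleAdder total) {
        this.neededValues = needed;
        this.total = total;
    }
    public void run() {
        double sum = 0;
        for (ValueRef x : neededValues) sum += x.value;
        total.add(sum);
    }
}

public class UseCaseSketch {
    static double runOnce() throws InterruptedException {
        List<ValueRef> masterList = new ArrayList<>();
        for (int i = 0; i < 10; i++) masterList.add(new ValueRef());

        // 1. Overwrite every value.
        for (ValueRef x : masterList) x.value = Math.random();

        // 2. Build jobs, each over a subset of masterList.
        DoubleAdder total = new DoubleAdder();
        List<Job> jobs = Arrays.asList(
            new Job(masterList.subList(0, 5), total),
            new Job(masterList.subList(5, 10), total));

        // 3. Submit to a pool; 4. wait until every job has run.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        for (Job j : jobs) pool.submit(j);
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return total.sum();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runOnce());
    }
}
```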

Note: during evaluation, all values are constant. However, the pool's threads may have evaluated jobs in the past and may still hold cached (stale) values.

Question: What is the minimum amount of synchronization needed to ensure that each thread sees the latest values?

I understand synchronization in terms of monitors and locks; I do not understand it in terms of caches and flushes (that is, what the memory model guarantees when a synchronized block is entered or exited).

It seems to me that I need to synchronize once in the thread that updates the values, to commit the new values to main memory, and once per worker thread, to invalidate its cached view so that the new values are read. But I'm not sure how best to do this.

My approach: create a global monitor: static Object guard = new Object(); Then synchronize on guard when updating the master list. Finally, before starting the thread pool, synchronize on guard in an empty block, once for each thread in the pool.

Does this really force a full flush of every value that thread subsequently reads? Or only of the values touched inside the synchronized block? In the latter case, instead of an empty block, should I perhaps read every value once in a loop?
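For concreteness, a minimal sketch of the scheme described above (the class and field names are assumptions, and a single explicitly created worker stands in for a pool thread; under the Java memory model, acquiring a lock after another thread released it makes all of that thread's prior writes visible, not only variables touched inside the block):

```java
public class GuardScheme {
    static final Object guard = new Object();
    static double value;            // stands in for the masterList contents
    static volatile double seen;    // what the worker observed, for the demo

    public static void main(String[] args) throws InterruptedException {
        synchronized (guard) {      // updater: write under the lock
            value = 7.0;
        }
        Thread worker = new Thread(() -> {
            synchronized (guard) { } // empty block: acquire and release only
            // Acquiring the same lock the updater released makes ALL of the
            // updater's earlier writes visible, not just variables that
            // happened to be touched inside a synchronized block.
            seen = value;
        });
        worker.start();
        worker.join();
        System.out.println(seen);   // prints 7.0
    }
}
```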

Thank you for your time.


Edit: I think my question boils down to this: once I exit a synchronized block, does every first read (after that point) go to main memory, regardless of what I synchronized on?

+6
3 answers

It does not matter that the thread pool's threads have evaluated some jobs in the past.

The Javadoc for Executor says:

Memory consistency effects: Actions in a thread prior to submitting a Runnable object to an Executor happen-before its execution begins, perhaps in another thread.

So, as long as you use a standard thread-pool implementation and modify the data before submitting the jobs, you do not need to worry about memory visibility effects.
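That guarantee can be seen in a minimal sketch (the class name and the 42.0 payload are assumptions; the field is deliberately plain, not volatile, so that visibility rests only on the happens-before edge that submit() establishes):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SubmitVisibility {
    // Plain, non-volatile field: visibility relies solely on the
    // happens-before edge established by submitting the task.
    static double value;

    static double readViaPool() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        value = 42.0;                                // write, then...
        Future<Double> f = pool.submit(() -> value); // ...submit: the pool
        double seen = f.get();                       // thread must see 42.0
        pool.shutdown();
        return seen;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readViaPool()); // prints 42.0
    }
}
```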

+3

What you are planning should be sufficient; it depends on how you intend to wake up the thread pool.

The Java memory model guarantees that all writes performed by a thread before it releases a lock are visible to any thread that subsequently synchronizes on the same lock.

So, if you are sure that the worker threads are blocked in a wait() call (which must be inside a synchronized block) while the master list is being updated, then when they wake up and become available, the changes made by the main thread will be visible to them.
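The wait()/notify() pattern this answer describes can be sketched like this (the names guard, ready, and seen are assumptions; seen exists only so the result can be observed after join()):

```java
public class WaitNotifyVisibility {
    static final Object guard = new Object();
    static double value;                 // plain field, guarded by `guard`
    static boolean ready = false;        // also guarded by `guard`
    static volatile double seen;         // observed result, for the demo

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            synchronized (guard) {
                while (!ready) {         // guard against spurious wakeups
                    try { guard.wait(); }
                    catch (InterruptedException e) { return; }
                }
                // Acquiring `guard` after the updater released it
                // guarantees the write to `value` is visible here.
                seen = value;
            }
        });
        worker.start();

        synchronized (guard) {           // updater: write, then notify
            value = 3.14;
            ready = true;
            guard.notifyAll();
        }
        worker.join();
        System.out.println(seen);        // prints 3.14
    }
}
```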

I would recommend, however, using the higher-level concurrency utilities in the java.util.concurrent package. They will be more reliable than a home-grown solution, and they are a good place to study concurrency before delving deeper.


Just to clarify: it is practically impossible to control worker threads without using a synchronized block in which a check is made for whether the worker has a task to execute. Thus, any changes made by the controlling thread happen-before the worker thread wakes up. A memory barrier requires a synchronized block, or at least a volatile variable, and I cannot imagine how you would build a thread pool without using one of them.

As an example of the power of the java.util.concurrent package, consider this: you could use a synchronized block with a wait() call in it, or a busy-wait loop on a volatile variable. Because of the overhead of context switching between threads, a busy wait can actually perform better under certain conditions: it is not necessarily the terrible idea it might seem at first glance.
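A minimal sketch of the busy-wait alternative mentioned above (the names are assumptions; Thread.onSpinWait() is a Java 9+ scheduling hint and could be an empty loop body instead):

```java
public class BusyWaitFlag {
    static volatile boolean go = false;  // the volatile flag is the barrier
    static double value;                 // plain field, published via `go`
    static volatile double seen;         // observed result, for the demo

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (!go) Thread.onSpinWait();  // busy-wait, no lock at all
            // The volatile read of `go` synchronizes with the volatile
            // write below, so the earlier plain write to `value` is
            // visible here as well.
            seen = value;
        });
        worker.start();
        value = 2.5;   // ordinary write...
        go = true;     // ...published by the volatile write
        worker.join();
        System.out.println(seen);        // prints 2.5
    }
}
```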

If you use the concurrency utilities (in this case, probably an ExecutorService), the best choice for your specific situation can be made for you, factoring in the environment, the nature of the tasks, and the needs of the other threads at a given time. Achieving that level of optimization yourself would be a lot of needless work.

+2

Why not make the Collection<ValueRef> and ValueRef immutable, or at least avoid changing the values in the collection after publishing a reference to it? Then you would not need to worry about synchronization at all.

That is, when you want to change the collection's values, create a new collection and put the new values in it. Once the values have been set, pass references to the new collection to the new Job objects.
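A minimal sketch of this snapshot-and-republish approach (the class name and the use of Collections.unmodifiableList are choices made here; in the real use case the frozen reference would be handed to new Job instances, and submitting those jobs safely publishes it):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SnapshotPublish {
    // A job over a frozen snapshot: no synchronization needed to read it,
    // because nobody can mutate it after publication.
    static double sumOf(List<Double> frozen) {
        double sum = 0;
        for (double v : frozen) sum += v;
        return sum;
    }

    public static void main(String[] args) {
        // Each round of new values goes into a brand-new collection...
        List<Double> snapshot = new ArrayList<>();
        for (int i = 0; i < 5; i++) snapshot.add(Math.random());

        // ...which is frozen before being handed to the new jobs.
        List<Double> frozen = Collections.unmodifiableList(snapshot);
        System.out.println(sumOf(frozen));
    }
}
```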

The only reasons not to do this are if the collection is so large that it barely fits in memory and you cannot afford two copies, or if swapping collections would create too much work for the garbage collector (prove that one of these is actually a problem before using a mutable data structure for threaded code).

+1

Source: https://habr.com/ru/post/919071/
