What is the advantage of message queuing over shared data in multithreading?

I read an article on multi-threaded software design, http://drdobbs.com/architecture-and-design/215900465 , which says it is best practice to "replace shared data with asynchronous messages. As much as possible, prefer to keep each thread's data isolated (unshared), and let threads instead communicate via asynchronous messages that pass copies of data."

What bothers me is that I don't see the difference between using shared data and using a message queue. I am currently working on a non-GUI project on Windows, so let me use Windows message queues, and take the classic producer-consumer problem as an example.

Using shared data, there would be a shared container and a lock protecting it between the producer thread and the consumer thread. When the producer wants to deposit a product, it first waits for the lock, then writes something into the container, then releases the lock.

Using a message queue, the producer can simply call PostThreadMessage without blocking, and that is the advantage of asynchronous messages. But I assume there is some kind of lock protecting the message queue between the two threads, otherwise the data would surely be corrupted; calling PostThreadMessage just hides the details. I don't know whether my guess is right, but if it is, the advantage seems to vanish, since both methods do the same thing and the only difference is that the system hides the details when a message queue is used.

PS: perhaps the message queue uses a lock-free container, but then I could use a concurrent container just the same. I would like to know how the message queue is implemented, and whether there is a performance difference between the two methods.

Update: I still don't understand the concept of asynchronous messages if message-queue operations still block somewhere. Correct me if my assumption is wrong: when we use a shared container and locks, we block in our own thread; but when using a message queue, the calling thread returns immediately and leaves the locking to some system thread.
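For concreteness, the shared-container version described in the question can be sketched like this (a minimal C++ illustration with hypothetical names, not the WinAPI):

```cpp
#include <mutex>
#include <queue>

// Sketch of the "shared data" approach: one container, one lock.
// Producer and consumer both block on the same mutex.
template <typename T>
class SharedContainer {
public:
    void put(const T& item) {
        std::lock_guard<std::mutex> lock(mtx_);  // producer waits for the lock
        items_.push(item);                       // write into the container
    }                                            // lock released here
    bool try_take(T& out) {
        std::lock_guard<std::mutex> lock(mtx_);  // consumer waits for the same lock
        if (items_.empty()) return false;
        out = items_.front();
        items_.pop();
        return true;
    }
private:
    std::mutex mtx_;
    std::queue<T> items_;
};
```

Note that `put` and `try_take` contend for the same mutex, which is exactly the contention the answers below discuss.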

+6
7 answers

Message passing is useful for exchanging smaller amounts of data, because no conflicts need to be avoided. It is also much easier to implement than shared memory for inter-computer communication. And, as you have already noticed, message passing has the advantage that application developers do not need to worry about details such as protecting shared memory.

Shared memory allows maximum speed and convenience of communication, since it can be done at memory speeds within a computer. Shared memory is usually faster than message passing, because message passing is typically implemented using system calls and thus requires the more time-consuming task of kernel intervention. In contrast, in shared-memory systems, system calls are required only to establish the shared-memory regions. Once established, all accesses are treated as ordinary memory accesses, with no extra assistance from the kernel.

Edit: One case where you might want to implement your own queue is when there are many messages being produced and consumed, for example in a logging system. With PostThreadMessage, the queue capacity is fixed; messages will most likely be lost if that capacity is exceeded.
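A toy model of such a fixed-capacity queue (hypothetical, not how Windows actually implements it) shows why messages get dropped once capacity is exceeded:

```cpp
#include <cstddef>
#include <mutex>
#include <queue>

// Toy model of a fixed-capacity message queue: like PostThreadMessage,
// posting never blocks, but it fails (and the message is lost) once
// the queue is full.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {}
    bool post(const T& msg) {                      // non-blocking post
        std::lock_guard<std::mutex> lock(mtx_);
        if (q_.size() >= capacity_) return false;  // full: message dropped
        q_.push(msg);
        return true;
    }
    bool get(T& out) {
        std::lock_guard<std::mutex> lock(mtx_);
        if (q_.empty()) return false;
        out = q_.front();
        q_.pop();
        return true;
    }
private:
    std::size_t capacity_;
    std::mutex mtx_;
    std::queue<T> q_;
};
```

A custom queue for a logging system could instead grow, block the producer, or overwrite the oldest entry when full, whichever loss policy suits the application.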

+5

Imagine you have one stream of data and 4 threads processing it (presumably to make use of a multi-core machine). If you have one big global pool of data, you probably have to lock it whenever any of the threads needs access, potentially blocking the other 3. As you add more processing threads, you increase the likelihood that a lock will have to wait, and increase the number of things that may be waiting. Eventually, adding more threads buys you nothing, because all you do is spend more time blocking.

If instead you have one thread sending messages into message queues, one per consumer thread, then the consumers cannot block each other. You still have to lock each queue between the producer and its consumer thread, but since there is a separate queue (and hence a separate lock) for each thread, no thread can block all the others while it waits for data.

If you suddenly get a 32-core machine, you can add another 20 processing threads (and queues) and expect performance to scale fairly linearly, unlike the first case, where the new threads would just get in each other's way all the time.
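The one-queue-per-consumer layout this answer describes can be sketched as follows (a minimal illustration; the `Distributor` name and round-robin dispatch are assumptions, not from the original):

```cpp
#include <cstddef>
#include <mutex>
#include <queue>
#include <vector>

// One queue (and one lock) per consumer thread: each lock is shared
// only between the single producer and one consumer, so consumers
// never contend with each other.
template <typename T>
struct PerThreadQueue {
    std::mutex mtx;
    std::queue<T> q;
};

template <typename T>
class Distributor {
public:
    explicit Distributor(std::size_t consumers) : queues_(consumers) {}
    void send(const T& msg) {                      // producer side (single producer)
        PerThreadQueue<T>& pq = queues_[next_++ % queues_.size()];
        std::lock_guard<std::mutex> lock(pq.mtx);  // only this consumer's lock
        pq.q.push(msg);
    }
    bool receive(std::size_t consumer, T& out) {   // consumer side
        PerThreadQueue<T>& pq = queues_[consumer];
        std::lock_guard<std::mutex> lock(pq.mtx);
        if (pq.q.empty()) return false;
        out = pq.q.front();
        pq.q.pop();
        return true;
    }
private:
    std::vector<PerThreadQueue<T>> queues_;
    std::size_t next_ = 0;  // round-robin counter, touched only by the producer
};
```

Adding another consumer just means adding another queue; no existing lock gets more contended.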

+6

Of course there is "shared data" even when you pass messages; after all, the message itself is a kind of data. The important difference, however, is that when you pass a message, the consumer receives a copy.

calling PostThreadMessage just hides the details

Yes, it does, but being a WINAPI call, you can be reasonably sure that it does it correctly.

I still don't understand the concept of asynchronous messages if message-queue operations still block somewhere.

The advantage is safety. There is a locking mechanism that is systematically applied whenever a message is passed; you don't even have to think about it, and you can't forget to lock. Given that multithreading bugs are among the nastiest of all (consider race conditions), this is very important. Message passing is a higher level of abstraction built on top of locks.

The disadvantage is that passing large amounts of data would probably be slow. In that case, you need to use shared memory.

For passing state (e.g., a worker thread reporting its progress to a GUI), messages are the way to go.

+2

I have used a shared-memory model where pointers into the shared memory are managed in a message queue, with careful locking. In a sense, it is a hybrid between a message queue and shared memory. This is very useful when large amounts of data must be passed between threads while retaining the safety of the message queue.

The entire queue can be packaged in a single C++ class with the appropriate locking and so on. The key is that the queue owns the shared storage and takes care of the locking. Producers acquire a lock to write into the queue and obtain a pointer to the next available chunk of storage (usually an object of some kind), fill it, and release it. The consumer blocks until the next shared object is released by a producer; it can then acquire the lock for that storage, process the data, and release it back to the pool. A suitably configured queue can perform multiple-producer/multiple-consumer operations with great efficiency. Think of the semantics of Java's thread-safe java.util.concurrent.BlockingQueue, but for pointers to storage.
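A minimal single-lock sketch of this hybrid might look like the following (names and the single shared condition variable are my assumptions; a production version would likely use finer-grained locking as the answer describes):

```cpp
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <queue>
#include <vector>

// The queue owns a pool of storage blocks and hands out pointers,
// so large payloads are never copied; only pointers move between
// producer and consumer.
struct Block { std::vector<char> data; };

class PointerQueue {
public:
    PointerQueue(std::size_t blocks, std::size_t block_size) {
        for (std::size_t i = 0; i < blocks; ++i) {
            storage_.push_back(std::make_unique<Block>());
            storage_.back()->data.resize(block_size);
            free_.push(storage_.back().get());
        }
    }
    Block* acquire() {           // producer: get an empty block (waits if none free)
        std::unique_lock<std::mutex> lock(mtx_);
        cv_.wait(lock, [&] { return !free_.empty(); });
        Block* b = free_.front(); free_.pop();
        return b;
    }
    void publish(Block* b) {     // producer: hand the filled block to consumers
        { std::lock_guard<std::mutex> lock(mtx_); ready_.push(b); }
        cv_.notify_all();
    }
    Block* consume() {           // consumer: wait for the next filled block
        std::unique_lock<std::mutex> lock(mtx_);
        cv_.wait(lock, [&] { return !ready_.empty(); });
        Block* b = ready_.front(); ready_.pop();
        return b;
    }
    void release(Block* b) {     // consumer: return the block to the pool
        { std::lock_guard<std::mutex> lock(mtx_); free_.push(b); }
        cv_.notify_all();
    }
private:
    std::vector<std::unique_ptr<Block>> storage_;  // the queue owns the storage
    std::queue<Block*> free_, ready_;
    std::mutex mtx_;
    std::condition_variable cv_;
};
```

The pool size bounds memory use, and because `acquire` blocks when no block is free, a slow consumer naturally throttles the producer.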

+2

I think this is the key piece of information there: "prefer to keep each thread's data isolated (unshared), and let threads instead communicate via asynchronous messages that pass copies of data." In other words, use producer-consumer :)
You can pass the messages yourself or use something provided by the OS; that is an implementation detail (it needs to be done correctly, of course). The key is to avoid shared data, in the sense of a single memory area modified by several threads. That makes bugs hard to find, and even when the code is perfect, it costs performance because of all the locking.

+1

This is pretty simple, really (I am amazed that others wrote such lengthy answers):

Using a message-queue system instead of raw shared data means that you have to get the synchronization (locking/unlocking of resources) right only once, in a central place.

With a message-based system, you can think in the higher-level terms of "messages" without worrying about the synchronization issues. For what it's worth, it is entirely possible that the message queue is implemented with shared data internally.
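That point can be made concrete with a small blocking queue (a generic sketch, not any particular library's API): the mutex and condition variable live in exactly one place, callers never touch a lock, and each message is passed by value so the consumer gets its own copy.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>

// A message queue built on shared data internally. All
// synchronization is encapsulated here, in one central place.
template <typename T>
class BlockingQueue {
public:
    void send(T msg) {              // returns as soon as the copy is queued
        { std::lock_guard<std::mutex> lock(mtx_); q_.push(std::move(msg)); }
        cv_.notify_one();
    }
    T receive() {                   // blocks until a message arrives
        std::unique_lock<std::mutex> lock(mtx_);
        cv_.wait(lock, [&] { return !q_.empty(); });
        T msg = std::move(q_.front());
        q_.pop();
        return msg;
    }
private:
    std::mutex mtx_;                // shared data and its lock: internal only
    std::condition_variable cv_;
    std::queue<T> q_;
};
```

Every producer and consumer in the program reuses this one audited piece of locking code instead of each rolling its own.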

+1

I had the same question. After reading the answers, my impression is:

  • In the most typical use case, queue = async and shared memory (locks) = sync. In fact, you can build an asynchronous version on top of shared memory, but that is more code, essentially reinventing the message-passing wheel.

  • Less code = fewer bugs and more time to focus on other things.

The pros and cons have already been mentioned in the previous answers, so I will not repeat them.

0

Source: https://habr.com/ru/post/895389/

