Why is this Python code not thread safe?

I submitted this as part of a school assignment, and the person who marked it mentioned that this section is not thread safe.

The purpose was to create a multi-threaded socket server in Python that takes a number and returns the Fibonacci value of that number. My approach was to memoize the calculations by sharing a dictionary between all of the threads.

Here is the code (with error handling and so on removed for brevity):

    from socketserver import ThreadingMixIn, TCPServer, BaseRequestHandler


    class FibonacciThreadedTCPServer(ThreadingMixIn, TCPServer):

        def __init__(self, server_address):
            TCPServer.__init__(self, server_address,
                               FibonacciThreadedTCPRequestHandler,
                               bind_and_activate=True)
            # this dictionary will be shared between all request handlers
            self.fib_dict = {0: 0, 1: 1, 2: 1}


    class FibonacciThreadedTCPRequestHandler(BaseRequestHandler):

        def handle(self):
            data = self.request.recv(1024).strip()
            num = int(data)
            result = self.calc_fib(self.server.fib_dict, num)
            ret = bytes(str(result) + '\n', 'ascii')
            self.request.sendall(ret)

        @staticmethod
        def calc_fib(fib_dict, n):
            """
            Calculates the fibonacci value of n using a shared lookup table
            and a linear calculation.
            """
            length = len(fib_dict)
            while length <= n:
                fib_dict[length] = fib_dict[length - 1] + fib_dict[length - 2]
                length = len(fib_dict)
            return fib_dict[n]

I understand that calc_fib both reads from and writes to the shared dictionary, and that this usually means the code is not thread safe. In this case, however, I think it can be shown that the code will always produce predictable results.

Is the mere fact that reads and writes can occur at the same time enough for code to be considered not thread safe? Or is something considered thread safe as long as it always returns the correct result?

Why I think this code will always give reliable results:

  • A read never occurs at any index in the dictionary until a write has occurred there.

  • Any subsequent write to a given index stores the same number as the previous writes, so regardless of how the reads and writes are interleaved, a read always gets the same data.

I tested this by adding random sleeps between each operation and making requests from several hundred threads at the same time, and the correct answer was always returned during my tests.
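
A test along those lines could be sketched as follows. This is only an illustration of the approach described above, not the original test: the sleep interval, the thread count, and the helper names `reference_fib` and `stress_test` are assumptions, and the socket layer is left out so that only `calc_fib` is exercised. A clean run shows that no failure was observed, which is weaker than a proof of thread safety.

```python
import random
import threading
import time


def calc_fib(fib_dict, n):
    """Same logic as the question's calc_fib, plus a random sleep
    (an assumed way of widening the window for interleavings)."""
    length = len(fib_dict)
    while length <= n:
        time.sleep(random.uniform(0, 0.0005))
        fib_dict[length] = fib_dict[length - 1] + fib_dict[length - 2]
        length = len(fib_dict)
    return fib_dict[n]


def reference_fib(n):
    """Independent single-threaded implementation used to check results."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def stress_test(num_threads=50, max_n=60):
    """Run calc_fib from many threads against one shared dict and
    return a list of (n, wrong result or exception) pairs."""
    shared = {0: 0, 1: 1, 2: 1}
    failures = []

    def worker():
        n = random.randint(0, max_n)
        try:
            result = calc_fib(shared, n)
            if result != reference_fib(n):
                failures.append((n, result))
        except Exception as exc:  # e.g. a KeyError from a bad interleaving
            failures.append((n, exc))

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return failures


if __name__ == "__main__":
    print(stress_test() or "no failures observed")
```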

Any thoughts or criticism will be appreciated. Thanks.

+5
2 answers

In this particular case, the GIL should keep your code safe because:

  • CPython's built-in data structures are protected by the GIL from actual corruption (as opposed to merely incorrect behavior). A dict in particular needs this guarantee, since class instances and non-local scopes usually use a dict for attribute and name lookup, and without the GIL even reading values would be fraught with danger.
  • You refresh the cached length value and then use it for the next set of operations, instead of re-checking the length during the mutation. This can lead to duplicated work (several threads see the old length and each compute the new value), but since a given key is always set to the same value, it does not matter that they each set it independently.
  • You never delete from your cache (if you did, that cached length would bite you).
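
The last point can be illustrated with a small single-threaded sketch. The deletion below is a hypothetical eviction that the question's code never performs; it only shows why `len(fib_dict)` stops being a valid "next index" once a key is removed.

```python
def calc_fib(fib_dict, n):
    """Same logic as the question's calc_fib."""
    length = len(fib_dict)
    while length <= n:
        fib_dict[length] = fib_dict[length - 1] + fib_dict[length - 2]
        length = len(fib_dict)
    return fib_dict[n]


cache = {0: 0, 1: 1, 2: 1}
calc_fib(cache, 5)   # cache now holds keys 0 through 5
del cache[4]         # hypothetical eviction: len(cache) is 5, but key 4 is gone

missing = None
try:
    # len(cache) == 5, so the loop tries to rebuild key 5 from keys 4 and 3
    calc_fib(cache, 6)
except KeyError as exc:
    missing = exc.args[0]

print("missing key:", missing)
```

As long as entries are only ever added, the keys stay contiguous from 0 to len - 1 and this failure cannot occur.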

So in CPython this should be fine. I can't make any guarantees for other Python interpreters, though; without a GIL, if they implement their dict without internal locking, it is entirely possible that a rehash triggered by a write in one thread could cause another thread to read from the dict while it is in an inconsistent or unusable state.

+3

First of all, why do you think dictionaries are thread safe? I took a quick look through the Python 3 documentation (I'm new to Python myself), and I couldn't find anything assuring me that two unsynchronized threads can safely update the same dictionary without corrupting the dictionary's internals and possibly crashing the program.

I have been writing multithreaded code in other languages since 1980, and I have learned never to trust that something is thread safe just because it behaves that way when I test it. I want to see documentation saying that it is supposed to be thread safe. Otherwise, I put a mutex around it.
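
Applied to the question's cache, the mutex approach might look like the sketch below. `LockedFibCache` and its `get` method are hypothetical names introduced for illustration; the only library feature used is `threading.Lock` from the standard library.

```python
import threading


class LockedFibCache:
    """Hypothetical wrapper: a memoized Fibonacci table whose reads and
    writes all happen while holding a single lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._values = {0: 0, 1: 1, 2: 1}

    def get(self, n):
        # Only one thread at a time can inspect or extend the table,
        # so correctness no longer depends on dict internals or the GIL.
        with self._lock:
            length = len(self._values)
            while length <= n:
                self._values[length] = (
                    self._values[length - 1] + self._values[length - 2]
                )
                length += 1
            return self._values[n]


if __name__ == "__main__":
    cache = LockedFibCache()
    results = []
    threads = [
        threading.Thread(target=lambda: results.append(cache.get(30)))
        for _ in range(8)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(results)
```

The trade-off is that every lookup, including one that hits an already cached value, now waits for the lock, so threads are serialized on the cache.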

Secondly, you assume that fib_dict[length - 1] and fib_dict[length - 2] will be valid. My experience with other programming languages says not to assume this. In other languages (for example, Java), when threads share data without synchronization, one thread can see variables updated in a different order from the order in which another thread wrote them. For example, it is theoretically possible for a Java thread that accesses a Map without synchronization to see the Map's size() increase before it sees the new values actually appear in the Map. I will assume that something like this can happen in Python until someone shows me official documentation that says otherwise.

+2

Source: https://habr.com/ru/post/1264958/

