Parallelizing svn up freezes the client

I am writing a program that runs svn up in parallel, and it causes the machine to freeze. When this happens, the server does not show any load problems.

The commands are launched using ThreadPool.map() over subprocess.Popen():

    def cmd2args(cmd):
        if isinstance(cmd, basestring):
            return cmd if sys.platform == 'win32' else shlex.split(cmd)
        return cmd

    def logrun(cmd):
        popen = subprocess.Popen(cmd2args(cmd),
                                 stdout=subprocess.PIPE,
                                 stderr=subprocess.STDOUT,
                                 cwd=curdir,
                                 shell=(sys.platform == 'win32'))
        for line in iter(popen.stdout.readline, ""):
            sys.stdout.write(line)
            sys.stdout.flush()

    ...

    pool = multiprocessing.pool.ThreadPool(argv.jobcount)
    pool.map(logrun, _commands)

argv.jobcount is the smaller of multiprocessing.cpu_count() and the number of tasks to perform (4 in this case). _commands is a list of strings containing the commands below. shell is set to True on Windows so that the shell can find the executables, since Windows has no which command and locating an executable there is a bit more involved (the commands used to take the form cd directory && svn up .., which also required shell=True, but that is now handled with the cwd parameter instead).
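One way to avoid shell=True on Windows entirely is to resolve the executable path up front. This is a sketch (the helper name is mine, not from the question) using shutil.which, available since Python 3.3, with distutils.spawn.find_executable as the closest Python 2 fallback:

```python
import shlex
import sys

try:
    from shutil import which                # Python 3.3+
except ImportError:                         # Python 2 fallback
    from distutils.spawn import find_executable as which

def resolve_cmd(cmd):
    """Split a command string and replace the program name with its
    absolute path, so Popen can run it without shell=True.
    (resolve_cmd is a hypothetical helper, not from the question.)"""
    args = shlex.split(cmd, posix=(sys.platform != 'win32'))
    exe = which(args[0])
    if exe is None:
        raise OSError('executable not found: %r' % args[0])
    return [exe] + args[1:]
```

The returned list can be passed straight to subprocess.Popen with shell=False on both platforms.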

executable commands

    svn up w:/srv/lib/dktabular
    svn up w:/srv/lib/dkmath
    svn up w:/srv/lib/dkforms
    svn up w:/srv/lib/dkorm

where each folder is a separate project/repository, but they all live on the same Subversion server. The svn executable is the one bundled with TortoiseSVN 1.8.8 (build 25755, 64-bit). The code is already up to date (i.e. svn up is a no-op).

When the client freezes, the memory graph in Task Manager goes blank first:

Blacked out memory bar

and sometimes the whole screen goes black

Frozen

If I wait a while (a few minutes), the machine eventually comes back.

Q1: Is it copacetic to invoke svn in parallel?

Q2: Are there any problems with the way I use ThreadPool.map() and subprocess.Popen()?

Q3: Are there any tools/strategies for debugging these problems?

1 answer

I will do my best to answer all three questions, and I welcome corrections in my statements.

Q1: Is it copacetic to invoke svn in parallel?

"Copacetic" may be a strong word, but I would say it is not recommended. Source-control tools have certain functionality that requires process-level and file-level locking (best guess). Checksums, file transfers, and file reads/writes all need locking to be handled correctly, or you risk duplicating work and creating file conflicts, which will lead to process crashes.
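If you want to stay on the safe side against any such locking issues, one conservative approach (my own sketch, not from the question) is to keep the thread pool but serialize the actual svn invocations behind a single lock:

```python
import subprocess
import sys
import threading

_svn_lock = threading.Lock()        # shared by all worker threads

def run_serialized(argv):
    """Run one external command (e.g. ['svn', 'up', path]) while holding
    the lock, so only one such process runs at a time; returns its exit
    code. (run_serialized is a hypothetical wrapper.)"""
    with _svn_lock:
        return subprocess.call(argv)
```

Note that this gives up parallelism for the svn step itself, so it mainly helps when the svn calls are mixed with other work that can still run concurrently.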

Q2: Are there any problems with the way I use ThreadPool.map() and subprocess.Popen()?

While I do not know the exact internals of subprocess.Popen(), having last used it on 2.6, I can speak to the programmability. What your code does is create a pool around one specific subprocess call instead of invoking the processes directly. Off the top of my head, my understanding is that ThreadPool() does no locking by default. That may or may not cause problems with subprocess.Popen(); I'm not sure. Per my answer above, locking is something that would need to be implemented. I would recommend looking at fooobar.com/questions/15727 / ... for a better understanding of the differences between threads and pools, and I would also recommend using threading instead of multiprocessing.

Given that version-control applications require locking by nature, if you intend to parallelize operations while handling the locking yourself, you will also need to synchronize the threads so that work is not duplicated. I ran a test a few months ago on Linux with multiprocessing and noticed that grep was repeating its global search. I'll see if I can find the code I wrote and paste it here. With thread synchronization, my hope would be that Python can pass the svn thread state between threads so that svn can tell that no duplicate processes are being spawned. However, I don't know how svn works under the hood in this respect, so I am only speculating/making a best guess.

Since svn probably uses a rather sophisticated locking method (block-level locking rather than inode locking, I would say, but again, best guess), it may make sense to implement semaphore locking rather than Lock() or RLock(). However, you would have to test the various locking and synchronization methods to find out what works best for svn. This is a good resource on thread synchronization: http://effbot.org/zone/thread-synchronization.htm
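As a concrete illustration of semaphore-based throttling (a sketch of mine; the limit of 2 and the sleep duration are arbitrary stand-ins), a threading.BoundedSemaphore caps how many workers run their command at once without serializing them completely:

```python
import threading
import time

sem = threading.BoundedSemaphore(2)      # at most 2 concurrent "svn" calls
_state_lock = threading.Lock()
active, peak = [], [0]                   # track concurrency for the demo

def worker(i):
    with sem:                            # blocks once 2 workers are inside
        with _state_lock:
            active.append(i)
            peak[0] = max(peak[0], len(active))
        time.sleep(0.05)                 # stand-in for the real svn call
        with _state_lock:
            active.remove(i)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

After the run, peak[0] never exceeds the semaphore's limit even though five workers were started.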

Q3: Are there any tools / strategies for debugging these problems?

Sure. Both threading and multiprocessing have logging facilities that you can use together with the logging module. I would log to a file so that you have something to refer back to, instead of just output on the console. In theory you should be able to use logging.debug(pool.map(logrun, _commands)) and record the resulting return values. However, I'm not a logging specialist when it comes to threading or multiprocessing, so someone else may be able to answer this better than I can.
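A minimal sketch of such logging, tagging every line with the thread that produced it (the logger name and format are my choices; pass an open file object as the stream for a real run):

```python
import logging

def make_logger(stream):
    """Return a logger that writes '<threadName> <message>' records to
    the given stream. (make_logger is a hypothetical helper.)"""
    logger = logging.getLogger('svnup')
    logger.setLevel(logging.DEBUG)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter('%(threadName)s %(message)s'))
    logger.handlers = [handler]          # replace handlers on repeat calls
    return logger
```

Inside logrun() you would then call logger.debug(line) for each line of subprocess output, so interleaved output from the worker threads remains attributable.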

Are you using Python 2.x or 3.x?


Source: https://habr.com/ru/post/1210038/
