For my application, how many threads would be optimal?

I have a simple Python search robot. It uses SQLite to store its output, as well as to store the queue. I want the crawler to be multi-threaded so that it can scan multiple pages at a time. I decided that I would make a thread and immediately start several instances of the class, so all of them will be launched simultaneously. But the question is, how much should I run right away? Should I stick to two? can i go higher What would be a reasonable limit for a number of threads? Keep in mind that each thread goes to a web page, loads html, runs some regular expressions through it, saves the information it finds in SQLite db, and then pushes the next URL out of the queue.

+3
source share
7 answers

You will probably find that your application is limited in bandwidth, and not CPU or I / O is limited.

So add as much as you like until performance starts to deteriorate.

You may encounter other restrictions depending on your network setup. For example, if you are behind an ADSL router, there will be a limit on the number of simultaneous NAT sessions, which can affect the simultaneous execution of too many HTTP requests. Do too much, and your provider may treat you as being infected with a virus, etc.

The question also arises, how many requests are processed by the server that you are viewing, and how much load do you want to put on it.

, . , , , . , , .

. 1-5 , , 20-30 .

+13

twisted , URL-.

, , , .

+7

. , Popens, .

"" . , , . ? .

, , , . , .

- , , , , . , .

, , . , .

.

1 1 . 100 , , 100 . 100 . , 25 50 .

, , .

+3

cletus - , .

-, Twisted. , pycurl, libcurl, URL- . PyCurl " retriever-multi.py ' , , 120 .

+3

. , , .

Python ( "" ), , . , . , 5-10, , , .

, Python ( Python 2.6), , .

+1

, IP-, DoS-, , .

, (5 ).

+1

. /, . , , , select() . . Twisted, , . URL-, , . , , . , , , "" . . "" .

, .

- , ( ) .

0

Source: https://habr.com/ru/post/1704399/


All Articles