As part of a Laravel application, I am trying to write a PHP script that retrieves certain constantly-changing data from across the network about some products, books to be exact.
Problem:
Books are identified by ISBN, a ten-digit identifier. The first 9 digits can be 0-9, and the last digit can be 0-9 or X. However, the last digit is a check digit, which is calculated based on the first 9 digits, so there really is only one possible digit for the last place.
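For reference, that check-digit rule is the standard ISBN-10 mod-11 scheme. A minimal sketch (in Python for brevity, though the script itself is PHP):

```python
def isbn10_check_digit(first9: str) -> str:
    # Weight the first nine digits 10, 9, ..., 2; the check digit is
    # whatever value makes the weighted sum divisible by 11 ("X" stands for 10).
    total = sum(int(d) * w for d, w in zip(first9, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)
```

For example, `isbn10_check_digit("030640615")` returns `"2"`, completing the valid ISBN 0306406152 — which is why only one value is ever possible in the last position.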
In this case we get:
10*10*10*10*10*10*10*10*10*1 = 1,000,000,000
numerically correct ISBNs. I can do a little better if I limit my search to English books, as they will only contain 0 or 1 as the first digit. So I get:
2*10*10*10*10*10*10*10*10*1 = 200,000,000
numerically correct ISBNs.
Now, for each ISBN I need 3 HTTP requests to retrieve the data, each of which takes about 3 seconds. Thus:
3 seconds * 3 requests * 200,000,000 ISBNs = 1,800,000,000 seconds
1,800,000,000 seconds / 60 seconds / 60 minutes / 24 hours / 365 days = ~57 years
I hope that after 57 years there will no longer be such a thing as a book, and this algorithm will be obsolete.
Actually, since the data I am interested in changes constantly, for this algorithm to be useful each full pass would have to complete within a few days (ideally 2-7 days).
So the problem is: how do I optimize this algorithm so that its runtime drops from 57 years to about one week?
My ideas so far:
1) The first thing I noticed is that I do not need to derive each of the 200,000,000 ISBNs while making the HTTP requests; generating an ISBN is trivial compared to the HTTP requests for that ISBN. So I can precompute the full candidate list once, store one record per ISBN in the database, and on each pass iterate over those stored ISBNs, updating each record as its data comes back.
Point 1 alone barely helps, though, because the runtime is dominated by the HTTP requests, not by generating ISBNs. (I realize this is brute force, but I cannot think of a better way to enumerate the books!)
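Precomputing the candidate ISBN list can be sketched as a lazy generator (Python here for brevity; the real script is PHP). It restricts the first digit to 0 or 1 per the English-language estimate above and appends the forced ISBN-10 check digit:

```python
from itertools import islice, product

def isbn10_check_digit(first9: str) -> str:
    # Standard ISBN-10 rule: weight the first nine digits 10..2 and
    # choose the value that makes the total divisible by 11 ("X" = 10).
    total = sum(int(d) * w for d, w in zip(first9, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)

def english_isbns():
    # Lazily yield all 200,000,000 candidate English-language ISBNs:
    # first digit 0 or 1, eight free digits, one forced check digit.
    for digits in product("01", *(["0123456789"] * 8)):
        first9 = "".join(digits)
        yield first9 + isbn10_check_digit(first9)
```

Each pass can then walk this fixed sequence (or a database table populated from it) and record how far it got, instead of re-deriving ISBNs on the fly.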
2) The real problem, then, is the time spent waiting on HTTP. The obvious answer is to issue the requests in parallel, e.g. with threads.
If I split the work across numThreads threads, the runtime is roughly:
(numISBNs / numThreads) * secondsPerISBN = totalSecondsToComplete
Solving for numThreads:
numThreads = (numISBNs * secondsPerISBN) / totalSecondsToComplete
Plugging in my numbers (each ISBN costs 3 requests * 3 seconds = 9 seconds):
totalSecondsToComplete = 7 days * 24 hrs * 60 min * 60 sec = 604,800 seconds
numISBNs = 200,000,000
secondsPerISBN = 9
numThreads = (200,000,000 * 9) / 604,800
numThreads = ~2,976
Roughly 3,000 threads at once sounds like a lot, and I have no idea whether that is realistic on something like a DigitalOcean droplet. (My Mac shows around 2,000 threads running right now, so maybe it is not as crazy as it sounds.)
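The thread math above can be sketched with a worker pool. This is in Python for brevity; in the actual PHP script the same role would be played by concurrent HTTP machinery such as `curl_multi` or Guzzle's async requests. `fetch_book_data` is a stand-in (an assumption for this sketch) for the three real HTTP requests:

```python
import concurrent.futures
import time

def fetch_book_data(isbn):
    # Placeholder for the 3 HTTP requests per ISBN; here we only
    # simulate a little network latency and return an empty record.
    time.sleep(0.01)
    return {"isbn": isbn, "data": None}

def crawl(isbns, num_workers):
    # A pool of worker threads processes ISBNs concurrently, so wall-clock
    # time is roughly (len(isbns) / num_workers) * secondsPerISBN,
    # matching the formula above. map() preserves input order.
    with concurrent.futures.ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(fetch_book_data, isbns))
```

Because the workload is I/O-bound (waiting on HTTP), threads mostly sleep, which is why thousands of them on one machine is at least conceivable.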
My questions (finally):
1) Can a single server, e.g. a DigitalOcean droplet, realistically run that many concurrent threads/requests?
2) Is there a more efficient way to get this data than brute-forcing HTTP requests per ISBN, or a way to make the HTTP requests themselves cheaper?
3) Is there an existing service or bulk dataset that already aggregates this information, so that I do not have to crawl it myself?