Weighted distribution of batches between workers without coordination

I have N workers who need to process incoming batches of data. Each worker is configured so that it knows it is "worker X of N."

Each incoming batch of data has a random unique ID (random, i.e. evenly distributed) and a different size; processing time is proportional to size. Sizes can vary greatly.

When a new batch of data becomes available, it immediately appears as available to all N workers, but I want exactly one of them to process it, without any coordination between them. Right now, every worker computes ID % N; if the result equals its own X, that worker claims the batch and the rest skip it. This works correctly and ensures that on average each worker processes the same number of batches. Unfortunately, it does not take batch size into account, so some workers can finish much later than others when they happen to claim very large batches.

How can I change the algorithm so that each worker still claims batches on its own, but takes batch size into account, so that on average each worker independently claims the same total amount of work (across different batches)?
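For concreteness, a minimal sketch of the current scheme (Python; the names here are illustrative, not real code from my system):

```python
# Current scheme: each of the N workers independently decides whether a
# batch is "its own" by checking ID % N against its own index X.
# Names are illustrative, not from the real system.

N = 4  # total number of workers

def should_process(batch_id: int, worker_index: int, n_workers: int = N) -> bool:
    """Worker X claims the batch iff ID % N == X; all others skip it."""
    return batch_id % n_workers == worker_index

# Exactly one worker claims any given batch:
claimants = [x for x in range(N) if should_process(123456, x)]
print(claimants)  # → [0], since 123456 % 4 == 0
```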

3 answers
//Using a queue to store the workers.
//This way we can dequeue and re-enqueue workers when they accept jobs,
//so the load rotates round-robin. Note that this design relies on a
//single dispatcher owning the queue.
var _queue = new Queue<Worker>();

void Setup() {
    for (int i = 0; i < numOfWorkers; i++) {
        _queue.Enqueue(new Worker());
    }
}

//Assigns the job to the next worker in line; the chosen worker is
//moved to the end of the queue inside FindNextAvailableWorker.
void AcceptJob(Job j) {
    var w = FindNextAvailableWorker();
    w.AssignNewJob(j);
}

//Finds the first free worker, or the front of the queue if everyone is
//busy, and moves the chosen worker to the back of the queue.
//(Queue<T> is not indexable, so we rotate through it once instead.)
Worker FindNextAvailableWorker() {
    Worker chosen = null;
    int count = _queue.Count;
    for (int i = 0; i < count; i++) {
        var w = _queue.Dequeue();
        if (chosen == null && !w.isWorking) {
            chosen = w;             // first idle worker wins
        } else {
            _queue.Enqueue(w);      // everyone else keeps their order
        }
    }
    if (chosen == null) {
        chosen = _queue.Dequeue();  // all busy: fall back to the front
    }
    _queue.Enqueue(chosen);         // chosen worker goes to the back
    return chosen;
}

The general idea: every node tracks, for each node, the work it has done so far, and that history affects the work it will receive next. This is done deterministically, so all nodes compute the same result and need no communication. We still use a modulo-style mapping, but a node that has done less work gets a larger range of numbers.

Algorithm:

1. Each node computes, for every node (including itself), the share of the total work that node has processed so far (5%, 35%, ...); call it nodeProportion.

2. Each node is given the weight (100-nodeProportion) + 0.001 * Node_ID; the tiny Node_ID term breaks ties deterministically. Then hash the batch ID into the range 1-100 and call the result K.

3. Split the range 1-100 among the nodes in proportion to their weights (100-nodeProportion); the node whose sub-range contains K claims the batch.

4. Every node updates its tally of work done and repeats for the next batch.
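If I am reading the algorithm right, it can be sketched as below (Python; the SHA-256 hash, the `work_done` tally, and all identifiers are my assumptions, not from the answer):

```python
import hashlib

def hash_to_range(batch_id: int) -> float:
    """Deterministically map a batch ID to a value in [0, 100)."""
    digest = hashlib.sha256(str(batch_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 * 100.0

def pick_node(batch_id: int, work_done: dict) -> int:
    """Pick the node for a batch; every node computes the same answer.

    work_done maps node_id -> total size processed so far; each node
    maintains an identical copy of this tally, so no communication
    is needed.
    """
    total = sum(work_done.values())
    n = len(work_done)
    weights = {}
    for node_id, done in work_done.items():
        # nodeProportion: the share (in percent) of all work done so far
        proportion = 100.0 * done / total if total else 100.0 / n
        # Less-loaded nodes get larger weights; the tiny Node_ID term
        # breaks ties deterministically.
        weights[node_id] = (100.0 - proportion) + 0.001 * node_id
    # Hash the ID, scale it into the total weight mass, then walk the
    # cumulative weight ranges to find the owning node.
    k = hash_to_range(batch_id) / 100.0 * sum(weights.values())
    cumulative = 0.0
    for node_id in sorted(weights):
        cumulative += weights[node_id]
        if k < cumulative:
            return node_id
    return max(weights)  # guard against floating-point edge cases
```

A node that has done all the work so far gets weight near zero and receives almost nothing until the others catch up.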


To restate the problem:

  • the assignment must be a deterministic function that every worker can evaluate on its own, i.e. X = distributor( arguments )
  • the current distributor is X = ID % N, which balances the number of batches per worker but, as noted, not their sizes
  • the size S must therefore enter the decision as an additional argument, i.e. something of the form X = F(S, ID) % N
  • the modulo op stays, since it is what maps the result onto the N workers

A simple candidate is:

X = hash( ID * S ) % N

The hash is needed because the product ID * S is not, by itself, evenly distributed, so using it modulo N directly would bias the assignment.
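A minimal sketch of this distributor (the choice of SHA-256 is mine; the answer only calls for some well-mixing hash):

```python
import hashlib

def distributor(batch_id: int, size: int, n_workers: int) -> int:
    """X = hash(ID * S) % N: fold the size into the decision, then hash
    so the result is spread evenly across the N workers."""
    digest = hashlib.sha256(str(batch_id * size).encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_workers

# Each worker x processes the batch iff distributor(ID, S, N) == x.
```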


Source: https://habr.com/ru/post/1655864/

