Adding Elements from Two Arrays

You have two arrays

int[] a = {......} // Total elements 1 million int[] b = {......} // Total elements 1 million , Length is same for both arrays. 

Q1. I need to create an array

 int[] c 

whose elements are the sum of the corresponding indices a [] and b [] Ex.

  c[0] = a[0] + b[0]; c[1] = a[1] + b[1]; c[2] = a[2] + b[2]; 

Solution: β†’ I can use Multithreading. Divide the entire array into 10 or more parts and assign each segment to the stream to perform the calculation. Note. > Interviewer suggested using Multithreading

Q2. Now it has changed a bit. Elements of array C will have the following values: β†’

 c[0] = a[0] + b[0]; c[1] = a[1] + b[1] + c[0]; // At this line c[0] is Sum of a[0] + b[0]; The Above Line c[2] = a[2] + b[2] + c[1]; // At this line c[0] is Sum of a[1] + b[1]+ a[0] + b[0]; The Above Line 

MySolution-> Solve part 1 (Q1) and create a temporary array, and after that we have to do an addition like this.

 C[1] = temp[1]+temp[0] C[2] = temp[2]+temp[1] 

Note: β†’ We really do not need temp [], we can only do this using Array c. Just to explain it on SO in a simple way.

Problem-> I do not think that in question 2 we can use multithreding. Is there a better way to solve Q2? Can we use multithreading in this.

+6
source share
5 answers

You can make part 1 multi-threaded, as you say, giving -

 temp[0] = a[0] + b[0]; temp[1] = a[1] + b[1]; temp[2] = a[2] + b[2]; etc.... 

Then the calculation for part 2 becomes -

 c[0] = temp[0]; c[1] = temp[1] + temp[0]; c[2] = temp[2] + temp[1] + temp[0]; c[3] = temp[3] + temp[2] + temp[1] + temp[0]; etc... 

Although it looks consistent and impossible to parallelize, it is actually a fairly common operation called the β€œprefix sum” or β€œscan”. For more on how to parallelize, see Wikipedia or Blelloch .

In the case of 8 elements, this becomes the following, where each recursive phase can be parallelized, since each calculation does not depend on other calculations in the same phase.

 // 1st phase u[0] = temp[0] + temp[1]; u[1] = temp[2] + temp[3]; u[2] = temp[4] + temp[5]; u[3] = temp[6] + temp[7]; // 2nd phase v[0] = u[0] + u[1]; v[1] = u[2] + u[3]; // 3rd phase w[0] = v[0] + v[1]; // final phase c[0] = temp[0]; c[1] = u[0]; c[2] = u[0] + temp[2]; c[3] = v[0]; c[4] = v[0] + temp[4]; c[5] = v[0] + u[2]; c[6] = v[0] + u[2] + temp[6]; c[7] = w[0]; 
+4
source

In my opinion, for the second question you have two methods:

First you need to take two steps. step-1 using streams you can add

  c[0] = a[0] + b[0]; c[1] = a[1] + b[1]; c[2] = a[2] + b[2]; 

as you said.

But step 2 must be performed sequentially. because the value of c[ i + 1] depends on the updated value of c [i]

The second bit method is complex, but can be fast.

What you are asked to do in the second question is something like:

  c[0] = a[0] + b[0]; c[1] = a[1] + b[1] + a[0] + b[0]; c[2] = a[2] + b[2] + a[1] + b[1] + a[0] + b[0]; 

It can be in parallel.

 c[i] = thread1( sum(a[0]...a[i] )) + thread2( sum(b[0]...b[i] )) for i >= 0 

Well in this technique, you can calculate c[i] in parallel for all i (a dual-processor model with two levels).

You can further improve the functions of thread1 / thread2 as multi-threaded with child threads to execute the sum, but remember that once multi-threaded code runs slower than single-threaded code due to the switching context of the thread (so you should give enough tasks to each flow).

The point, which, unlike the second method, is "most of what streams do for c[i] , this is the same as streams for c[i-1] ."
Thanks @ jogojapan for letting me know about this shortcoming.

For the best technique, read the updated answer by Mr. Joyapan.

+5
source

In fact, you can use multithreading in this task. Your algorithm will consist of two parts:

  • apply the Q1 algorithm - this part will take advantage of multithreading.

  • only in one type of cycle are formyla used: c[n] = c[n] + c[n-1] , n=1...999999 .

+1
source

You can use multithreading here, on the way from the 1st question. I mean, you can calculate

 sumA0B0 = a[0] + b[0]; 

in separate threads and even wait for the calculation (sync ie on a [i]). Then in a separate thread you can calculate c[i] = sumAiBi + c[i-1];

+1
source

One way to use multithreading for Q2 is to do this in two passes (each of which uses T threads inside, where T can be freely selected):

  • Calculate c[i] = a[i] + b[i] + c[i-1] for all cells in the normal multithreaded mode, that is, after dividing the input into the ranges [0, r1), [r1, r2) ,. .. [rk, n) and applying one stream for each range. Yes, this will not be true for all ranges except the first, and step 2 will correct it.

  • Compute corrections again in a multi-threaded way. To do this, we look at the right value of each range, i.e. corr1:=c[r1-1] , corr2:=corr1+c[r2-1] , corr3:=corr2+c[r3-1] etc., which gives us a correction value for each stream, and then calculates again using multithreading with the same ranges as before, c[i] += corrk where corrk is the correction value for the k-th stream. (For a zero thread, we can use corr0:=0 , so the thread should not do anything.)

This improves the theoretical runtime by a factor of T, where T is the number of threads (which can be freely selected), so this is the optimal solution for multithreading.


To illustrate how this works, here is an example where we assume arrays of length n==30 . We also assume that we use 3 threads: one for calculating the range c[0..9] , one for c[10..19] and one for c[20..29] .

It is clear that the goal in the cell c[i] for any 0<i<n obtained

 c[i] == a[0]+...+a[i]+b[0]+...+b[i] 

(that is, the sum of all a[0..i] and all b[0..i] ) after the completion of the algorithm. Let's see how the algorithm works there, for example cell i==23 . This cell is processed by the third thread, that is, the thread responsible for the range c[20..29] .

Step 1: Stream Sets

 c[20] = a[20]+b[20] c[21] = c[20]+a[21]+b[21] == a[20]+a[21]+b[20]+b[21] c[22] = c[21]+a[22]+b[22] == a[20]+a[21]+a[22]+b[20]+b[21]+b[22] c[23] = c[22]+a[23]+b[23] == a[20]+a[21]+a[22]+a[23]+b[20]+b[21]+b[22]+b[23] ... 

So, after step 1 is completed, we have some of a[20..23] and b[20..23] in cell c[23] . What is missing is the sum of a[0..19] and b[0..19] .

Similarly, the first and second threads set the values ​​in c[0..9] and c[10..19] so that

 c[0] = a[0]+b[0] c[1] = c[0]+a[1]+b[1] == a[0]+a[1]+b[0]+b[1] ... c[9] = a[0]+...+a[9]+b[0]+...+b[9] 

and

 c[10] = a[10]+b[10] ... c[19] = a[10]+...+a[19]+b[10]+...+b[19] 

Step 2: The correction value for the third thread corr2 is the sum of corr1 and the rightmost value calculated by the second thread, while corr1 is the rightmost value calculated by the first thread, Therefore

 corr2 == c[9]+c[19] == (a[0]+...+a[9]+b[0]+...+b[9]) + (a[10]+...+a[19]+b[10]+...+b[19]) 

And this is indeed the sum absent in the value of c[23] after step 1. In step 2, we add this value to all elements of c[20..29] , therefore, after completing step 2, c[23] is correct (and all other cells )


The reason this works is that calculating cell values ​​is an operation from left to right, strictly incremental, and the order of operations for calculating a single cell does not matter (since + associative and commutative). Therefore, the final result (the "rightmost value") of any given stream after step 1 can be used to correct the results of the streams responsible for the ranges to the right in step 2.

0
source

Source: https://habr.com/ru/post/949799/


All Articles