How can I vectorize code that runs a function on subsets of a larger matrix?

Suppose I have the following 9 x 5 matrix:

 myArray = [ 54.7    8.1   81.7   55.0   22.5
             29.6   92.9   79.4   62.2   17.0
             74.4   77.5   64.4   58.7   22.7
             18.8   48.6   37.8   20.7   43.5
             68.6   43.5   81.1   30.1   31.1
             18.3   44.6   53.2   47.0   92.3
             36.8   30.6   35.0   23.0   43.0
             62.5   50.8   93.9   84.4   18.4
             78.0   51.0   87.5   19.4   90.4 ];

I have 11 "subsets" of this matrix, and I need to run a function (say, max) on each of these subsets. Subsets can be identified using the following matrix of logicals (subsets run down the columns, not across the rows):

 myLogicals = logical([
     0 1 0 1 1
     1 1 0 1 1
     1 1 0 0 0
     0 1 0 1 1
     1 0 1 1 1
     1 1 1 1 0
     0 1 1 0 1
     1 1 0 0 1
     1 1 0 0 1 ]);

or using linear indexing:

 starts = [2 5 8 10 15 23 28 31 37 40 43]; % index start of each subset
 ends   = [3 6 9 13 18 25 29 33 38 41 45]; % index end of each subset

such that the first subset is 2:3, the second is 5:6, etc. (the linear indices run down the columns).
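For what it's worth, the two representations are interchangeable. A minimal sketch, assuming MATLAB's column-major linear indexing, that recovers starts and ends from myLogicals:

 d = diff([false; myLogicals(:); false]); % pad so runs touching the edges are closed
 starts = find(d == 1)';                  % first linear index of each run of 1s
 ends   = find(d == -1)' - 1;             % last linear index of each run of 1s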

I can find the max of each subset and save it in a vector as follows:

 finalAnswers = NaN(11,1);
 for n = 1:length(starts) % i.e. 1 through the number of subsets
     finalAnswers(n) = max(myArray(starts(n):ends(n)));
 end

After running the loop, finalAnswers contains the maximum value of each of the data subsets:

 74.4
 68.6
 78.0
 92.9
 51.0
 81.1
 62.2
 47.0
 22.5
 43.5
 90.4

Is it possible to get the same result without using a for loop? In other words, can this code be vectorized? And would such an approach be more efficient than the current one?


EDIT: I benchmarked the suggested solutions. The data I used was a 1510 x 2185 matrix with 10,103 subsets whose lengths ranged from 2 to 916, with a standard deviation of 101.92.

I wrapped every solution in tic; for k=1:1000 [code here] end; toc; and here are the results (the wrapper itself is shown, formatted, after this list):

  • for loop approach --- Elapsed time is 16.237400 seconds.
  • Shai's approach --- Elapsed time is 153.707076 seconds.
  • Dan's approach --- Elapsed time is 44.774121 seconds.
  • Divakar's Approach #2 --- Elapsed time is 127.621515 seconds.
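For reference, the wrapper, formatted ([code here] is the placeholder from above; each solution's code was substituted in turn):

 tic;
 for k = 1:1000
     % [code here]
 end
 toc;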

Notes:

  • I also tried benchmarking Dan's approach by wrapping the k=1:1000 loop around only the accumarray line (since the rest could theoretically be run just once). In that case, the time was 28.29 seconds.
  • Benchmarking Shai's approach with only the lb = ... line inside the k loop, the time was 113.48 seconds.
  • When I ran the Divakar code, I got Non-singleton dimensions of the two input arrays must match each other. errors for the bsxfun lines. I "fixed" this by applying the transpose operator (the apostrophe ') to trade_starts(1:starts_extent) and intv(1:starts_extent) in the lines that call bsxfun. I do not know why this error occurred ...
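A small illustration of the orientation issue behind that error, with toy values rather than the actual variables: bsxfun expands only singleton dimensions, so a column against a row works, while two rows of different lengths do not.

 a = [0;1;2];          % 3-by-1 column, e.g. like max_intv_arr
 b = [10 20 30 40];    % 1-by-4 row
 bsxfun(@plus, a, b)   % fine: expands to a 3-by-4 matrix
 bsxfun(@plus, a', b)  % error: Non-singleton dimensions of the two input
                       % arrays must match each other.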

I'm not sure my benchmarking setup is correct, but it seems that the for loop actually runs faster in this case.

+5
4 answers

One approach is to use accumarray . Unfortunately, for this we first need to label your logical matrix. Here's a quick way to do that if you don't have the Image Processing Toolbox:

 sz = size(myLogicals);
 s_ind(sz(1),sz(2)) = 0; %// OR: s_ind = zeros(size(myLogicals))
 s_ind(starts) = 1;
 labelled = cumsum(s_ind(:)).*myLogicals(:);

This basically does what bwlabeln does in Shai's implementation (but the result will be 1-by-numel(myLogicals) in shape, not size(myLogicals) in shape).

Now you can use accumarray :

 accumarray(labelled(myLogicals), myArray(myLogicals), [], @max) 

or else it might be simpler to go with

 result = accumarray(labelled+1, myArray(:), [], @max);
 result = result(2:end)
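(Why the +1 shift: accumarray requires its subscripts to be positive integers, and elements outside every subset carry label 0. Shifting all labels up by one sends those elements into the first bin, which result = result(2:end) then discards.)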

This is fully vectorized, but is it worth it? You will need to run speed tests against your loop solution to find out.

+3

Use bwlabeln with a vertical connectivity kernel:

 lb = bwlabeln( myLogicals, [0 1 0; 0 1 0; 0 1 0] ); 

You now have a label from 1 to 11 for each region.
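As a toy illustration of what the vertical kernel does (an invented example, requiring the Image Processing Toolbox; the exact label numbering may vary):

 BW = logical([ 1 1
                1 0
                0 1 ]);
 bwlabeln( BW, [0 1 0; 0 1 0; 0 1 0] )
 % ans = [ 1 2
 %         1 0
 %         0 3 ]  -- each columnwise run of 1s becomes its own region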

To get the maximum value, you can use regionprops

 props = regionprops( lb, myArray, 'MaxIntensity' );
 finalAnswers = [props.MaxIntensity];

You can use regionprops to get various other properties of each subset, but it is not too general a solution.
If you want to apply a more general function to each region, for example median , you can use accumarray :

 finalAnswer = accumarray( lb( myLogicals ), myArray( myLogicals ), [], @median ); 
+1
source

Ideas for vectorization and optimization

One approach to vectorizing this problem is to convert the subsets into regularly shaped blocks and then find the maximum over those blocks in one go. The conversion to regularly shaped blocks has one problem here: the subsets are unequal in length. To get around this, you can create a 2D index matrix that starts from each of the starts elements and extends up to the maximum subset length. The good thing is that this allows vectorization, but at the cost of higher memory requirements, which depend on how much the subset lengths vary.
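A minimal sketch of that index-matrix idea (toy values, not the actual data):

 starts_toy = [2 5]; intv_toy = [1 2];     %// subset lengths minus one
 ext = (0:max(intv_toy))';                 %// 0..max extent, as a column
 idx = bsxfun(@plus, ext, starts_toy)      %// [2 5; 3 6; 4 7]
 %// idx(3,1) over-runs the first subset (length 2), so mask with
 %// bsxfun(@gt, ext, intv_toy) before taking the columnwise max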

Another problem with this vectorization method is that it can generate indices that run past the end of the input array for the last subsets. To avoid this, there are two possible ways -

  • Use an extended input array, padded so that each starting index plus the maximum subset extent still lies within the bounds of the extended array.

  • Use the original input array for the starts that, even when extended by the maximum subset length, stay within the original array, and fall back to the original loop code for the remaining subsets. We can call this mixed programming, just for the sake of a short name. This saves the memory cost of creating an extended array, as required by the other approach.

Listed below are these two ways / approaches.

Approach #1: Vectorized Technique

 [m,n] = size(myArray); %// store no. of rows and columns in input array

 intv = ends - starts; %// intervals
 max_intv = max(intv); %// max interval
 max_intv_arr = [0:max_intv]'; %//'# array of max indices extent

 [row1,col1] = ind2sub([m n],starts); %// get starts row and column indices

 m_ext = max(row1+max_intv); %// no. of rows in extended input array
 myArrayExt(m_ext,n) = 0; %// extended form of input array
 myArrayExt(1:m,:) = myArray;

 %// New linear indices for extended form of input array
 idx = bsxfun(@plus,max_intv_arr,(col1-1)*m_ext+row1);

 %// Index into extended array; select only valid ones by setting rest to nans
 selected_ele = myArrayExt(idx);
 selected_ele(bsxfun(@gt,max_intv_arr,intv)) = nan;

 %// Get the max of the valid ones for the desired output
 out = nanmax(selected_ele); %// desired output
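(A side note, not from the original code: nanmax reduces along the first dimension, so out comes back as a 1-by-numel(starts) row vector. Also, nanmax ships with the Statistics Toolbox; on newer MATLAB releases, max(selected_ele,[],1,'omitnan') avoids that dependency.)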

Approach #2: Mixed Programming

 %// PART - I: Vectorized technique for subsets that when extended
 %// with max extents still lie within limits of input array

 intv = ends - starts; %// intervals
 max_intv = max(intv); %// max interval

 %// Find the last subset that when extended by max interval would still
 %// lie within the limits of input array
 starts_extent = find(starts+max_intv<=numel(myArray),1,'last');

 max_intv_arr = [0:max_intv]'; %//'# Array of max indices extent

 %// Index into array; select only valid ones by setting rest to nans
 selected_ele = myArray(bsxfun(@plus,max_intv_arr,starts(1:starts_extent)));
 selected_ele(bsxfun(@gt,max_intv_arr,intv(1:starts_extent))) = nan;

 out(numel(starts)) = 0; %// storage for output
 out(1:starts_extent) = nanmax(selected_ele); %// output values for part-I

 %// PART - II: Process rest of input array elements
 for n = starts_extent+1:numel(starts)
     out(n) = max(myArray(starts(n):ends(n)));
 end

Benchmarking

In this section, the two approaches and the original loop code are compared against each other for performance. Here is the setup code used before the actual benchmarking -

 N = 10000; %// No. of subsets
 M1 = 1510; %// No. of rows in input array
 M2 = 2185; %// No. of cols in input array
 myArray = rand(M1,M2); %// Input array
 num_runs = 50; %// no. of runs for each method

 %// Form the starts and ends by getting a sorted random integers array from
 %// 1 to one minus no. of elements in input array. That minus one is
 %// compensated later on into ends because we don't want any subset with
 %// starts and ends as the same index
 y1 = reshape(sort(randi(numel(myArray)-1,1,2*N)),2,[]);
 starts = y1(1,:);
 ends = y1(2,:)+1;

 %// Remove identical starts elements
 invalid = [false any(diff(starts,[],2)==0,1)];
 starts = starts(~invalid);
 ends = ends(~invalid);

 %// Create myLogicals
 myLogicals = false(size(myArray));
 for k1 = 1:numel(starts)
     myLogicals(starts(k1):ends(k1)) = 1;
 end

 clear invalid y1 k1 M1 M2 N %// clear unnecessary variables

 %// Warm up tic/toc
 for k = 1:100
     tic(); elapsed = toc();
 end

Next, the timing code that produces the runtimes -

 disp('---------------------- With Original loop code')
 tic
 for iter = 1:num_runs
     %// ...... original loop code
 end
 toc
 %// clear out variables used in the above approach
 %// repeat this for approach #1,2

Test results

In your comments, you mentioned using a 1510 x 2185 matrix, so there are two cases with that size and subset counts of 10000 and 2000.

Case 1 [Input - 1510 x 2185 matrix, subsets - 10000]

 ---------------------- With Original loop code
 Elapsed time is 15.625212 seconds.
 ---------------------- With Approach #1
 Elapsed time is 12.102567 seconds.
 ---------------------- With Approach #2
 Elapsed time is 0.983978 seconds.

Case 2 [Input - 1510 x 2185 matrix, subsets - 2000]

 ---------------------- With Original loop code
 Elapsed time is 3.045402 seconds.
 ---------------------- With Approach #1
 Elapsed time is 11.349107 seconds.
 ---------------------- With Approach #2
 Elapsed time is 0.214744 seconds.

Case 3 [Large input - 3000 x 3000 matrix, subsets - 20,000]

 ---------------------- With Original loop code
 Elapsed time is 12.388061 seconds.
 ---------------------- With Approach #1
 Elapsed time is 12.545292 seconds.
 ---------------------- With Approach #2
 Elapsed time is 0.782096 seconds.

Note that num_runs was changed between cases to keep the runtime of the fastest approach close to 1 sec.

Conclusion

So, I think mixed programming (Approach #2) is the way to go! As future work, one could use the standard deviation of the subset lengths as a scattering criterion: if performance suffers because the subset lengths are highly scattered, offload the most scattered subsets to the loop code.
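A hedged sketch of that future-work idea (the criterion below is a made-up choice, not something from this answer):

 len = ends - starts + 1;                  %// subset lengths
 loop_mask = len > mean(len) + std(len);   %// treat these as "most scattered"
 %// offload find(loop_mask) to the loop code, vectorize the rest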

+1

Efficiency

Measure both the vectorised and the for-loop code samples on your target platform (whether local or cloud) to see the difference:

 MATLAB:7> tic(); max( myArray( startIndex(:):endIndex(:) ) ); toc()  %% Details below.
 Elapsed time is 0.0312 seconds.                                      %% Code is not the merit,
                                                                      %% the method is:

and

 tic();                                            %% for/loop
 for n = 1:length( startIndex )                    %% may be
     max( myArray( startIndex(n):endIndex(n) ) );  %% significantly
 end                                               %% faster than
 toc();                                            %% vectorised
                                                   %% setup(s)
 Elapsed time is 0.125 seconds.                    %% overhead(s)

 %% As commented below, subsequent re-runs yield unrealistic results
 %% due to caching artifacts:
 Elapsed time is 0 seconds.
 Elapsed time is 0 seconds.
 Elapsed time is 0 seconds.
 %% which are not so readily visible if encapsulated in an artificial in-vitro
 %% outer re-run repetition ( for k=1:1000 ) et al ( ref. remarks in text below )

For a better interpretation of the test results, test with much larger sizes than a few dozen rows / columns.

EDIT: Wrong code deleted, thanks to Dan for noticing. An emphasis on quantitative validation, which can show that vectorized code can be faster but not necessarily under all circumstances, is of course no excuse for faulty code.

What matters is quantitatively comparable data:

While it is commonly recommended, IMHO it is not an honest decision to exclude memalloc and similar overheads from in-vivo testing. Re-run tests typically show performance gains from VM page reuse and other cache artifacts, whereas the raw first "virgin" run is what usually occurs in real code deployment (with the exception of outer iterators, of course). Therefore, treat such results with caution and re-test in a realistic environment (sometimes running as a virtual machine inside a bigger system), which also brings VM-swap mechanics into play once huge matrices start to stress real memory-access patterns.
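A minimal sketch of that virgin-run vs. re-run effect (the timed operation is arbitrary and the timings platform-dependent):

 clear myArray; myArray = rand( 1510, 2185 );  %% fresh allocation
 tic(); m = max( myArray(:) ); toc()           %% first ( cold / virgin ) run
 tic(); m = max( myArray(:) ); toc()           %% warm re-run, typically faster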

In other projects I use [usec] granularity for real-time test timing, but all the more attention must then be paid to the test execution conditions and the O/S background load.

Thus, nothing but testing gives appropriate answers for your particular code / deployment situation; however, it should be methodical, so that the compared data are comparable in principle.

Alaric's code:

 MATLAB:8> tic(); for k=1:1000                     % ( flattens memalloc issues & al )
 >             for n = 1:length( startIndex )
 >                 max( myArray( startIndex(n):endIndex(n) ) );
 >             end;
 >          end; toc()
 Elapsed time is 0.2344 seconds.
 %% i.e. 0.0002 seconds per k-for-loop  <--[ ref.^ remarks on testing ]

Dan's code:

 MATLAB:9> tic(); for k=1:1000
 >             s_ind( size( myLogicals ) ) = 0;
 >             s_ind( startIndex ) = 1;
 >             labelled = cumsum( s_ind(:) ).*myLogicals(:);
 >             result = accumarray( labelled + 1, myArray(:), [], @max );
 >          end; toc()
 error: product: nonconformant arguments (op1 is 43x1, op2 is 45x1)
 %%
 %% [Work in progress] to find my mistake -- sorry for not being able to reproduce
 %% Dan's code and to make it work
 %%
 %% Both myArray and myLogicals shapes were correct ( 9 x 5 )
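A likely cause, judging from Dan's snippet above (an observation added here, not part of the original post): Dan preallocates with two subscripts, s_ind(sz(1),sz(2)) = 0, which creates a 9-by-5 array, whereas s_ind( size( myLogicals ) ) = 0 linearly indexes with the vector [9 5] and creates a 1-by-9 row vector; after s_ind( startIndex ) = 1 it grows only to 1-by-43 (the largest start index), hence the 43x1 vs 45x1 mismatch. A corrected preallocation:

 sz = size( myLogicals );
 s_ind = zeros( sz );   %% or, as in Dan's answer: s_ind( sz(1), sz(2) ) = 0;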
0
