Ideas for vectorization and optimization
One approach that can be used to vectorize this problem would be to convert the subsets into blocks with the correct shape, and then search for the maximum of elements from these blocks at a time. Now, converting to regular shaped blocks has one problem here, and that is, the subsets are uneven in length. To avoid this problem, you can create a 2D index matrix, starting from each of the starts elements and continuing to the maximum length of the subset. The good thing about this is that it allows vectorization, but at the expense of more memory requirements, which will depend on the variation in the length of the subsets.
Another problem with this vectorization method would be that it could potentially lead to the creation of indexes outside the finite subsets. To avoid this, you can think of two possible ways -
Use a larger input array, expanding the input array so that the maximum number of subsets plus starting indices are still within the limits of the extended array.
Use the original input array for starters until we are within the original input array, and then use the loop source code for the rest of the subsets. We can call it mixed programming just for the sake of a short name. This will save us from memory requirements when creating an extended array, as discussed earlier in another approach.
Listed below are these two ways / approaches.
Approach # 1: Vectorized Technique
[m,n] = size(myArray); %// store no. of rows and columns in input array intv = ends-starts; %// intervals max_intv = max(intv); %// max interval max_intv_arr = [0:max_intv]'; %//'
Approach # 2: Mixed Programming
%// PART - I: Vectorized technique for subsets that when normalized %// with max extents still lie within limits of input array intv = ends-starts; %// intervals max_intv = max(intv); %// max interval %// Find the last subset that when extended by max interval would still %// lie within the limits of input array starts_extent = find(starts+max_intv<=numel(myArray),1,'last'); max_intv_arr = [0:max_intv]'; %//'
Benchmarking
In this section, we compare the two approaches and the source code of the loop against each other for performance. Let the installation codes before starting the actual benchmarking -
N = 10000; %// No. of subsets M1 = 1510; %// No. of rows in input array M2 = 2185; %// No. of cols in input array myArray = rand(M1,M2); %// Input array num_runs = 50; %// no. of runs for each method %// Form the starts and ends by getting a sorted random integers array from %// 1 to one minus no. of elements in input array. That minus one is %// compensated later on into ends because we don't want any subset with %// starts and ends as the same index y1 = reshape(sort(randi(numel(myArray)-1,1,2*N)),2,[]); starts = y1(1,:); ends = y1(1,:)+1; %// Remove identical starts elements invalid = [false any(diff(starts,[],2)==0,1)]; starts = starts(~invalid); ends = ends(~invalid); %// Create myLogicals myLogicals = false(size(myArray)); for k1=1:numel(starts) myLogicals(starts(k1):ends(k1))=1; end clear invalid y1 k1 M1 M2 N %// clear unnecessary variables %// Warm up tic/toc. for k = 1:100 tic(); elapsed = toc(); end
Now, the placebo codes that give us lead time are
disp('---------------------- With Original loop code') tic for iter = 1:num_runs %// ...... approach
Test results
In your comments, you mentioned the use of 1510 x 2185 matrix , so there are two cases with such sizes and subsets of sizes 10000 and 2000 .
Case 1 [Input - matrix 1510 x 2185, subsets - 10000]
---------------------- With Original loop code Elapsed time is 15.625212 seconds. ---------------------- With Approach #1 Elapsed time is 12.102567 seconds. ---------------------- With Approach #2 Elapsed time is 0.983978 seconds.
Case 2 [Input - 1510 x 2185 matrix, subsets - 2000]
---------------------- With Original loop code Elapsed time is 3.045402 seconds. ---------------------- With Approach #1 Elapsed time is 11.349107 seconds. ---------------------- With Approach #2 Elapsed time is 0.214744 seconds.
Case 3 [Large input - 3000 x 3000 matrix, subsets - 20,000]
---------------------- With Original loop code Elapsed time is 12.388061 seconds. ---------------------- With Approach #1 Elapsed time is 12.545292 seconds. ---------------------- With Approach #2 Elapsed time is 0.782096 seconds.
Note that the number of num_runs runs num_runs been changed to keep the execution time of the fastest approach close to 1 sec .
Conclusion
So, I think mixed programming (Approach No. 2) is the way to go! For future work, you can use standard deviation in the scattering criteria if the performance suffers due to the scattering and unloading of the work for most scattered subsets (along their length) into the loop code.