Why is my Matlab for-loop code faster than my vectorized version

I have always heard that vectorized code is faster than for loops in MATLAB. However, when I vectorized my MATLAB code, it actually ran slower.

I used tic and toc to measure time. I only changed the implementation of one function in my program. My vectorized version ran in 47.228801 seconds, and my for-loop version ran in 16.962089 seconds.

Also, in my main program I used a large value for N (N = 1,000,000) and a DataSet of size 1,301, and I ran each version several times on different data sets of the same size with the same N.
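For reference, the timing harness was essentially this (a rough sketch, not my exact script; the real DataSet comes from elsewhere in my program):

    % Rough sketch of the timing comparison; rand(1,1301) stands in for the
    % real 1,301-element data set
    N = 1000000;
    DataSet = rand(1,1301);

    tic
    RNGSet = RNGAnal(N,DataSet);    % "vectorized" version
    toc

    tic
    RNGData = RNGAnsys(N,DataSet);  % for-loop version
    toc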

Why is the vectorized version so much slower, and how can I improve the speed further?

"vectorized" version

    function [RNGSet] = RNGAnal(N,DataSet)
    %Creates a random number generated set of numbers to check accuracy overall
    %   This function will produce random numbers and normalize a new Data set
    %   that is derived from an old data set by multiplying by random numbers
    %   and then dividing by N/2
    randData = randint(N,length(DataSet));
    tempData = repmat(DataSet,N,1);
    RNGSet = randData .* tempData;
    RNGSet = sum(RNGSet,1) / (N/2); % sum and normalize by the N
    end

Loop version

    function [RNGData] = RNGAnsys(N,Data)
    %RNGAnsys This function produces statistical RNG data using a for loop
    %   This function will produce RNGData that will be used to plot on another
    %   plot that possesses the actual data
    multData = zeros(N,length(Data));
    for i = 1:length(Data)
        photAbs = randint(N,1);            % Create N number of random 0 or 1's
        multData(:,i) = Data(i) * photAbs; % multiply each element in the molar data by the random numbers
    end
    sumData = sum(multData,1);  % sum each individual energy level data point
    RNGData = (sumData/(N/2))'; % divide by n, but account for 0.5 average by n/2
    end
+6
3 answers

Vectorization

A first look at the for-loop code tells us that, since photAbs is a binary array whose columns each scale an element of Data, this binary property can be exploited for vectorization. That is what the code here does -

    function RNGData = RNGAnsys_vect1(N,Data)

    %// Get the 2D matrix of random ones and zeros
    photAbsAll = randint(N,numel(Data));

    %// Take care of multData internally by summing along the columns of the
    %// binary 2D matrix and then multiply each element of it with each scalar
    %// taken from Data by performing elementwise multiplication
    sumData = Data.*sum(photAbsAll,1);

    %// Divide by n, but account for 0.5 average by n/2
    RNGData = (sumData./(N/2))'; %//'

    return;

After profiling, it seems that the bottleneck is the creation of the random binary array. So, using the faster random binary array creator proposed in this smart solution, the above function can be optimized further, like so -

    function RNGData = RNGAnsys_vect2(N,Data)

    %// Create a random binary array and sum along the columns on the fly to
    %// save on any variable space that would be required otherwise.
    %// Also perform the elementwise multiplication as discussed before.
    sumData = Data.*sum(rand(N,numel(Data))<0.5,1);

    %// Divide by n, but account for 0.5 average by n/2
    RNGData = (sumData./(N/2))'; %//'

    return;
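The gap between the two random-binary generators can also be checked in isolation with a micro-benchmark along these lines (a sketch; randi([0 1],...) is used as a stand-in for randint, which comes from the Communications Toolbox and may not be available):

    %// Sketch: time just the random binary array creation in isolation.
    %// randi([0 1],...) stands in for randint; rand(...)<0.5 is the faster creator.
    N  = 15000;
    DS = 1500;

    t_randi = timeit(@() randi([0 1],N,DS));
    t_rand  = timeit(@() rand(N,DS) < 0.5);

    fprintf('randi: %.4f s, rand<0.5: %.4f s\n', t_randi, t_rand)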

Using the smart random binary array creator, the original for-loop code can be optimized as well; that version is used for a fair benchmark between the loop-based and vectorized codes later on. Here is the optimized for-loop code -

    function RNGData = RNGAnsys_opt1(N,Data)

    multData = zeros(N,numel(Data));
    for i = 1:numel(Data)
        %// Create N random 0's or 1's using the smart approach, then multiply
        %// each element in the molar data by the random numbers. The parentheses
        %// matter: without them, the scaling would run before the <0.5 comparison.
        multData(:,i) = Data(i) * (rand(N,1)<.5);
    end

    sumData = sum(multData,1);  % sum each individual energy level data point
    RNGData = (sumData/(N/2))'; % divide by n, but account for 0.5 average by n/2

    return;

Benchmarking

Benchmarking code

    N = 15000; %// Kept at this value as it was going out of memory with higher N's.
               %// The size of the dataset is more important anyway, as that decides
               %// how well the vectorized code does against the for-loop code
    DS_arr = [50 100 200 500 800 1500 5000]; %// Dataset sizes
    timeall = zeros(2,numel(DS_arr));

    for k1 = 1:numel(DS_arr)
        DS = DS_arr(k1);
        Data = rand(1,DS);

        f = @() RNGAnsys_opt1(N,Data);  %// Optimized for-loop code
        timeall(1,k1) = timeit(f);
        clear f

        f = @() RNGAnsys_vect2(N,Data); %// Vectorized code
        timeall(2,k1) = timeit(f);
        clear f
    end

    %// Display benchmark results
    figure, hold on, grid on
    plot(DS_arr,timeall(1,:),'-ro')
    plot(DS_arr,timeall(2,:),'-kx')
    legend('Optimized for-loop code','Vectorized code')
    xlabel('Dataset size ->'), ylabel('Time(sec) ->')
    avg_speedup = mean(timeall(1,:)./timeall(2,:))
    title(['Average Speedup with vectorized code = ' num2str(avg_speedup) 'x'])

Results

[Benchmark plot: runtime vs. dataset size for the optimized for-loop code and the vectorized code, with the average speedup shown in the title.]

Concluding observations

Based on the experience I have had so far with MATLAB, neither for-loop code nor vectorized techniques fit every situation; it all depends on the particular use case.

+4

Try using the MATLAB profiler to determine which line or lines of code take the most time. That way you can find out whether the repmat call is what slows you down, as suggested. Let us know what you find, I'm curious!
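For reference, the basic pattern is (a minimal sketch, using the function and variable names from the question):

    % Minimal profiling sketch: run the slow version once under the profiler
    profile on
    RNGSet = RNGAnal(N,DataSet);   % the "vectorized" version from the question
    profile off
    profile viewer                 % open the report and look at time per line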

+2

    randData = randint(N,length(DataSet));

allocates a 1.2 GB array (4 * 301 * 1,000,000). Implicitly, you create up to 4 of these monsters in your program, which leads to continuous cache misses.

The for-loop code, on the other hand, can run almost entirely in the processor cache (or entirely, if you are running on a large Xeon).
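If you want to check the footprint yourself, whos reports how many bytes an array actually occupies; a scaled-down sketch (smaller N so it fits in memory, then extrapolate to N = 1,000,000):

    % Scaled-down sketch: measure the cost of the intermediate arrays with whos,
    % then scale up to the real N. Plain rand is used here only to get arrays of
    % the same shape; the exact bytes depend on the element class.
    Nsmall  = 10000;
    DataSet = rand(1,1301);                    % stand-in for the real data set

    randData = rand(Nsmall,length(DataSet));
    tempData = repmat(DataSet,Nsmall,1);
    whos randData tempData                     % multiply the bytes by 1000000/Nsmall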

0
