How to accumulate data sets?

I have a vector with values between 1 and N > 1. Some values may occur several times in a row. Now I want a second column that counts the consecutive occurrences, with the repeated entries removed. For example:

 A = [1 2 1 1 3 2 4 4 1 1 1 2]' 

will result in:

 B = [1 1; 2 1; 1 2; 3 1; 2 1; 4 2; 1 3; 2 1] 

(As you can see, the second column contains the number of consecutive occurrences.) I recently came across accumarray() in MATLAB, but I cannot find a solution with it for this task, since it always considers the entire vector and not just consecutive entries.

Any idea?

2 answers

This is probably not the most readable or elegant way to do it, but if you have large vectors and speed is a concern, this vectorized approach may help:

 A = [1 2 1 1 3 2 4 4 1 1 1 2]; 

First, I'm going to pad A with a leading and a trailing zero to capture the first and final transitions (this relies on A itself containing no zeros, which holds for values between 1 and N):

 >> A = [0, A, 0]; 

The locations of the transitions can be found where the difference between adjacent values is not equal to zero:

 >> locations = find(diff(A)~=0); 
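To make this step concrete, the intermediate values for the example vector can be printed. This is only an illustrative sketch; the names `padded` and `d` are introduced here and are not part of the answer.

```matlab
A = [1 2 1 1 3 2 4 4 1 1 1 2];
padded = [0, A, 0];          % zero padding assumes A itself never contains 0
d = diff(padded);            % d(k) = padded(k+1) - padded(k)
locations = find(d ~= 0);    % indices k where padded(k+1) differs from padded(k)
disp(d)                      % 1 1 -1 0 2 -1 2 0 -3 0 0 1 -2
disp(locations)              % 1 2 3 5 6 7 9 12 13
```

Each entry of `locations` marks the last element of a run in the padded vector; the first entry belongs to the leading zero.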

But since we padded the beginning of A with a zero, the first transition is meaningless, so we only take the locations from 2:end. The values of A at those locations are the values of each segment:

 >> first_column = A(locations(2:end))
 first_column =
      1     2     1     3     2     4     1     2

This is the first column. Now find the count of each number, which is given by the difference between successive locations. This is where it is important to have padded A at both ends:

 >> second_column = diff(locations)
 second_column =
      1     1     2     1     1     2     3     1

Finally, combining:

 >> B = [first_column', second_column']
 B =
      1     1
      2     1
      1     2
      3     1
      2     1
      4     2
      1     3
      2     1

All this can be combined into one less readable line:

 >> A = [1 2 1 1 3 2 4 4 1 1 1 2]';
 >> B = [A(find(diff([A; 0]) ~= 0)), diff(find(diff([0; A; 0])))]
 B =
      1     1
      2     1
      1     2
      3     1
      2     1
      4     2
      1     3
      2     1
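Since the question mentions accumarray(), here is a sketch of how it might be applied once every run has been given its own index. The function name `run_lengths` and its variable names are hypothetical, not from the answer above. Because it does not rely on zero padding, this variant should also cope with zeros in A.

```matlab
function B = run_lengths(A)
% RUN_LENGTHS  Hypothetical helper: one [value, count] row per run of equal values.
    A = A(:);                          % accept row or column input
    if isempty(A)
        B = zeros(0, 2);               % no runs in an empty vector
        return
    end
    isStart = [true; diff(A) ~= 0];    % true where a new run begins
    runId   = cumsum(isStart);         % label the runs 1, 2, 3, ...
    counts  = accumarray(runId, 1);    % add up a 1 for every element of each run
    B = [A(isStart), counts];
end
```

Calling `run_lengths([1 2 1 1 3 2 4 4 1 1 1 2]')` should give the same B as the one-liner above.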

I don't see any way other than iterating over the data set, but that is pretty straightforward. This may not be the most elegant solution, but as far as I can see, it works fine.

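 function B = accum_data_set(A)
 prev = A(1);     % value of the current run
 count = 1;       % length of the current run
 B = [];
 for i = 2:length(A)
     if (prev == A(i))
         count = count + 1;
     else
         B = [B; prev count];   % run ended: store [value, count]
         count = 1;
     end
     prev = A(i);
 end
 B = [B; prev count];           % store the final run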

output:

 >> A = [1 2 1 1 3 2 4 4 1 1 1 2]';
 >> B = accum_data_set(A)
 B =
      1     1
      2     1
      1     2
      3     1
      2     1
      4     2
      1     3
      2     1
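Growing B inside the loop reallocates it each time a run ends, which may become noticeable for long vectors. Below is a sketch of a preallocating variant; the name `accum_data_set_prealloc` is hypothetical, and the guard for empty input is an addition that the original function does not have.

```matlab
function B = accum_data_set_prealloc(A)
% Hypothetical variant of accum_data_set that preallocates the result.
    n = numel(A);
    B = zeros(n, 2);                 % worst case: every element is its own run
    if n == 0
        return                       % empty input gives a 0-by-2 result
    end
    k = 1;                           % row of the current run in B
    B(1, :) = [A(1), 1];
    for i = 2:n
        if A(i) == B(k, 1)
            B(k, 2) = B(k, 2) + 1;   % same value: extend the current run
        else
            k = k + 1;               % new value: start the next run
            B(k, :) = [A(i), 1];
        end
    end
    B = B(1:k, :);                   % drop the unused rows
end
```

For the example vector this should return the same eight rows as accum_data_set.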

Source: https://habr.com/ru/post/1392040/