How to accumulate data sets?

Question

I have a vector with values between 1 and N (N > 1). Some values may occur several times in a row. I want to build a second column that counts consecutive occurrences and drops the repeated entries, for example:

 A = [1 2 1 1 3 2 4 4 1 1 1 2]'

will result in:

 B = [1 1; 2 1; 1 2; 3 1; 2 1; 4 2; 1 3; 2 1]

(As you can see, the second column contains the number of consecutive occurrences.) I recently came across accumarray() in MATLAB, but I cannot find a solution with it for this task, since it always considers the entire vector and not just consecutive entries.

Any idea?

+4

vector count matlab accumulate

tim Jan 20 '12 at 12:36

2 answers

I don't see any other way than iterating over the data set, but that is pretty straightforward. This may not be the most elegant solution, but as far as I can see, it works fine.

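 function B = accum_data_set(A)
 % Run-length encode A: each row of B is [value, run length].
 prev = A(1);
 count = 1;
 B = [];
 for i = 2:length(A)
     if prev == A(i)
         count = count + 1;
     else
         B = [B; prev count];  % run ended: store it
         count = 1;
     end
     prev = A(i);
 end
 B = [B; prev count];  % store the final run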

Output:

 >> A = [1 2 1 1 3 2 4 4 1 1 1 2]';
 >> B = accum_data_set(A)
 B =
      1     1
      2     1
      1     2
      3     1
      2     1
      4     2
      1     3
      2     1
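For long vectors, growing B inside the loop can become slow, because the array is reallocated every time a row is appended. A minimal sketch of the same loop with B preallocated to its worst-case size follows. The index name k is my own, and the code assumes A is non-empty.

```matlab
% Sketch: same idea as the loop above, but B is preallocated.
A = [1 2 1 1 3 2 4 4 1 1 1 2]';
n = numel(A);
B = zeros(n, 2);          % worst case: every element starts a new run
k = 1;                    % row of B holding the current run (hypothetical name)
B(k, :) = [A(1) 1];       % assumes A is non-empty
for i = 2:n
    if A(i) == B(k, 1)
        B(k, 2) = B(k, 2) + 1;   % same value: extend the current run
    else
        k = k + 1;               % new value: start a new run
        B(k, :) = [A(i) 1];
    end
end
B = B(1:k, :);            % drop the unused rows
```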

+2

Lucas Jan 20 '12 at 13:39

Bill cheatham · Accepted Answer · 2012-01-20T13:49:27+0000

This is probably not the most readable or elegant way to do this, but if you have large vectors and speed is a concern, this vectorized approach may help:

 A = [1 2 1 1 3 2 4 4 1 1 1 2];

First, I'm going to pad A with a leading and a trailing zero to capture the first and final transitions:

 >> A = [0, A, 0];

The location of the transitions can be found where the difference between adjacent values is not equal to zero:

 >> locations = find(diff(A)~=0);

But since we padded the beginning of A with a zero, the first transition is meaningless, so we only take locations(2:end). The values of A at these locations are the values of each run:

 >> first_column = A(locations(2:end))
 first_column =
      1     2     1     3     2     4     1     2

That is the first column. Now find the count for each value, which is the difference between consecutive locations. This is why it is important to pad A at both ends:

 >> second_column = diff(locations)
 second_column =
      1     1     2     1     1     2     3     1

Finally, combining:

 >> B = [first_column', second_column']
 B =
      1     1
      2     1
      1     2
      3     1
      2     1
      4     2
      1     3
      2     1
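Note that the zero padding relies on 0 never occurring in A, which holds here because the values lie between 1 and N. If that cannot be guaranteed, one possible variant marks the start of each run with a logical mask instead of a sentinel value. This is a sketch, not part of the original answer; the names isStart, starts and counts are my own, and A is assumed to be non-empty.

```matlab
% Sketch: run-length encoding without a sentinel value.
A = [1 2 1 1 3 2 4 4 1 1 1 2]';
A = A(:);                                % force a column vector
isStart = [true; diff(A) ~= 0];          % true where a new run begins
starts = find(isStart);                  % index of the first element of each run
counts = diff([starts; numel(A) + 1]);   % distance to the next run start
B = [A(starts), counts];
```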

All this can be combined into one less readable line:

 >> A = [1 2 1 1 3 2 4 4 1 1 1 2]';
 >> B = [A(find(diff([A; 0]) ~= 0)), diff(find(diff([0; A; 0])))]
 B =
      1     1
      2     1
      1     2
      3     1
      2     1
      4     2
      1     3
      2     1
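Since the question asks about accumarray(), here is one way it could be used after all: give every run its own label with cumsum, then let accumarray count how many elements carry each label. This is a sketch rather than part of the original answer; the names isStart, runId and counts are my own, and A is assumed to be a non-empty column vector.

```matlab
% Sketch: count consecutive occurrences with accumarray.
A = [1 2 1 1 3 2 4 4 1 1 1 2]';
isStart = [true; diff(A) ~= 0];   % true where a new run begins
runId = cumsum(isStart);          % run label for every element of A
counts = accumarray(runId, 1);    % number of elements sharing each label
B = [A(isStart), counts];
```

Labelling the runs first is what stops accumarray from merging equal values that are not adjacent, which was the problem described in the question.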