Matlab Cell Arrays: Find Duplicate Rows and Manipulate Related Data

I have a large data set (~1 million records) stored as a cell array with several columns and many rows. I need to identify records that occur at the same time and then combine the values in the other columns, so that rows with duplicate times can be deleted without losing their information.

An example of a subset of such data can be initialized as follows:

data = {'10:30', 100; '10:30', 110; '10:31', 115;'10:32', 110}

That is, I have a cell array with one column of strings (representing times) and another column of doubles (many such columns in the real data).

My code should detect the repeated 10:30 (there may be many such repeats), pass the corresponding doubles (100 and 110) as inputs to some function f(100, 110), and then remove the duplicate row from the data.

That is, if the function were, say, the mean, the result should look something like

data =
       '10:30' [105]
       '10:31' [115]
       '10:32' [110]

This would be fairly simple if loops were fast enough, but with my data set it makes no sense to even try solving the problem with a loop.

I got as far as

[uniqueElements, firstUniquePosition, commonSets] = unique(data(:,1));

after some experimentation. It gives information that seems useful:

uniqueElements = 

    '10:30'
    '10:31'
    '10:32'

firstUniquePosition =

     1
     3
     4

commonSets =

     1
     1
     2
     3

but I can't figure out how to write a vectorized statement that lets me manipulate the elements with common dates.
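To make the three outputs of unique concrete, here is a small sketch (not part of the original thread) on the example data. commonSets acts as a row-to-group lookup table, so indexing uniqueElements with it rebuilds the time column, and counting its entries with accumarray picks out the times that occur more than once. The names rebuilt, counts and duplicatedTimes are illustrative only.

```matlab
% Example data from the question: times in column 1, values in column 2
data = {'10:30', 100; '10:30', 110; '10:31', 115; '10:32', 110};

% uniqueElements:      sorted distinct times
% firstUniquePosition: a row where each distinct time occurs
%                      (the first occurrence in current MATLAB releases)
% commonSets:          for every row, the index of its time in uniqueElements
[uniqueElements, firstUniquePosition, commonSets] = unique(data(:,1));

% commonSets is a lookup table: indexing uniqueElements with it rebuilds column 1
rebuilt = uniqueElements(commonSets);

% Number of rows sharing each distinct time (accumarray sums a 1 per row)
counts = accumarray(commonSets(:), 1);

% Distinct times that occur more than once, i.e. the rows that need merging
duplicatedTimes = uniqueElements(counts > 1);
```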



Use accumarray (answer by thewaywewalk):

[times,~,subs] = unique(data(:,1));
idx = 1:size(data,1);
meanOfCommonTimes = accumarray(subs(:),idx(:),[],@(x) mean( [data{x,2}] ))

output = [times num2cell(meanOfCommonTimes)]

output = 

    '10:30'    [105]
    '10:31'    [115]
    '10:32'    [110]
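The real data has many value columns. One way to extend the accumarray approach, sketched here under the assumption that all value columns are numeric, is to let the function return a 1x1 cell holding the column-wise mean of the group's rows and then stack those rows. The third column in data below is made up for illustration.

```matlab
% Hypothetical data with two value columns (the real data has many more)
data = {'10:30', 100, 1; '10:30', 110, 3; '10:31', 115, 5; '10:32', 110, 7};

[uniqueTimes, ~, subs] = unique(data(:,1));
vals = cell2mat(data(:, 2:end));   % numeric matrix, one row per record
idx  = (1:size(vals,1)).';         % row indices passed through accumarray

% accumarray hands each group's row indices to the function, which returns
% a 1x1 cell containing the column-wise mean of those rows; mean(...,1)
% keeps single-row groups as rows instead of averaging across columns
groupMeans = accumarray(subs(:), idx, [], @(r) {mean(vals(r,:), 1)});
groupMeans = vertcat(groupMeans{:});   % one row per distinct time

output = [uniqueTimes num2cell(groupMeans)];
```

Any other per-group function f can be substituted for mean, as long as it returns one row per group.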

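Note 1: convert the time strings to serial date numbers with datenum (in datenum formats, HH is hours and MM is minutes; lowercase mm means month):

times = datenum(data(:,1),'HH:MM');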

Also convert the values to a double vector:

vals = cell2mat(data(:,2));

Working on numeric arrays is then reportedly about 10 times faster:

[~,~, subs] = unique(times);
meanOfCommonTimes = accumarray(subs(:),vals(:),[],@mean);

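The @mean handle works, but accumarray then calls a function once per group. A common alternative, sketched here as an assumption rather than something from the answer, uses accumarray's default behaviour (summing) twice: once on the values and once on a scalar 1 to count rows, then divides. This only works for aggregations that decompose into sums, such as the mean, not for something like the median.

```matlab
data = {'10:30', 100; '10:30', 110; '10:31', 115; '10:32', 110};

[uniqueTimes, ~, subs] = unique(data(:,1));   % group index for every row
vals = cell2mat(data(:,2));                    % values as a plain double column

% With no function handle, accumarray sums the values in each group;
% accumulating the scalar 1 instead counts the rows in each group
groupSums   = accumarray(subs(:), vals);
groupCounts = accumarray(subs(:), 1);
groupMeans  = groupSums ./ groupCounts;

output = [uniqueTimes num2cell(groupMeans)];
```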


Benchmark

function t = bench()
    data = {'10:30', 100; '10:30', 110; '10:31', 115;'10:32', 110};
    data = repmat(data, 200000, 1); % 800,000 rows, still a cell array as in the question

    % functions to compare
    fcns = {
        @() thewaywewalk(data);
        @() Cecilia(data);
    };

    thewayw = timeit(fcns{1})
    Ceci = timeit(fcns{2})
    t = [thewayw, Ceci]; % return both timings (seconds)
end

function Z = Cecilia(data)
    [uniqueElements, ~, commonSets] = unique(data(:,1));

    num_unique = length(uniqueElements);
    Z = zeros(num_unique, 1);
    for ii = 1:num_unique
        Z(ii) = mean([data{commonSets==ii, 2}]);
    end
end
function Z = thewaywewalk(data)
    [~,~,subs] = unique(data(:,1));
    idx = 1:size(data,1);
    Z = accumarray(subs(:),idx(:),[],@(x) mean( [data{x,2}] ));
end

Results for 800,000 rows (timeit, in seconds):

thewayw =  1.1483
Ceci = 1.0957

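Here the plain for loop is slightly faster. Note that both functions index into the cell array; the variant where accumarray works directly on a double vector (the cell2mat suggestion above) is not part of this benchmark.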
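To time the numeric variant as well, a rough sketch on the same 800,000-row data could look like this. It uses tic/toc instead of timeit, so single runs are noisy; the numeric timing deliberately includes the cell2mat conversion, and actual numbers depend on the machine and MATLAB version, so none are quoted here.

```matlab
% Same synthetic data as bench(): 800,000 rows in a cell array
data = repmat({'10:30', 100; '10:30', 110; '10:31', 115; '10:32', 110}, 200000, 1);

[~, ~, subs] = unique(data(:,1));
subs = subs(:);
idx  = (1:size(data,1)).';

% Variant 1: accumarray over row indices, reading values from the cell array
tic;
zCell = accumarray(subs, idx, [], @(x) mean([data{x,2}]));
tCell = toc;

% Variant 2: convert the values to double once, then aggregate them directly
tic;
vals = cell2mat(data(:,2));
zNumeric = accumarray(subs, vals, [], @mean);
tNumeric = toc;

fprintf('cell indexing: %.3f s, numeric: %.3f s\n', tCell, tNumeric);
```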


A solution with a for loop (answer by Cecilia):

data = {'10:30', 100; '10:30', 110; '10:31', 115;'10:32', 110};
[uniqueElements, firstUniquePosition, commonSets] = unique(data(:,1));

num_unique = length(uniqueElements);
mean_of_times = zeros(num_unique, 1);
for ii = 1:num_unique
    mean_of_times(ii) = mean([data{commonSets==ii, 2}]);
end

output = [uniqueElements num2cell(mean_of_times)]

output = 

    '10:30'    [105]
    '10:31'    [115]
    '10:32'    [110]

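Why a for loop? I timed both approaches with up to 20000 unique dates and 100 rows per date, i.e. up to 2,000,000 rows, plotting the run time of the accumarray solution (blue) and the for loop (red) against the number of unique dates.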

Number of unique dates vs time

figure; hold on;
kk = 100; %Make 100 times as many rows as dates
for jj = 5000:5000:20000
    dates = 1:jj;
    times = rand(jj*kk, 1);
    % I use a matrix rather than a cell array for the simplicity of randomly generating example data
    data = [repmat(dates, 1, kk)' times];
    data = data(randperm(jj*kk), :); %Shuffle data rows

    [uniqueElements,~,commonSets] = unique(data(:,1));

    %thewaywewalk solution using accumarray
    tic;
    idx = 1:size(data,1);
    accumarray(commonSets(:),idx(:),[],@(x) mean( [data(x,2)] ));
    stopwatch = toc;
    plot(jj, stopwatch, 'b.'); 

    %my solution using a for loop
    tic;
    num_unique = length(uniqueElements);
    mean_of_times = zeros(num_unique, 1);
    for ii = 1:num_unique
        mean_of_times(ii) = mean([data(commonSets==ii, 2)]);
    end
    stopwatch = toc;
    plot(jj, stopwatch, 'r.'); 
end



Source: https://habr.com/ru/post/1583844/

