I have a large data set (~1 million records) stored as a cell array with a few columns and many rows. My problem is that I need to identify records that occur at the same time, and then combine the values in the other columns so that I can delete the rows with duplicate times without losing their information.
An example of a subset of such data can be initialized as follows:
data = {'10:30', 100; '10:30', 110; '10:31', 115; '10:32', 110}
That is, I have a cell array with one column of strings (representing time) and another column (many more in the real data) of doubles.
My code should notice the repeated 10:30 (there may be many such repeats), take the corresponding doubles (100 and 110) as inputs to some function, f(100,110), and then remove the repeated row from the data.
I.e., if the function were, say, the average, I should get a result that looks something like
data =
'10:30' [105]
'10:31' [115]
'10:32' [110]
This would be pretty simple if loops were fast enough, but with my data set it makes no sense to even try to solve the problem with a loop.
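For reference, here is a minimal sketch of the loop-based version I would like to avoid. It assumes mean as a stand-in for f and a single numeric column; the names values and result are made up for this example.

```matlab
% Loop-based baseline: correct, but too slow for ~1 million rows.
% Assumption: mean stands in for the real function f.
data = {'10:30', 100; '10:30', 110; '10:31', 115; '10:32', 110};

% Third output maps every row to the index of its unique time string.
[uniqueElements, ~, commonSets] = unique(data(:,1));

% Pull the numeric column out of the cell array once, up front.
values = cell2mat(data(:,2));

result = cell(numel(uniqueElements), 2);
for k = 1:numel(uniqueElements)
    result{k,1} = uniqueElements{k};
    % Apply the function to all values that share the k-th time.
    result{k,2} = mean(values(commonSets == k));
end
```

For the sample data this collapses the two 10:30 rows into a single row holding their average.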
I got as far as
[uniqueElements, firstUniquePosition, commonSets] = unique(data(:,1));
after a lot of playing around, which gives some information that seems useful,
uniqueElements =
'10:30'
'10:31'
'10:32'
firstUniquePosition =
1
3
4
commonSets =
1
1
2
3
but I can't figure out how to write a vectorized statement that lets me manipulate the elements with common times.