Sample Data in MATLAB

I have two pieces of data. One is the actual data, fulldata, a 49625x6 numeric data set. The other is Book2, a 49625x1 index of this data that holds the target class of each record.

Book2 has six names (strings) that repeat over and over to match the records in the fulldata dataset. I want to take 1000 samples from fulldata, of which 25% are "blue" and 75% are "red" according to Book2, and store them in a new subsample called sampledata.

How can I achieve this in MATLAB?

Pseudocode:

Choose 250 blue samples from Book2. I don't know how to "choose" 250 random "blue" samples: bluesample = indX(Book2, :) or Book2(indX, :), I'm not sure which.

Select 750 red samples from Book2. Again, I don't know how to "select" 750 random "red" samples: redsample = indX(Book2, :) or Book2(indX, :), again I'm not sure which.

Combine the blue and red samples into a subsample.

 subsample = join(bluesample, redsample) 

Find the subsample indices and create sampledata from fulldata:

 sampledata = subsample(indX(fulldata), :)   % this line is probably wrong

(Image of the two data sets; not reproduced here.)

Each row in Book2 corresponds to a row in fulldata. I want to select a certain number of "normal" and a certain number of "abnormal" records (yes, I know those are not their exact names) from fulldata using Book2, since Book2 is a full index and contains the class labels.

So, in terms of my dataset, it’s easier to say:

 Choose 250 random samples of the string "normal." from Book2 and record the row numbers.
 Choose 750 random samples of the string "not normal." from Book2 and record the row numbers.
 Combine the two random samples of row numbers.
 Make a new dataset (1000x6) from fulldata using the combined row numbers above.
1 answer

Extract the "normal" entries using strmatch:

 normIdx = strmatch('normal.', Book2);   % row numbers of the entries that start with 'normal.'
 normalSubset = fulldata(normIdx, :);    % the matching rows of fulldata
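strmatch matches by prefix, and as far as I know the MathWorks documentation lists it as not recommended. A minimal alternative sketch with strcmp follows, assuming Book2 is a cell array of character vectors (the question does not say how it is stored). The 10-row fulldata and Book2 below are hypothetical stand-ins for the real 49625-row variables.

```matlab
% Hypothetical stand-in data: 10 rows instead of 49625.
fulldata = rand(10, 6);
Book2 = repmat({'normal.'; 'not normal.'}, 5, 1);   % assumed storage: cell array of labels

% strcmp compares whole strings, so 'not normal.' can never match 'normal.'.
isNormal = strcmp(Book2, 'normal.');    % 10x1 logical mask
normIdx = find(isNormal);               % row numbers of the "normal." entries
normalSubset = fulldata(normIdx, :);    % matching rows of fulldata (5x6 here)
```

Keeping normIdx around is useful because it holds the original row numbers in fulldata, which is what the question asks to record.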

Then, to generate a list of 250 random non-repeating integers, I searched for "list of non-repeating random integers in MATLAB" and took this from the first result:

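 p = randperm(size(normalSubset, 1));   % random permutation of the row numbers
 p = p(1:250);                          % keep the first 250; MATLAB indices start at 1, so no offset is needed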
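On MATLAB releases where randperm accepts a second argument, the same draw can be written as a single call. A small sketch, with n and k as hypothetical stand-ins for the subset size and the sample size:

```matlab
n = 40;               % hypothetical number of rows in the subset
k = 25;               % hypothetical number of rows to draw (250 in the question)
p = randperm(n, k);   % k unique integers drawn from 1..n, no repeats

% Every value is a valid 1-based row index, so p can be used directly,
% for example normalSubset(p, :).
```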

So now, to get your 250 randomly selected normal entries:

 normalSample = normalSubset(p, :);

normalSample will be 250 x 6. Now do the same with "not normal." to get notNormalSample (750 x 6), and then combine the two:
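The "not normal." half can be sketched the same way. This assumes Book2 is a cell array of character vectors and that the abnormal rows carry exactly the label 'not normal.'; the 40-row data and the sample size of 15 are hypothetical stand-ins for the real data and for 750.

```matlab
% Hypothetical stand-in data: 40 rows, one "normal." for every three "not normal.".
fulldata = rand(40, 6);
Book2 = repmat({'normal.'; 'not normal.'; 'not normal.'; 'not normal.'}, 10, 1);

notNormIdx = find(strcmp(Book2, 'not normal.'));   % row numbers of the "not normal." entries
notNormalSubset = fulldata(notNormIdx, :);         % 30x6 here

q = randperm(size(notNormalSubset, 1));            % shuffle the subset's row numbers
q = q(1:15);                                       % use 750 for the real data
notNormalSample = notNormalSubset(q, :);           % 15x6 here
```

If the abnormal records are actually spread over several label names, selecting with ~strcmp(Book2, 'normal.') instead would pick up everything that is not "normal.".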

 sample = [normalSample; notNormalSample];

In sample, all the normals appear before the non-normals. If you want to mix them up again, use randperm():

 sample = sample(randperm(1000), :); 
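Putting the steps together, here is one possible end-to-end sketch that works on row numbers of fulldata rather than on copied subsets, so the sampled labels stay available too. It assumes Book2 is a cell array of character vectors and a MATLAB release where randperm(n, k) is available. The data, the counts (25 and 75 instead of 250 and 750), and the variable names normRows, otherRows, rows, and samplelabels are all hypothetical.

```matlab
% Hypothetical stand-in data: 400 rows instead of 49625.
fulldata = rand(400, 6);
Book2 = repmat({'normal.'; 'not normal.'; 'not normal.'; 'not normal.'}, 100, 1);

nNormal = 25;       % 250 for the real data
nNotNormal = 75;    % 750 for the real data

% Row numbers of each class in Book2 (and therefore in fulldata).
normRows = find(strcmp(Book2, 'normal.'));
otherRows = find(strcmp(Book2, 'not normal.'));

% Draw the requested number of rows from each class without repeats.
pickNorm = normRows(randperm(numel(normRows), nNormal));
pickOther = otherRows(randperm(numel(otherRows), nNotNormal));

% Combine the row numbers, shuffle them, and build the subsample.
rows = [pickNorm; pickOther];
rows = rows(randperm(numel(rows)));
sampledata = fulldata(rows, :);    % 100x6 here, 1000x6 for the real data
samplelabels = Book2(rows);        % class label of each sampled row
```

Because rows holds original row numbers, sampledata(i, :) and samplelabels{i} always describe the same record of fulldata.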

Source: https://habr.com/ru/post/1445681/

