Matlab: How can I split my data matrix into two random subsets of column vectors while retaining label information?

Question

I have a data matrix X (60x208) and a label vector Y (1x208). I want to split X into two random subsets of column vectors: training (70% of the data) and testing (30% of the data), but I still need to know which label in Y corresponds to each column vector. I could not find a function for this. Any ideas?

EDIT: I should add that Y contains only two labels, 1 and 2 (not sure if it matters).

+5

sample matlab machine-learning label

user3457834 Oct 30 '14 at 19:57

1 answer

rayryeng · Accepted Answer · 2014-10-30T20:22:16+0000

This is pretty easy to do. Use randperm to generate a random permutation of the indices from 1 to the number of points you have ... which is 208 in your case.

After creating this sequence, use it to index into your X and Y to extract the training and testing data and labels. As such, do something like this:

    num_points = size(X,2);
    split_point = round(num_points*0.7);
    seq = randperm(num_points);
    X_train = X(:,seq(1:split_point));
    Y_train = Y(seq(1:split_point));
    X_test = X(:,seq(split_point+1:end));
    Y_test = Y(seq(split_point+1:end));

split_point determines how many points go into the training set, and it has to be rounded in case the calculation produces a fractional value. I also did not hard-code 208, because your dataset could grow, so this will work with a dataset of any size. X_train and Y_train will contain the data and labels for your training set, and X_test and Y_test will contain the data and labels for your test set.
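To make the arithmetic concrete, here is a minimal sketch that runs the same split on synthetic data with the shapes from the question. The random X and Y are placeholders (not the asker's data), and train_idx and test_idx are names introduced here only for readability.

```matlab
% Synthetic stand-ins for the data in the question (assumed, for illustration only)
X = rand(60, 208);          % 60 features, 208 points stored as columns
Y = randi(2, 1, 208);       % labels 1 or 2, one per column of X

num_points  = size(X, 2);                 % 208
split_point = round(num_points * 0.7);    % round(145.6) = 146
seq = randperm(num_points);               % random ordering of 1..208

% The same index vector is applied to X and Y, so labels stay aligned
train_idx = seq(1:split_point);
test_idx  = seq(split_point+1:end);

X_train = X(:, train_idx);   Y_train = Y(train_idx);
X_test  = X(:, test_idx);    Y_test  = Y(test_idx);

disp(size(X_train));   % X_train is 60-by-146
disp(size(X_test));    % X_test is 60-by-62
```

Because seq is a permutation, the two index sets never overlap, so every point lands in exactly one of the two subsets.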

So the first column of X_train is the data point for the first element of your training set, with the first element of Y_train as the label for that particular point ... and so on and so forth!
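If you want to convince yourself that the labels really do follow their columns, one possible check is sketched below. It assumes placeholder data in which column k of X is filled with the value k, so each column reveals its original position. orig_cols and labels_match are hypothetical names used only for this illustration.

```matlab
% Placeholder data: column k of X holds the value k in every row, so a
% column's contents tell us which original position it came from.
num_points = 208;
X = repmat(1:num_points, 60, 1);
Y = randi(2, 1, num_points);      % labels 1 or 2

split_point = round(num_points * 0.7);
seq = randperm(num_points);
X_train = X(:, seq(1:split_point));
Y_train = Y(seq(1:split_point));

% Recover each training column's original index from its contents
orig_cols = X_train(1, :);

% True only if every training label equals the label at the original position
labels_match = isequal(Y_train, Y(orig_cols));
disp(labels_match);
```

The same comparison can be made for X_test and Y_test using seq(split_point+1:end).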
