This is my first brush with machine learning, so I'm trying to figure out how it all works. I have a data set in which I've compiled all the statistics of every player who has played for my high school baseball team. I also have a list of all the players from my high school who have ever made it to the MLB. What I would like to do is split the data into a training set and a test set, feed it to some algorithm in the scikit-learn package, and predict the probability of making the MLB.
So, I looked through a number of sources and found this cheat sheet, which suggests starting with linear SVC.

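As I understand it, I need to split my data into training samples with one row per player and one column per statistic ([...], yada, yada), X_train; and a matching set of labels, 1 (made the MLB) or 0 (did not make the MLB), Y_train. Then I call fit(X, Y), run predict(X_test), and compare the result with Y_test.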
Does this seem like a reasonable approach?
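
The workflow described above can be sketched roughly as follows. This is only an illustration under assumptions: the data is a synthetic stand-in (random numbers, not real player statistics), the label rule is made up, and the `StandardScaler` step and the 25% test size are choices made for the sketch, not something taken from the cheat sheet.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for the real table: one row per player, 20 stat columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
# Made-up labels for illustration: 1 = made the MLB, 0 = did not.
y = (X[:, 0] + 0.5 * rng.normal(size=1000) > 1.5).astype(int)

# Hold out 25% of the players as the test set.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Scale the features, then fit a linear SVC on the training rows.
model = make_pipeline(StandardScaler(), LinearSVC())
model.fit(X_train, Y_train)

# Predict labels for the held-out players and compare with Y_test.
Y_pred = model.predict(X_test)
accuracy = (Y_pred == Y_test).mean()
print("fraction of test labels predicted correctly:", accuracy)
```

Note that `predict` returns hard 0/1 labels here, not probabilities.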
EDIT, to provide more information:
There are 20 features per player ([...], etc.).
I have 10 [...] in total, and only 1% of the players made the MLB.
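
With so few positive examples, one option that might help is a stratified split, so that the training and test sets keep the same share of MLB players, plus class weighting. The sketch below is an assumption-laden illustration, not a recommendation: the data is synthetic with exactly 1% positives, and wrapping `LinearSVC` in `CalibratedClassifierCV` is just one way to get probability estimates, since `LinearSVC` on its own only returns class labels and decision scores.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in: 5000 players, 20 features, top 1% of a made-up score = MLB.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 20))
score = X[:, 0] + X[:, 1] + rng.normal(size=5000)
y = (score > np.quantile(score, 0.99)).astype(int)

# stratify=y keeps the rare positive class at the same rate in both splits.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# class_weight="balanced" reweights errors so the rare class is not ignored.
svc = make_pipeline(StandardScaler(), LinearSVC(class_weight="balanced"))

# Calibration maps the SVC decision scores to probability estimates.
model = CalibratedClassifierCV(svc, cv=3)
model.fit(X_train, Y_train)

# Column 1 holds the estimated probability of class 1 (made the MLB).
proba_mlb = model.predict_proba(X_test)[:, 1]
print("highest estimated probabilities:", np.sort(proba_mlb)[-5:])
```

With a class this rare, plain accuracy is not very informative, because always predicting 0 would already be right about 99% of the time.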