In your code, you create a single static train/test split. If you want to select the best depth by cross-validation instead, you can use sklearn.model_selection.cross_val_score inside the for loop.
You can read the sklearn documentation for more information.
Here is your code, updated to use cross-validation:
```python
import numpy as np
import pandas as pd
from sklearn import tree
from sklearn.model_selection import cross_val_score  # cross_validation was removed in sklearn 0.20

features = ["fLength", "fWidth", "fSize", "fConc", "fConc1", "fAsym",
            "fM3Long", "fM3Trans", "fAlpha", "fDist", "class"]
df = pd.read_csv('magic04.data', header=None, names=features)
df['class'] = df['class'].map({'g': 0, 'h': 1})
x = df[features[:-1]]
y = df['class']

# Score each candidate depth with 5-fold cross-validation
for depth in range(3, 20):
    clf = tree.DecisionTreeClassifier(max_depth=depth)
    scores = cross_val_score(clf, x, y, cv=5)
    print(depth, scores.mean())
```
Alternatively, you can use sklearn.model_selection.GridSearchCV instead of writing the for loop yourself, especially if you want to tune more than one hyperparameter.
```python
import numpy as np
import pandas as pd
from sklearn import tree
from sklearn.model_selection import GridSearchCV

features = ["fLength", "fWidth", "fSize", "fConc", "fConc1", "fAsym",
            "fM3Long", "fM3Trans", "fAlpha", "fDist", "class"]
df = pd.read_csv('magic04.data', header=None, names=features)
df['class'] = df['class'].map({'g': 0, 'h': 1})
x = df[features[:-1]]
y = df['class']

# Search over depths 3..19; GridSearchCV cross-validates each candidate
parameters = {'max_depth': range(3, 20)}
clf = GridSearchCV(tree.DecisionTreeClassifier(), parameters, n_jobs=4)
clf.fit(X=x, y=y)
tree_model = clf.best_estimator_
print(clf.best_score_, clf.best_params_)
```
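If you want to try this pattern without magic04.data at hand, here is a minimal self-contained sketch using a synthetic dataset from make_classification as a stand-in for your file (the sample counts and random_state are arbitrary assumptions):

```python
from sklearn import tree
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for magic04.data: 1000 samples, 10 numeric features
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Same search as above: cross-validate every depth in 3..19
parameters = {'max_depth': range(3, 20)}
clf = GridSearchCV(tree.DecisionTreeClassifier(random_state=0),
                   parameters, cv=5)
clf.fit(X, y)

best = clf.best_estimator_            # refit on the full data by default
print(clf.best_params_)               # depth chosen by cross-validation
print(clf.best_score_)                # mean CV accuracy at that depth
```

After fitting, `best` is a ready-to-use DecisionTreeClassifier, so you can call `best.predict(...)` on new data directly.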
Edit: changed the way GridSearchCV is imported, as pointed out in learn2day's comment.