I have a dataset that looks like this:
Out Revolver Ratio Num ...
0 1 0.766127 0.802982 0 ...
1 0 0.957151 0.121876 1
2 0 0.658180 0.085113 0
3 0 0.233810 0.036050 3
4 1 0.907239 0.024926 5
...
Out can only take values 0 and 1. Then I tried to create PCA and LDA plots using the code from this example: http://scikit-learn.org/stable/auto_examples/decomposition/plot_pca_vs_lda.html
features = Train.columns[1:]
Xf = newTrain[features]
yf = newTrain.Out
pca = PCA(n_components=2)
X_r = pca.fit(Xf).transform(Xf)
lda = LinearDiscriminantAnalysis(n_components=2)
X_r2 = lda.fit(Xf, yf).transform(Xf)
plt.figure()
for c, i, name in zip("rgb", [0, 1], names):
    plt.scatter(X_r[yf == i, 0], X_r[yf == i, 1], c=c, label=name)
plt.legend()
plt.title('PCA plt')
plt.figure()
for c, i, name in zip("rgb", [0, 1], names):
    plt.scatter(X_r2[yf == i, 0], X_r2[yf == i, 1], c=c, label=name)
plt.legend()
plt.title('LDA plt')
I can get the PCA plot to work. However, the result does not make sense, since it shows only 2 points: one around (-4000, 30) and the other around (2400, 23.7). I do not see many data points, as in the plot from the linked example.
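Separately from the number of points, the first PCA axis spans thousands while the second stays around 23 to 30. That pattern suggests (an assumption, since the full data is not shown) that one column is on a much larger scale than the others and dominates the first component. A minimal sketch, using synthetic data rather than the real dataset, of standardizing the features before PCA with `StandardScaler`:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
# Synthetic features: one column on a scale of thousands, two on a scale of ~1
X = np.column_stack([
    rng.normal(scale=3000, size=200),
    rng.normal(size=200),
    rng.normal(size=200),
])

# PCA on the raw features
raw_ratio = PCA(n_components=2).fit(X).explained_variance_ratio_

# PCA after scaling every column to zero mean and unit variance
scaled_pca = make_pipeline(StandardScaler(), PCA(n_components=2))
X_scaled_pca = scaled_pca.fit_transform(X)
scaled_ratio = scaled_pca.named_steps["pca"].explained_variance_ratio_

print(raw_ratio)     # first component takes almost all of the variance
print(scaled_ratio)  # variance is spread more evenly after scaling
```

Whether scaling is appropriate depends on what the columns mean; it only changes which directions PCA considers "large".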
The LDA plot does not work and raises an error:
IndexError: index 1 is out of bounds for axis 1 with size 1
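For context on this error: LDA can produce at most `min(n_classes - 1, n_features)` discriminant components, so with a two-class target like `Out` the transformed array has a single column and `X_r2[:, 1]` is out of range. Depending on the scikit-learn version, `n_components=2` is either silently reduced to 1 or rejected outright. A minimal sketch on synthetic data (the arrays are stand-ins, not the real dataset) that checks the output shape:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Synthetic stand-in for newTrain[features]: 100 rows, 3 numeric features
X = rng.normal(size=(100, 3))
# Binary target with values 0 and 1, like the Out column
y = (X[:, 0] + rng.normal(scale=0.5, size=100) > 0).astype(int)

# n_components is left at its default (None), which keeps
# min(n_classes - 1, n_features) = min(1, 3) = 1 component here
lda = LinearDiscriminantAnalysis()
X_lda = lda.fit_transform(X, y)

print(X_lda.shape)  # a single column, so X_lda[:, 1] would raise IndexError
```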
I also tried the code below to create an LDA plot, but got the same error:
for c, i, name in zip("rgb", [0, 1], names):
    plt.scatter(x=X_LDA_sklearn[:, 0][yf == i], y=X_LDA_sklearn[:, 1][yf == i], c=c, label=name)
plt.legend()
Does anyone know what's wrong with that?
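Because a two-class LDA yields only one discriminant axis, there is no second column to put on the y-axis of a scatter plot. A minimal sketch, with synthetic data standing in for `newTrain` (an assumption), that shows the single component as overlapping per-class histograms instead:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Synthetic stand-ins for newTrain[features] and newTrain.Out
X = rng.normal(size=(100, 3))
y = (X[:, 0] + rng.normal(scale=0.5, size=100) > 0).astype(int)

# One discriminant component for a binary target
X_lda = LinearDiscriminantAnalysis().fit_transform(X, y)

fig, ax = plt.subplots()
counts = []
for color, label in zip("rb", [0, 1]):
    # y is a plain NumPy array, so y == label is a proper boolean mask
    n, _, _ = ax.hist(X_lda[y == label, 0], bins=20, color=color,
                      alpha=0.5, label=f"Out = {label}")
    counts.append(n.sum())  # number of points drawn for this class
ax.set_xlabel("LD1")
ax.set_ylabel("count")
ax.set_title("LDA plot (1 component)")
ax.legend()
```

Call `plt.show()` outside a notebook to display the figure. A strip plot of `X_lda[:, 0]` against the class label would be an alternative 1-D view.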
EDIT: Here are my imports:
import pandas as pd
from pandas import Series, DataFrame
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import csv
from sklearn.linear_model import LogisticRegression
from sklearn.cross_validation import train_test_split
from sklearn import metrics
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.lda import LDA
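Separately from the plotting error, two of these imports come from modules that later scikit-learn releases removed: `sklearn.cross_validation` (replaced by `sklearn.model_selection`) and `sklearn.lda` (replaced by `sklearn.discriminant_analysis`, which is already imported above). A sketch of the equivalent imports for a current scikit-learn, assuming nothing else in the script depends on the old modules:

```python
# train_test_split now lives in sklearn.model_selection (sklearn.cross_validation is gone)
from sklearn.model_selection import train_test_split
# LinearDiscriminantAnalysis replaces the old sklearn.lda.LDA class
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
```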
Here is where the errors occur.
For PCA, I get:
FutureWarning: in the future, boolean array-likes will be handled as a boolean array index
plt.scatter(X_r[yf == i,0], X_r[yf == i, 1], c=c, label=name)
For LDA, I get:
plt.scatter(X_r2[yf == i, 0], X_r2[yf == i, 1], c=c, label=name)
FutureWarning: in the future, boolean array-likes will be handled as a boolean array index
IndexError: index 1 is out of bounds for axis 1 with size 1
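The FutureWarning is a likely explanation for the PCA plot showing only two points. With the NumPy version in use, a mask passed as a pandas Series (`yf == i`) is treated as a generic array-like rather than a boolean index, so its False/True values act as the integer indices 0 and 1 and only rows 0 and 1 are plotted, over and over. A minimal sketch that converts the target to a plain NumPy array before building the mask, with a hypothetical synthetic DataFrame in place of `newTrain`:

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# Hypothetical stand-in for newTrain: three feature columns plus the binary Out column
features = ["Revolver", "Ratio", "Num"]
frame = pd.DataFrame(rng.normal(size=(50, 3)), columns=features)
frame["Out"] = rng.integers(0, 2, size=50)

X_r = PCA(n_components=2).fit_transform(frame[features])
yf = frame["Out"].to_numpy()  # plain NumPy array, so yf == i is a NumPy boolean mask

points_per_class = {}
for i in [0, 1]:
    mask = yf == i
    # A NumPy boolean mask selects every matching row, not just rows 0 and 1
    points_per_class[i] = X_r[mask].shape[0]
    # e.g. plt.scatter(X_r[mask, 0], X_r[mask, 1], label=str(i))

print(points_per_class)
```

`Series.to_numpy()` needs pandas 0.24 or later; on older pandas, `frame["Out"].values` gives the same plain array.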