UPDATE: matplotlib.mlab.PCA has been deprecated since release 2.2 (2018-03-06).
The matplotlib.mlab.PCA library (used in this answer) was not deprecated when this answer was originally written. Therefore, for everyone arriving here via Google, I am posting a complete working example, tested with Python 2.7.
Use the following code with caution, as it relies on a now-obsolete library!
    from matplotlib.mlab import PCA
    import numpy

    data = numpy.array([[3, 2, 5], [-2, 1, 6], [-1, 0, 4], [4, 3, 4], [10, -5, -6]])
    pca = PCA(data)
"pca.Y" now holds the original data matrix expressed in the basis of the principal components. More information about the PCA object can be found here.
    >>> pca.Y
    array([[ 0.67629162, -0.49384752,  0.14489202],
           [ 1.26314784,  0.60164795,  0.02858026],
           [ 0.64937611,  0.69057287, -0.06833576],
           [ 0.60697227, -0.90088738, -0.11194732],
           [-3.19578784,  0.10251408,  0.00681079]])
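Because the library is obsolete, it may not be importable in a recent matplotlib installation. The following is a minimal NumPy-only sketch of the same projection, not the library's own implementation. It assumes that each column is standardized (zero mean, unit population standard deviation) before the decomposition, which is consistent with the numbers printed above. The variable names (standardized, projected, explained) are illustrative.

```python
import numpy as np

data = np.array([[3, 2, 5],
                 [-2, 1, 6],
                 [-1, 0, 4],
                 [4, 3, 4],
                 [10, -5, -6]], dtype=float)

# Assumption: standardize each column (zero mean, unit population std).
standardized = (data - data.mean(axis=0)) / data.std(axis=0)

# SVD of the standardized data; the rows of vt are the principal axes,
# ordered by decreasing singular value.
u, s, vt = np.linalg.svd(standardized, full_matrices=False)

# Project the standardized data onto the principal axes.
projected = np.dot(standardized, vt.T)

# Fraction of the total variance carried by each component.
explained = s ** 2 / np.sum(s ** 2)

print(projected)
print(explained)
```

The sign of each column is arbitrary in a PCA, so individual columns of projected may come out negated relative to pca.Y.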
You can use matplotlib.pyplot to plot this data and check that the PCA gives "good" results. The names list is only used to annotate our five vectors.
    import matplotlib.pyplot

    names = ["A", "B", "C", "D", "E"]
    matplotlib.pyplot.scatter(pca.Y[:, 0], pca.Y[:, 1])
    for label, x, y in zip(names, pca.Y[:, 0], pca.Y[:, 1]):
        matplotlib.pyplot.annotate(
            label, xy=(x, y), xytext=(-2, 2),
            textcoords='offset points', ha='right', va='bottom')
    matplotlib.pyplot.show()
Looking at our original vectors, we see that data[0] ("A") and data[3] ("D") are quite similar, as are data[1] ("B") and data[2] ("C"). This is reflected in the 2D plot of the PCA-transformed data.
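The similarity can also be checked numerically rather than by eye. The sketch below is one possible check: it takes the first two principal-component coordinates from the pca.Y output shown earlier (hard-coded here so it runs without the obsolete library) and finds each point's nearest neighbour by Euclidean distance. The names coords, dist and nearest are illustrative.

```python
import numpy as np

names = ["A", "B", "C", "D", "E"]

# First two principal-component coordinates, copied from the pca.Y output.
coords = np.array([[0.67629162, -0.49384752],
                   [1.26314784, 0.60164795],
                   [0.64937611, 0.69057287],
                   [0.60697227, -0.90088738],
                   [-3.19578784, 0.10251408]])

# Pairwise Euclidean distances via broadcasting:
# dist[i, j] is the distance between point i and point j.
diff = coords[:, np.newaxis, :] - coords[np.newaxis, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

# Exclude self-distances before looking for the nearest neighbour.
np.fill_diagonal(dist, np.inf)
nearest = {name: names[j] for name, j in zip(names, dist.argmin(axis=1))}

print(nearest)
```

In this 2D projection, "A" and "D" are each other's nearest neighbours, and so are "B" and "C", which matches the grouping visible in the scatter plot.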
