An old question, but I don't think it was satisfactorily answered (and I just landed here myself through Google). I ended up in the same situation and had to track down the answer myself.
The goal of PCA is to represent your data X (an $n \times p$ matrix of $n$ observations and $p$ features) in an orthonormal basis W; the coordinates of your data in this new basis are Z, as shown below:

$$\mathbf{Z} = \mathbf{X}\mathbf{W}$$

Because W is orthonormal, we can invert it simply by transposing it and write:

$$\mathbf{X} = \mathbf{Z}\mathbf{W}^{\top}$$

Now, to reduce dimensionality, we choose some number of components $k < p$. Assuming the basis vectors in W are ordered from largest to smallest eigenvalue (i.e., the vector corresponding to the largest eigenvalue comes first, and so on), we simply keep the first $k$ columns of W, call it $\mathbf{W}_k$:

$$\mathbf{Z}_k = \mathbf{X}\mathbf{W}_k$$

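To make the steps above concrete, here is a minimal NumPy sketch of one way to do this; the array `X_train`, the choice `k = 2`, and the eigendecomposition-of-the-covariance route are illustrative assumptions, not the only way to compute W:

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 5))   # illustrative training data: n=100 observations, p=5 features
k = 2                                 # number of components to keep

# Center the data (PCA is defined on mean-centered data)
mu = X_train.mean(axis=0)
Xc = X_train - mu

# Eigendecomposition of the covariance matrix gives the orthonormal basis W
cov = np.cov(Xc, rowvar=False)
eigvals, W = np.linalg.eigh(cov)      # columns of W are eigenvectors

# Order the basis vectors by eigenvalue, largest first
order = np.argsort(eigvals)[::-1]
W = W[:, order]

# Keep only the first k columns: W_k is p x k
W_k = W[:, :k]

# k-dimensional representation of the training data: Z_k = X W_k
Z_train = Xc @ W_k
```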
Now we have a k-dimensional representation of our training data X. The next step is to train some supervised classifier on the new features in $\mathbf{Z}_k$.

The key is to realize that $\mathbf{W}_k$ is, in a sense, a transformation learned from our training data that maps the $p$-dimensional feature space down to a $k$-dimensional feature space (or at least the best such transformation we could find using our training data). We can therefore hit our test data with the same transformation $\mathbf{W}_k$, which gives a k-dimensional set of test features:

$$\mathbf{Z}_{k,\text{test}} = \mathbf{X}_{\text{test}}\mathbf{W}_k$$

Now we can use the same classifier that was trained on the k-dimensional representation of our training data to make predictions on the k-dimensional representation of our test data; calling the trained classifier $f$:

$$\hat{\mathbf{y}}_{\text{test}} = f(\mathbf{Z}_{k,\text{test}})$$

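Continuing the sketch above, this is roughly how the test-time step might look; the classifier choice (`LogisticRegression`), the labels `y_train`, and the test array `X_test` are purely illustrative:

```python
from sklearn.linear_model import LogisticRegression

# Illustrative labels and test data
y_train = (X_train[:, 0] > 0).astype(int)
X_test = rng.normal(size=(20, 5))

# Train a supervised classifier on the k-dimensional training features
clf = LogisticRegression()
clf.fit(Z_train, y_train)

# Apply the SAME centering and the SAME W_k that were learned on the training data
Z_test = (X_test - mu) @ W_k

# Predict using the k-dimensional test features
y_pred = clf.predict(Z_test)
```

The equivalent with scikit-learn's built-in `PCA` is to call `fit` on the training data only and then `transform` both sets; either way, the essential point is that W (and the centering) come from the training data alone.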
The point of this whole procedure is that you may have thousands of features, but (1) not all of them will carry meaningful signal, and (2) your supervised learning method may be far too complex to train on the full feature set (or it would take too long, or your computer would not have enough memory to handle the computation). PCA can dramatically reduce the number of features needed to represent your data without discarding the aspects of your data that truly add value.