PCA consists of finding a set of uncorrelated random variables with which you can represent your data, sorted in decreasing order with respect to the amount of variance they retain.
These variables can be found by projecting your data points onto a specific orthogonal subspace. If your (mean-centered) data matrix is X, this subspace consists of the eigenvectors of X^T X.
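As a small, single-machine sketch of that statement (plain NumPy, with toy data and variable names of my own choosing), the whole recipe fits in a few lines:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))      # toy data matrix, n rows x d columns
X = X - X.mean(axis=0)             # mean-center the columns

# Eigendecomposition of the d x d matrix X^T X; eigh returns eigenvalues
# in ascending order, so flip both to descending (most variance first).
eigvals, eigvecs = np.linalg.eigh(X.T @ X)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

projected = X @ eigvecs[:, :2]     # project onto the top-2 directions
```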
When X is big, say of dimensions n x d, you can compute X^T X by computing the outer product of each row of the matrix by itself and then adding all the results up. This, of course, lends itself to a simple map-reduce procedure if d is small, no matter how big n is, because the outer product of each row is itself a d x d matrix that every worker has to handle in main memory. That's why you can run into trouble when handling many columns.
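To make that row-wise accumulation concrete, here is a toy, single-machine version of the map-reduce idea (the names are mine, not Spark's):

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))     # n x d with small d

# "Map": turn each row r into its d x d outer product r r^T.
# "Reduce": sum up all of the partial d x d matrices.
gram = reduce(np.add, (np.outer(row, row) for row in X))

assert np.allclose(gram, X.T @ X)  # same result as forming X^T X directly
```

Each partial result is only d x d, so it is d, not n, that determines how much memory a worker needs.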
If the number of columns is large (and the number of rows not so much), you can still compute PCA. Just compute the SVD of your (mean-centered) transposed data matrix, then multiply it by the resulting right singular vectors and by the inverse of the diagonal matrix of singular values. There's your orthogonal subspace.
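Here is a NumPy sketch of that trick, written via the equivalent eigendecomposition of the small n x n matrix X X^T (variable names are mine; the final check rebuilds X^T X only to verify the result, which you would not do when d is genuinely huge):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 20, 300                              # few rows, many columns
X = rng.normal(size=(n, d))
X = X - X.mean(axis=0)                      # mean-center

# Eigendecomposition of the small n x n Gram matrix X X^T.
eigvals, V = np.linalg.eigh(X @ X.T)
order = np.argsort(eigvals)[::-1][:n - 1]   # drop the zero eigenvalue caused by centering
eigvals, V = eigvals[order], V[:, order]

# Multiply the transposed data matrix by those eigenvectors and rescale by the
# singular values (square roots of the eigenvalues) to get the d-dimensional basis.
U = X.T @ V / np.sqrt(eigvals)

# Verification only: the columns of U are eigenvectors of the big d x d matrix X^T X.
assert np.allclose((X.T @ X) @ U, U * eigvals)
```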
Bottom line: if the spark.ml implementation follows the first approach every time, then the limitation should still be there. If they check the dimensions of the input dataset to decide whether to go for the second approach, then you won't have problems dealing with a large number of columns as long as the number of rows is small.
Regardless of that, the limit is imposed by how much memory your workers have, so perhaps they prefer to let users hit the ceiling by themselves rather than suggest a limitation that may not apply to everyone. That may be the reason they decided not to mention the limitation in the new docs.
Update: The source code reveals that they do take the first approach every time, regardless of the dimensionality of the input. The actual limit is 65535 columns, and a warning is issued at 10000.
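For reference, this is roughly how the spark.ml estimator under discussion is used from PySpark (a minimal sketch; the column names and toy vectors are my own):

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import PCA
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.appName("pca-sketch").getOrCreate()

df = spark.createDataFrame(
    [(Vectors.dense([1.0, 0.0, 7.0]),),
     (Vectors.dense([2.0, 1.0, 5.0]),),
     (Vectors.dense([4.0, 3.0, 2.0]),)],
    ["features"],
)

pca = PCA(k=2, inputCol="features", outputCol="pca_features")
model = pca.fit(df)   # a feature vector wider than the limit above would fail here
model.transform(df).show(truncate=False)
```

The fitted model exposes the principal directions as model.pc and the variance they retain as model.explainedVariance.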