Problem with principal component analysis (PCA)

I'm not sure if this is the right place, but here I go:

I have a database of 300 high resolution images. I want to compute the PCA of this database, and so far this is what I am doing:

- reshape each image into one column vector
- create a matrix of all my data (500x300)
- compute the mean column and subtract it from my matrix; this gives me X
- compute the correlation C = X'X (300x300)
- find the eigenvectors V and the eigenvalues D of C
- the PCA matrix is given by XV * D^(-1/2), where each column is a principal component
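For reference, the steps above can be sketched in Python with NumPy. This is only an illustration under assumptions: the function name `snapshot_pca`, the random stand-in data, and the tolerance used to drop near-zero eigenvalues are not from the question.

```python
import numpy as np

def snapshot_pca(data):
    """PCA via the small matrix X'X, following the steps listed above.

    data: (n_pixels, n_images) array, one flattened image per column.
    Returns (components, eigenvalues), sorted by decreasing eigenvalue.
    """
    # Subtract the mean column (the mean image) from every column.
    X = data - data.mean(axis=1, keepdims=True)
    # Small (n_images x n_images) matrix C = X'X.
    C = X.T @ X
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, V = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]
    eigvals, V = eigvals[order], V[:, order]
    # Drop near-zero eigenvalues (assumed tolerance); centering removes one direction.
    keep = eigvals > 1e-10 * eigvals[0]
    eigvals, V = eigvals[keep], V[:, keep]
    # X V D^(-1/2): each column is a unit-norm principal component.
    components = (X @ V) / np.sqrt(eigvals)
    return components, eigvals

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 300))  # random stand-in for the image matrix
components, eigvals = snapshot_pca(data)
```

Dividing by the square root of each eigenvalue is what makes the columns unit length, which is also why components with very small eigenvalues are the most sensitive to numerical error.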

This works well and gives me the correct components.

Now I am running the same PCA on the same database, except that the images have lower resolution.


Here are my results, low resolution on the left and high resolution on the right. You can see that most of them are similar, but some images do not match (the ones that I circled).

Is there any way to explain this? I need my algorithm to produce the same component images, one set in high resolution and the other in low resolution. How can I do this?

thanks

1 answer

It is quite possible that the filter you used for downsampling affected some of the components. After all, lower resolution images do not contain the higher frequencies, which also contribute to which components you get. If the component weights (lambdas, i.e. the eigenvalues) for these images are small, there is also a good chance of numerical error.

I assume your component images are sorted by weight. If so, I would try another pre-downsampling filter and see whether it gives different results (essentially, produce the low-resolution images in a different way). It is possible that the components that come out differently have a lot of frequency content in the transition band of this filter. It appears that the images circled in red are almost perfect inversions of each other. Filters can cause such things.
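One thing worth checking for the inverted pairs: an eigenvector is only defined up to sign, so v and -v are equally valid components, and two separate runs can legitimately return opposite signs. Below is a minimal sketch of how one might test for this and align the signs. The helpers `block_downsample` and `signed_similarity` are hypothetical names, the box filter is only an assumed stand-in for whatever downsampling you actually use, and the arrays are random stand-ins for your component images.

```python
import numpy as np

def block_downsample(img, factor):
    """Downsample a 2-D image by averaging factor x factor blocks (a crude box filter)."""
    h, w = img.shape
    h, w = h - h % factor, w - w % factor  # crop so the blocks tile evenly
    blocks = img[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

def signed_similarity(a, b):
    """Cosine similarity after mean removal; a value near -1 means an inverted copy."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(1)
hi_component = rng.normal(size=(64, 64))           # stand-in for a high-resolution component
lo_component = -block_downsample(hi_component, 4)  # stand-in: same pattern, flipped sign

s = signed_similarity(block_downsample(hi_component, 4), lo_component)
if s < 0:
    lo_component = -lo_component  # flip so both sets use the same sign convention
```

If the similarity for a circled pair is close to -1, the two components agree up to sign and flipping one of them is enough; a value near 0 would point to a genuinely different component.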

If your images are not sorted by weight, I wouldn't be surprised if the ones you circled have very little weight, in which case this might just be numerical error or something similar. In any case, we probably need a little more information about how you reduce the size and how you sort the images before displaying them. In addition, I would not expect all the images to be very similar, because you are essentially discarding several frequency components. I am sure that this has nothing to do with the fact that you are stretching images into vectors to calculate the PCA, but try stretching them in the other direction (rows instead of columns, or vice versa). If that changes the result, you might want to try performing the PCA a little differently, though I am not sure how.
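As a quick sanity check on the flattening direction, here is a small sketch with random stand-in images (the sizes are arbitrary assumptions). As long as every image is flattened the same way, the two layouts differ only by a permutation of the pixel rows, so X'X should come out identical.

```python
import numpy as np

rng = np.random.default_rng(2)
images = rng.normal(size=(10, 8, 6))  # 10 stand-in images of 8 x 6 pixels

# One column per image, flattened row by row (C order) or column by column (F order).
X_rows = np.stack([im.ravel(order="C") for im in images], axis=1)
X_cols = np.stack([im.ravel(order="F") for im in images], axis=1)

# Subtract the mean column in each layout.
X_rows = X_rows - X_rows.mean(axis=1, keepdims=True)
X_cols = X_cols - X_cols.mean(axis=1, keepdims=True)

# A permutation of the rows of X leaves X'X unchanged.
same_gram = np.allclose(X_rows.T @ X_rows, X_cols.T @ X_cols)
```

If the result does change when you switch direction in your own code, the likely cause is that the images were not all flattened, or reshaped back for display, in the same order.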


Source: https://habr.com/ru/post/894920/

