Principal Component Analysis

Um, what am I looking at?

This is a simple visualization of PCA, which tries to find the directions of maximum spread in our data.

Just tap on the left-most grid (a.k.a. the Data Space) to add new data points (you can also drag them!) and see how your principal components (eigenvectors) and covariance (visualized as a bluish-green ellipse) change.

If things get slow, simply switch off Live-Update and use the Recalculate button instead.

But what is Principal Component Analysis?

As mentioned earlier, Principal Component Analysis tries to find the directions of maximum spread (a.k.a. variance) in our data. To this end, we first calculate the covariance matrix of our standardized data, then simply perform an eigendecomposition on it.
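
To make that first step concrete, here is a minimal NumPy sketch of the standardization and covariance calculation (the tiny data array and variable names are purely illustrative):

```python
import numpy as np

# A tiny illustrative 2-D dataset: rows are points, columns are features
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0]])

# Standardize each feature to zero mean and unit variance
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Covariance matrix of the standardized data (a 2x2 matrix here)
cov = np.cov(X_std, rowvar=False)
print(cov)
```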

The eigendecomposition spits out two things: eigenvectors and eigenvalues. As usually defined in the literature, the eigenvectors of our covariance matrix give us the directions of maximum variance, sorted in descending order of their eigenvalues; i.e., the first eigenvector is the direction of highest spread in our data, which is exactly what we want! The eigenvalues, in turn, characterize the magnitude of that spread.
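
Continuing the sketch above, here is a rough illustration of the eigendecomposition step. NumPy's `eigh` (meant for symmetric matrices such as a covariance matrix) returns eigenvalues in ascending order, so we re-sort everything into the descending order described above (again, the data and names are just for illustration):

```python
import numpy as np

# Same tiny dataset as above, standardized, with its covariance matrix
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
cov = np.cov(X_std, rowvar=False)

# Eigendecomposition of the covariance matrix, then sort descending by eigenvalue
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

print("eigenvalues (magnitude of spread):", eigenvalues)
print("first eigenvector (direction of highest spread):", eigenvectors[:, 0])
```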

By the way, the eigenvectors of our covariance matrix are also known as the principal components of our data. Once we've calculated them, we can reduce the dimensionality of our data by projecting it (2-dimensional in our case) onto one of the principal components. The fraction of variance we keep is the corresponding eigenvalue divided by the sum of all eigenvalues, so projecting onto the first of our two components preserves at least 50% of the total variance.
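
Here is a minimal sketch of that projection step, reusing the same illustrative data; the printed ratio is the fraction of total variance explained by the first component:

```python
import numpy as np

# Standardized data and its sorted eigendecomposition, as in the sketches above
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X_std, rowvar=False))
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Project the 2-D points onto the first principal component: 2-D -> 1-D
X_reduced = X_std @ eigenvectors[:, :1]
print(X_reduced.shape)  # (5, 1)

# Fraction of the total variance explained by that single component
print(eigenvalues[0] / eigenvalues.sum())
```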

We can also employ this exact same technique to reduce the dimensionality of real-world data. For example, the image below is a 2-D representation of the MNIST dataset, in which each image is originally 784-dimensional.
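
As a rough sketch of the same idea on image data, the snippet below uses scikit-learn's bundled 8x8 digits dataset (64-dimensional images) as a lightweight stand-in for the 784-dimensional MNIST images; swapping in the real MNIST data would work the same way:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# 8x8 digit images, flattened to 64-dimensional vectors
digits = load_digits()
X, y = digits.data, digits.target

# Project the 64-D image vectors down to 2-D for plotting
X_2d = PCA(n_components=2).fit_transform(X)

plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y, cmap="tab10", s=10)
plt.colorbar(label="digit")
plt.xlabel("first principal component")
plt.ylabel("second principal component")
plt.show()
```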

Further reading

Yeah, I know, it's a lot to take in; heck, there is even a pretty famous meme about it:

But if you want to learn more, there are plenty of great resources out there, some of which include:

Wiki: Principal Components Analysis

StatQuest: PCA
