Linear Algebra: How do matrices work? How do they act on coordinates? Let's find out!
Refer back to our notes on linear regression.
In Principal Component Analysis (PCA), we use a covariance matrix. Diagonal entries give the variance of each individual data attribute; off-diagonal entries describe attribute-attribute relationships (and are closely related to correlation).
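A minimal sketch of what that matrix looks like, using a hypothetical two-attribute dataset (the values are made up for illustration):

```python
import numpy as np

# Hypothetical 2-attribute dataset; rows are observations, columns are attributes.
X = np.array([[2.0, 1.0],
              [4.0, 3.0],
              [6.0, 5.0],
              [8.0, 9.0]])

# np.cov treats rows as variables by default, so set rowvar=False
# to tell it our variables are the columns.
C = np.cov(X, rowvar=False)

print(C)
# Diagonal entries: variance of each attribute.
# Off-diagonal entries: covariance between the two attributes (symmetric).
```

Note that the matrix is symmetric: the covariance of attribute A with B is the same as B with A.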
We use singular value decomposition (SVD) to find the eigenvalues of the covariance matrix. The decomposition gives us a diagonal matrix with the eigenvalues along the diagonal. In effect, this lets us pull out relationships between variables piece by piece and construct a new coordinate system. The zeros off the diagonal reflect the fact that the eigenvectors are independent of one another (i.e. they sit at right angles when you chart them), with each capturing a different relationship in the data. The SVD also gives us the eigenvectors themselves, which are conventionally normalized so they have a length of 1. These are the principal components.
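A sketch of that decomposition on a small synthetic dataset (the data and its shape are assumptions for illustration). Because the covariance matrix is symmetric, the singular values from `np.linalg.svd` are its eigenvalues and the columns of `U` are unit-length eigenvectors:

```python
import numpy as np

# Toy data: 100 observations of 2 correlated attributes (an assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) @ np.array([[2.0, 0.0], [1.2, 0.5]])
Xc = X - X.mean(axis=0)  # center the data first

C = np.cov(Xc, rowvar=False)

# SVD of the symmetric covariance matrix: C = U @ np.diag(s) @ Vt.
# Here s holds the eigenvalues and the columns of U are the eigenvectors
# (already normalized to length 1) -- the principal components.
U, s, Vt = np.linalg.svd(C)

# Rotating C into the eigenvector basis leaves only the diagonal:
# the off-diagonal zeros confirm the components are orthogonal.
D = U.T @ C @ U
print(np.round(D, 6))  # eigenvalues on the diagonal, ~0 elsewhere
```

The eigenvalues come back sorted from largest to smallest, so the first column of `U` captures the biggest share of the variance.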
Let's chart the eigenvectors over our original points, then apply the eigenvectors to our data and see how the points change.
Remember that we need to center the points (subtract the column means from the point positions) before applying the eigenvectors.
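The center-then-project step can be sketched like this, using a small made-up dataset (an assumption for illustration):

```python
import numpy as np

# Hypothetical 2-D dataset; rows are observations.
X = np.array([[1.0, 2.0],
              [3.0, 3.0],
              [5.0, 7.0],
              [7.0, 8.0]])

Xc = X - X.mean(axis=0)          # center: subtract the column means

C = np.cov(Xc, rowvar=False)
eigvecs, eigvals, _ = np.linalg.svd(C)

# Multiply the centered points by the eigenvector matrix to get each
# point's coordinates along the new PC axes.
scores = Xc @ eigvecs
print(scores.mean(axis=0))       # projected data is still centered (~0)
```

Because the projection is just a rotation of centered data, the total variance is preserved; it is only redistributed across the new axes.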
Let's do a PCA on a dataset of Olympic decathletes. Can we identify major factors that drive athlete performance across a bunch of different athletic events? PCA will let us examine relationships across all the events at once. The data appear below in raw form, but for the PCA we will standardize each column (subtract the mean, divide by the standard deviation) so that events with big raw values don't distort the components.
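The standardization step looks like this; the numbers below are hypothetical decathlon-style scores, not the real dataset, chosen to show two events on very different scales (a sprint time in seconds vs. a throw distance in metres):

```python
import numpy as np

# Hypothetical scores (assumed values): column 0 ~ 100m time (s),
# column 1 ~ javelin distance (m). Raw scales differ wildly.
X = np.array([[10.8, 60.1],
              [11.2, 55.4],
              [10.5, 63.0],
              [11.0, 58.2]])

# z-score each column: subtract its mean, divide by its standard deviation,
# so no event dominates the covariance matrix just by having big numbers.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

print(Z.mean(axis=0))  # ~0 for every column
print(Z.std(axis=0))   # 1 for every column
```

After this step, the covariance matrix of `Z` is the correlation matrix of the original data, which is exactly what we want the PCA to operate on here.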
Charting by PC1 and PC2: