Prompt for April 24

Linear Algebra: How do matrices work? Do they work on coordinates?? Let's find out!

Nicky Case (creator of a ton of cool demos and the Coming Out Simulator) has a great interactive demo for transformation matrices.

Refer back to our notes on linear regression.

We are now making use of a simple JavaScript library for principal components analysis.

Simple Linear Regression:
Transforming a Plot
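[Interactive demo: the point (X, Y) is combined with the matrix entries X1, X2, Y1, Y2 to give the transformed point (X', Y'). The Rotate and Scale controls set the matrix; both start at 0.]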
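To make the rotate and scale controls concrete, here is a minimal sketch of one way a 2x2 transformation matrix could be built and applied to a point. It assumes a column-vector convention and counterclockwise rotation in degrees; the function names `makeTransform` and `applyTransform` are made up for this example and are not taken from the demo.

```javascript
// Build a 2x2 matrix that rotates counterclockwise by `degrees`
// and scales by `scale` (an assumed convention for this sketch).
function makeTransform(degrees, scale) {
  const t = (degrees * Math.PI) / 180;
  return [
    [scale * Math.cos(t), -scale * Math.sin(t)],
    [scale * Math.sin(t), scale * Math.cos(t)],
  ];
}

// Multiply the matrix by the column vector (x, y).
function applyTransform(m, point) {
  const [x, y] = point;
  return [m[0][0] * x + m[0][1] * y, m[1][0] * x + m[1][1] * y];
}

// Rotate (1, 0) by 90 degrees and double its length.
const moved = applyTransform(makeTransform(90, 2), [1, 0]);
console.log(moved.map((v) => v.toFixed(3))); // roughly (0, 2)

// A scale of 0 collapses every point onto the origin.
const collapsed = applyTransform(makeTransform(0, 0), [3, 4]);
console.log(collapsed);
```

Rotation and scaling are folded into a single matrix here, so one multiplication moves each point.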

Visualizing Principal Components
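[Interactive display: the 2x2 Covariance matrix for X and Y, with entries X1, X2, Y1, Y2, next to the 2x2 Eigenvalues matrix, with X1 and Y2 on the diagonal and zeroes elsewhere.]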

In Principal Components Analysis, we use a covariance matrix. Values on the diagonal show the variance of each individual data attribute, and the off-diagonal elements show the covariance between pairs of attributes (which is related to their correlation).
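As a rough illustration of the covariance matrix described above, the sketch below computes it by hand for two attributes. The data are invented for the example, the helper names are hypothetical, and it assumes the sample covariance (dividing by n - 1); a library may divide by n instead.

```javascript
function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Sample covariance of two equal-length arrays (divides by n - 1).
function covariance(a, b) {
  const meanA = mean(a);
  const meanB = mean(b);
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += (a[i] - meanA) * (b[i] - meanB);
  }
  return sum / (a.length - 1);
}

// Variances go on the diagonal, covariances off the diagonal.
function covarianceMatrix(xs, ys) {
  return [
    [covariance(xs, xs), covariance(xs, ys)],
    [covariance(ys, xs), covariance(ys, ys)],
  ];
}

// Made-up data for illustration only.
const xs = [1, 2, 3, 4, 5];
const ys = [2, 4, 5, 4, 5];
const cov = covarianceMatrix(xs, ys);
console.log(cov); // [[2.5, 1.5], [1.5, 1.5]]
```

The matrix is symmetric because the covariance of X with Y is the same as the covariance of Y with X.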

We use singular value decomposition to identify the eigenvalues of the covariance matrix. It gives us a matrix with the eigenvalues along the diagonal. In effect, this lets us pull out relationships between variables piece by piece and construct a new coordinate system. The zeroes off the diagonal of the eigenvalues matrix tell us that the new axes are uncorrelated with each other: the eigenvectors are at right angles when you chart them, and each one captures a different relationship in the data. We can then multiply matrices to get the eigenvectors, which traditionally are normalized so they have a length of 1. These are the principal components.
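For a two-attribute dataset, the eigenvalues and eigenvectors can also be worked out directly. The sketch below swaps in the closed-form solution for a symmetric 2x2 matrix rather than singular value decomposition, so it is not how a PCA library would do it. The function name `eigenSymmetric2x2` and the sample matrix are assumptions for illustration.

```javascript
// Eigenvalues and unit eigenvectors of a symmetric matrix [[a, b], [b, d]],
// largest eigenvalue first.
function eigenSymmetric2x2(m) {
  const a = m[0][0];
  const b = m[0][1];
  const d = m[1][1];
  const mid = (a + d) / 2;
  const spread = Math.sqrt(((a - d) / 2) ** 2 + b * b);
  const values = [mid + spread, mid - spread];

  let vectors;
  if (b === 0) {
    // Already diagonal: the axes themselves are the eigenvectors.
    vectors = a >= d ? [[1, 0], [0, 1]] : [[0, 1], [1, 0]];
  } else {
    vectors = values.map((lambda) => {
      // (b, lambda - a) solves (a - lambda) * x + b * y = 0.
      const v = [b, lambda - a];
      const length = Math.hypot(v[0], v[1]);
      return [v[0] / length, v[1] / length]; // normalize to length 1
    });
  }
  return { values, vectors };
}

// Made-up covariance matrix for illustration only.
const cov = [[2.5, 1.5], [1.5, 1.5]];
const { values, vectors } = eigenSymmetric2x2(cov);
const total = values[0] + values[1];
console.log("eigenvalues:", values);
console.log("PC1:", vectors[0], "PC2:", vectors[1]);
console.log("share of variance:", values.map((v) => v / total));
```

Dividing each eigenvalue by the sum of the eigenvalues gives the share of the total variance that its component explains.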

[Interactive display: the eigenvectors PC1 and PC2, each shown with its X and Y components and the share of variance (% Var.) it explains.]

Let's chart the eigenvectors over our original points, then apply the eigenvectors to our data and see how the points change.
Remember that we need to center the points (subtract means from point positions) to make use of the eigenvectors.
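Here is a minimal sketch of the centering and projection step, assuming we already have unit-length eigenvectors. The points are invented, and the helper names `centerPoints` and `project` are hypothetical. For these particular points X and Y have equal variance, so the two principal components fall along the diagonals.

```javascript
// Subtract the mean of each attribute so the cloud is centered on (0, 0).
function centerPoints(points) {
  const n = points.length;
  const meanX = points.reduce((sum, p) => sum + p[0], 0) / n;
  const meanY = points.reduce((sum, p) => sum + p[1], 0) / n;
  return points.map(([x, y]) => [x - meanX, y - meanY]);
}

// Dot each centered point with each eigenvector to get its new coordinates.
function project(centered, components) {
  return centered.map(([x, y]) =>
    components.map(([cx, cy]) => x * cx + y * cy)
  );
}

// Made-up points for illustration only.
const points = [[1, 2], [2, 1], [3, 4], [4, 3]];

// Unit eigenvectors for these points (assumed known here).
const r = Math.SQRT1_2;
const components = [[r, r], [r, -r]]; // PC1, PC2

const scores = project(centerPoints(points), components);
console.log(scores); // each row is [PC1 score, PC2 score]
```

Without the centering step, the scores would be measured from the original origin rather than from the middle of the point cloud.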

[Interactive display: X, Y, PC1, PC2]
Complex Example

Let's do a PCA for a dataset of Olympic decathletes. Can we identify major factors that drive athlete performance across a bunch of different athletic events? PCA will let us examine relationships across different sports all at once. The data appear below in raw form, but for the PCA we will standardize each event (subtract its mean and divide by its standard deviation) so that events measured in big numbers don't distort the components.
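The scaling step might look something like the sketch below. The numbers are invented and are not the class dataset, the column names are hypothetical, and it assumes the sample standard deviation (dividing by n - 1).

```javascript
// Convert a column of values to z-scores: (value - mean) / standard deviation.
function standardize(values) {
  const n = values.length;
  const mean = values.reduce((sum, v) => sum + v, 0) / n;
  const variance =
    values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / (n - 1);
  const sd = Math.sqrt(variance);
  return values.map((v) => (v - mean) / sd);
}

// Invented example columns: sprint times in seconds, throws in meters.
const sprintSeconds = [10.4, 10.9, 11.2, 10.7];
const throwMeters = [58.1, 66.3, 61.0, 70.2];

const scaledSprint = standardize(sprintSeconds);
const scaledThrow = standardize(throwMeters);
console.log(scaledSprint.map((v) => v.toFixed(2)));
console.log(scaledThrow.map((v) => v.toFixed(2)));
```

After scaling, both columns have a mean of 0 and a standard deviation of 1, so the throw distances no longer dominate just because they are bigger numbers. A column where every value is identical would have a standard deviation of 0 and would need special handling.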

Charting by PC1 and PC2:

(Yes, this chart has lots of bad label positions.)

Code for today: