Revision



Definition

PCA seeks to find a new basis in which to represent a dataset. In general, the dimension of this new basis is lower than the original dimension of the dataset.

The goal of PCA is thus to retain as much of the variance of the original data as possible in this lower-dimensional representation.


Formulation

Let’s define a dataset X where each sample \(X^{(j)}\) is represented in \(n\) dimensions: \(X^{(j)}=(X_i^{(j)})_{i \in [1, n]}\).

Let’s normalise X so that each of its features has mean 0 and variance 1.
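
As a minimal sketch of this step (assuming the data is stored as a NumPy array `X` of shape \(n \times n_{pop}\), one sample per column, and using synthetic data purely for illustration):

```python
import numpy as np

# Synthetic data for illustration: n features, n_pop samples (one sample per column).
rng = np.random.default_rng(0)
n, n_pop = 5, 200
X = rng.normal(size=(n, n_pop))

# Normalise every feature (row) to mean 0 and variance 1.
X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
```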

The goal of PCA is to find a new basis of unit vectors \(u=(u_k)_{k \in [1, d]}\) (\(\Vert u_k \Vert_2^2=1\)), such that the projection of X on this new basis retains the maximum variance.

As the elements of X are centered, the variance contributed by a sample \(X^{(j)}\) is its squared distance to the origin. The same holds after projection: the projected point lies on the axis, so its contribution to the variance along that axis is its squared distance to the origin.


The projection of \(X^{(j)}\) on the axis \(u_k\) is \((X^{(j)})^T u_k\), and since \(u_k\) is a unit vector this is also its (signed) distance to the origin along that axis. The maximisation problem is thus:


\[\begin{eqnarray} \frac{1}{n_{pop}}\sum_{j=1}^{n_{pop}}\left((X^{(j)})^T u_k\right)^2 &&= \frac{1}{n_{pop}}\sum_{j=1}^{n_{pop}}u_k^T X^{(j)} (X^{(j)})^T u_k \\ &&= u_k^T \left(\frac{1}{n_{pop}}\sum_{j=1}^{n_{pop}} X^{(j)} (X^{(j)})^T\right) u_k \\ && \text{subject to } \Vert u_k \Vert_2^2=1 \end{eqnarray}\]
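
Continuing the sketch above, the equality between the average squared projection and the quadratic form can be checked numerically with an arbitrary unit vector (a hypothetical random direction `u_k`):

```python
# An arbitrary unit vector playing the role of u_k.
u_k = rng.normal(size=n)
u_k /= np.linalg.norm(u_k)

# Left-hand side: (1/n_pop) * sum_j ((X^(j))^T u_k)^2
proj_var = np.mean((X.T @ u_k) ** 2)

# Right-hand side: u_k^T Sigma u_k with Sigma = (1/n_pop) * sum_j X^(j) (X^(j))^T
quad_form = u_k @ (X @ X.T / n_pop) @ u_k

assert np.isclose(proj_var, quad_form)
```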


First, remark that \(\frac{1}{n_{pop}}\sum_{j=1}^{n_{pop}} X^{(j)} (X^{(j)})^T\) is the covariance matrix of \(X\) (since X is centered). Let’s call it \(\Sigma\). The maximum of this constrained problem is found by writing its Lagrangian and setting the derivative to 0.
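
In the running sketch, \(\Sigma\) is a single matrix product:

```python
# Covariance matrix of the centered data, shape (n, n).
Sigma = (X @ X.T) / n_pop
# np.cov(X, bias=True) gives the same result with the 1/n_pop convention used here.
```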


\[L(u_k)=u_k^T \Sigma u_k + \lambda (\Vert u_k \Vert_2^2-1)\]


\[\begin{eqnarray} \frac{d}{d u_k}L(u_k) &&= 2 \Sigma u_k + 2 \lambda u_k = 0 \\ && \Rightarrow \Sigma u_k = -\lambda u_k \end{eqnarray}\]

We recognize the eigenvector equation: \(u_k\) must be an eigenvector of \(\Sigma\) (with eigenvalue \(-\lambda\)). Substituting back into the objective, the variance captured along \(u_k\) is \(u_k^T \Sigma u_k = -\lambda\), i.e. exactly that eigenvalue, so it is maximised by taking the eigenvector with the largest eigenvalue.


Hence PCA in \(d\) dimensions is equivalent to finding the \(d\) eigenvectors of the covariance matrix of X with the largest eigenvalues.
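
Continuing the sketch, this basis can be obtained with np.linalg.eigh (which works on symmetric matrices such as \(\Sigma\) and returns eigenvalues in ascending order); the value of d is an arbitrary choice for the example:

```python
d = 2  # number of principal components to keep (arbitrary choice for the example)

# Eigendecomposition of the symmetric matrix Sigma.
eigvals, eigvecs = np.linalg.eigh(Sigma)

# Keep the d eigenvectors with the largest eigenvalues as the columns of u (shape (n, d)).
order = np.argsort(eigvals)[::-1][:d]
u = eigvecs[:, order]
```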

Then we project X on the new basis as follows:

\[X'=u^TX\]

Where:

- \(u\) is the \(n \times d\) matrix whose columns are the selected eigenvectors \(u_1, \dots, u_d\),
- \(X\) is the \(n \times n_{pop}\) matrix whose columns are the samples \(X^{(j)}\).

So \(u^T X\) gives a \(d \times n_{pop}\) matrix: each sample is now described by \(d\) coordinates instead of \(n\) (its transpose gives the usual \(n_{pop} \times d\) layout).
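
In the running sketch, the projection is a single matrix product:

```python
# Each column of X_prime holds the d coordinates of one sample in the new basis.
X_prime = u.T @ X   # shape (d, n_pop)
```

For comparison, sklearn.decomposition.PCA(n_components=d).fit_transform(X.T) should recover the same coordinates (up to the sign of each component), since scikit-learn also projects the centered data on the top eigenvectors of the covariance matrix.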

