Thursday, 11 May 2017

PCA - Post 2

Although I had related the number of constraint equations to the row rank of the data matrix, in a real-life scenario we never know the actual rank, because the data is always subject to errors. Even if the data comes from an experiment constrained by 3 equations, the row rank of the data matrix will most probably equal the number of rows due to measurement errors, so mathematically the data appears to be constrained by no equations at all, which is not actually the case.

But heuristically, we can still find the approximate number of constraints in several ways.

First, let us look at a different approach to deriving the same constraint matrix, using the eigenvalue decomposition.

A Different Approach

Let $Z$ be the $m \times n$ data matrix with singular value decomposition $Z = USV^{'}$. Then
$ZZ^{'} = (USV^{'})(VS^{'}U^{'})$
$\implies ZZ^{'} = USS^{'}U^{'}$, since $V^{'}V = I$.
Hence, to obtain the constraint matrix, only the $U$ matrix is necessary.
So the new Matlab code:
[U, Sdash] = eig(Z*Z'), where Sdash = $SS^{'}$

Here the columns of U are the eigenvectors and the diagonal of Sdash holds the eigenvalues of the matrix $ZZ^{'}$.
Hence it can clearly be seen that the singular values of the data matrix $Z$ are equal to the square roots of the eigenvalues of the matrix $ZZ^{'}$.
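As a quick sanity check, here is a minimal MATLAB sketch of this relationship; the matrix Z below is just random illustrative data (any m-by-n matrix would do), and the variable names are my own:

% Minimal sketch: compare svd(Z) with eig(Z*Z').
m = 5; n = 100;                            % m variables (rows), n observations (columns)
Z = randn(m, n);                           % illustrative data only

[U1, S, V] = svd(Z);                       % Z = U1*S*V'
[U2, Sdash] = eig(Z*Z');                   % columns of U2 = eigenvectors, Sdash = diagonal eigenvalue matrix

singVals = diag(S);                        % singular values of Z
eigVals  = sort(diag(Sdash), 'descend');   % eigenvalues of Z*Z' (eig does not sort them)

disp([singVals, sqrt(eigVals)])            % the two columns agree up to round-off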

The eigenvalues give a sense of the statistical variance of the equations whose coefficients are given by the columns of $U$ (the eigenvectors). For example, let $U_{1}$ be the coefficient vector of an equation; then the statistical variance of that equation is given by $U_{1}^{'} \Sigma U_{1}$, where $\Sigma$ is the covariance matrix across the variables/features.

Before we prove that the eigenvalues of $ZZ^{'}$ (which, for mean-centred data, is the same as the covariance matrix up to a scale factor) actually give the variance, we shall for now accept it as a fact and use it to predict the number of constraints. So how do the eigenvalues let us approximately identify the number of constraints?
Take an example where an equation whose coefficients are given by $U_m$ has an eigenvalue of $10^{-3}$; this means that the statistical variance of that equation is $\approx 0$, i.e. the equation is (almost) a constraint satisfied by the data.
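As a rough illustration of this idea, one could simply count the near-zero eigenvalues; here the tolerance is an arbitrary illustrative choice, and Z and Sdash are assumed to be the data matrix and eigenvalue matrix from above:

% Rough sketch: count eigenvalues of Z*Z' that are numerically close to zero.
eigVals = sort(diag(Sdash), 'descend');
tol = 1e-3 * max(eigVals);                 % arbitrary threshold, purely illustrative
numConstraints = sum(eigVals < tol)        % approximate number of constraint equations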
Using the same principle, a few methods (admittedly very loose ones) are introduced below; more rigorous methods will be introduced in the following posts.

Preliminary methods 

  1. Scree plot
    This is simply a plot of the eigenvalues of the covariance matrix (which may differ from $ZZ^{'}$ by a scale), sorted in decreasing order. The plot typically shows a visual knee, which can be used as a heuristic for the number of constraints. In the example plot linked below, the knee heuristically suggests that the number of constraints is approximately equal to 7.
    (Image courtesy: http://ba-finance-2013.blogspot.in/2012/09/scree-plots-interpretation-and.html)
  2. Variance captured
    At times people also use the amount of variance captured to find the number of constraints. The formula for the variance captured is:
    $VarCapt = \frac{\sum_{i=1}^{u} e_{i}}{\sum_{i=1}^{m} e_{i}}$, where $u \leq m$.
    Here $m$ is the same as the row size of the data matrix introduced in the previous blog; the denominator is the sum of all eigenvalues, and the numerator is the sum of the $u$ largest eigenvalues.
    Hence, keeping some threshold, say 95%, for the variance captured (VarCapt), we can find the value of $u$, which is nothing but a measure of the row rank of the data matrix. A minimal sketch of both methods is given after this list.
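Below is a minimal MATLAB sketch of both heuristics, assuming the data matrix Z from earlier; the 95% threshold and the variable names are purely illustrative choices, not part of the methods themselves.

% Sketch of the two preliminary methods, using the eigenvalues of Z*Z'.
e = sort(eig(Z*Z'), 'descend');            % eigenvalues in decreasing order

% 1. Scree plot: look for the visual knee in this curve.
plot(e, '-o'); xlabel('index'); ylabel('eigenvalue'); title('Scree plot');

% 2. Variance captured: smallest u whose leading eigenvalues capture at least 95%.
varCapt = cumsum(e) / sum(e);
u = find(varCapt >= 0.95, 1)               % a measure of the row rank of the data matrix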

In the above, I briefly introduced some very basic methods for approximately finding the number of constraints, without proving that the eigenvalues of the covariance matrix actually give a sense of the statistical variance of the equation whose coefficients are given by the corresponding eigenvector. So let us now look at the proof:

Proof 

Another way of looking at PCA is as finding a set of orthogonal unit vectors which, acting as coefficients of an equation, give the maximum variance. Posing this as an optimization problem, we get

max $(w^{'} \Sigma w)$, such that $w^{'}w = 1$, where $w$ is the coefficient vector of the equation and $\Sigma$ is the covariance matrix.

Now setting this up as a Lagrangian, we get
$L = (w^{'} \Sigma w) - \lambda(w^{'}w - 1)$,
hence the new optimization problem is max $(L)$.

So, now differentiating $L$ we get

$\frac{\partial L}{\partial w} = 0 = 2 \Sigma w - 2\lambda w$
$\implies \Sigma w = \lambda w$  (i.e. $w$, $\lambda$ are a corresponding eigenvector and eigenvalue of the covariance matrix).

$\implies  w^{'} \Sigma w = \lambda(w^{'}w) $
But,  
$\frac{\partial L}{\partial \lambda} = 0 = w^{'}w - 1$
$\implies w^{'}w = 1$

Hence,
$ w^{'} \Sigma w = \lambda $  (proved) 
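In other words, the variance of the equation whose coefficients are an eigenvector of $\Sigma$ is exactly the corresponding eigenvalue, which is what the preliminary methods above rely on. As a small numerical check (again on illustrative random data, with hypothetical variable names), in MATLAB:

% Numerical check of w' * Sigma * w = lambda for an eigenvector of the covariance matrix.
X = randn(100, 3);                 % 100 observations of 3 variables (illustrative data)
Sigma = cov(X);                    % sample covariance matrix
[W, Lambda] = eig(Sigma);          % columns of W = eigenvectors, diagonal of Lambda = eigenvalues
w = W(:, 1); lambda = Lambda(1, 1);
disp([w' * Sigma * w, lambda])     % the two numbers agree up to round-off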
