Friday, 12 May 2017

PCA - Post 3

The previous two posts should have given you an intuitive understanding of the mathematics involved in PCA. This post presents dimensionality reduction, also known as data compression.
Before we learn about data compression, we need to discuss a few terms.

A Few Terms

  1. Scores Matrix : The SVD of $Z$ gives $U, S, V$, where $U$ is called the scores matrix (see the sketch after this list).
                                The scores matrix need not be the complete $U$; it can be just the columns corresponding to the largest eigenvalues (equivalently, the largest singular values).
  2. Loadings Matrix : Similarly, the $V$ matrix is called the loadings matrix.
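
To make these terms concrete, here is a minimal Matlab sketch; the data matrix, its dimensions, and the synthetic values are assumed purely for illustration.

Matlab Code
m = 8; n = 1000;                     % 8 variables (rows), 1000 samples (columns) -- assumed sizes
X = randn(m, n);                     % synthetic raw data, for illustration only
Z = bsxfun(@minus, X, mean(X, 2));   % mean-center each variable (row)
[U, S, V] = svd(Z, 'econ');          % U: scores matrix, S: singular values, V: loadings matrix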

Dimensionality Reduction

With the above terms in hand, we will now use them to obtain lower-dimensional data.
We already proved in the previous post that the columns of the $U$ matrix are the directions of maximum variance. Projecting the data onto these directions (mathematically, a dot product) gives the lower-dimensional data.

For example, take the first 5 columns of the scores matrix; premultiplying the data matrix $Z$ by their transpose gives the corresponding scores of the data matrix. Here $Z_{m \times n}$ with $m \geq 5$; for this case let $m = 8$ and $n = 1000$. The data then shrinks from $8 \times 1000 = 8000$ entries to $5 \times 1000 = 5000$ entries, a reduction of 3000 values.

Matlab Code
scores = U(:, 1:5)' * Z;   % project the data onto the first 5 directions of maximum variance
These scores can now be used like any other data matrix, for example as the regressors in a regression problem. This is called Principal Component Regression (PCR).
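
As a rough sketch of PCR under the same assumed setup (the response vector y below is invented purely for illustration), one can regress a response on the reduced scores instead of the full data:

Matlab Code
y = randn(1, n);                 % assumed 1 x n response vector, for illustration
beta = (scores' \ y')';          % least-squares fit of y on the 5-dimensional scores
y_hat = beta * scores;           % fitted values from the principal-component regressors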

This outlines the basic concepts of PCA. We now come to the principles dealing with scaling in PCA. This is a highly controversial topic: practices such as autoscaling, or other types of scaling, are often performed without proper justification. We shall first discuss autoscaling.

Scaling in PCA

Auto Scaling : This is nothing but scaling each variable of the mean-centered data matrix ($Z$) by its standard deviation. This type of scaling usually performs better than no scaling, but how well it works depends mostly on the error variances of the data matrix.
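
A minimal sketch of autoscaling, continuing with the assumed $Z$ from the earlier sketch (variables along the rows):

Matlab Code
Z_auto = bsxfun(@rdivide, Z, std(Z, 0, 2));   % divide each mean-centered variable by its standard deviation
[Ua, Sa, Va] = svd(Z_auto, 'econ');           % PCA on the autoscaled data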

It can be shown from first principles that scaling by the error variances of the data matrix performs best. Since the error variances need not be proportional to the variances of the data at all, there is no guarantee that autoscaling will work well; indeed, it can perform very badly at times.

MLPCA
Now that I have claimed that we should scale using the error variances, the question is: how do we actually find the error variances?
They can be estimated from repeated experiments under the same conditions. For example, consider an experiment in which we measure the absorption spectrum over a wide range of wavelengths for a particular solution (containing various species). If we perform repeated experiments on the same solution at a fixed concentration, we can estimate the error variances and hence scale the data matrix. Performing PCA after this type of scaling is called MLPCA.
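
As a simplified sketch, suppose each variable has independent errors of roughly constant variance; then replicate measurements can be used to estimate that variance and scale the data before the SVD. The number of replicates and the synthetic noise below are assumed for illustration, and the full MLPCA algorithm, which handles more general error structures iteratively, is left to the later posts.

Matlab Code
r = 10;                                            % assumed number of repeated experiments
X_rep = repmat(X, [1 1 r]) + 0.1*randn(m, n, r);   % synthetic replicates of the same measurement
err_var = mean(var(X_rep, 0, 3), 2);               % estimated error variance for each variable (m x 1)
Z_mlpca = bsxfun(@rdivide, Z, sqrt(err_var));      % scale each variable by its error standard deviation
[Ue, Se, Ve] = svd(Z_mlpca, 'econ');               % PCA on the error-variance-scaled data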

The mathematical details and algorithms will be discussed in the following posts, since this is a vast topic in itself.
