Wednesday, 8 March 2023

PCA in brief

Principal Component Analysis (PCA) is a dimensionality reduction technique that condenses many features into a few new ones while retaining as much of the variation in the data as possible. This makes multidimensional data easier to visualise and interpret. The data is transformed into a new coordinate system whose axes are called principal components. Each principal component is a linear combination of the original features: every feature value is multiplied by a constant and the products are summed.

Briefly, PCA is applied in the following steps:

  • Each feature is centred around its mean. In other words, the mean of each feature is subtracted from every data point of that feature.

  • The covariance matrix of the features is calculated. The covariance matrix holds the variance of each feature on its diagonal and the covariance of each pair of features off the diagonal. For the sample covariance of two features, the deviations of the corresponding data points from their respective means are multiplied together, the products are summed, and the sum is divided by the number of samples minus 1.

  • The eigenvalues and eigenvectors of the covariance matrix are calculated. An eigenvalue is a constant which, when multiplied by a given vector, gives the same result as multiplying the covariance matrix by that vector. That vector is called an eigenvector.

  • The centred data is multiplied by each eigenvector. The resulting values are the scores of each data point on the corresponding principal component.

  • Generally, components with eigenvalues greater than 1 are retained as principal components (a rule of thumb that applies when the features have been standardised). Each principal component explains some share of the total variance. In many cases the first two or three principal components explain most of the variance, though this varies with the type of data.
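The steps above can be sketched in a few lines of NumPy. This is only a minimal illustration, not a reference implementation: the function name `pca` and the small toy dataset are made up for this example, and the features are not standardised here, so the "eigenvalue greater than 1" rule is not applied.

```python
import numpy as np

def pca(X):
    """Return eigenvalues, eigenvectors, scores and explained-variance ratios."""
    X = np.asarray(X, dtype=float)
    # Step 1: centre each feature by subtracting its mean.
    centred = X - X.mean(axis=0)
    # Step 2: sample covariance matrix of the features (divides by n - 1).
    cov = np.cov(centred, rowvar=False)
    # Step 3: eigenvalues and eigenvectors; eigh is meant for symmetric matrices.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # eigh returns eigenvalues in ascending order, so reorder to descending.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # Step 4: multiply the centred data by the eigenvectors to get the scores.
    scores = centred @ eigenvectors
    # Step 5: share of the total variance explained by each component.
    explained = eigenvalues / eigenvalues.sum()
    return eigenvalues, eigenvectors, scores, explained

# Hypothetical toy data: 10 samples, 2 features.
X = np.array([
    [2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
    [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9],
])

eigenvalues, eigenvectors, scores, explained = pca(X)
print("Eigenvalues:", eigenvalues)
print("Explained variance ratio:", explained)
```

The variance of the scores along each component equals the corresponding eigenvalue, which is why the eigenvalues can be read as "variance explained".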

Reducing the complexity of the data to a few features makes it easier to interpret. In addition, the weights of the features in each principal component (the entries of its eigenvector) show which features matter most in that component. In simpler terms, PCA replaces the X and Y axes of a scatter plot with new axes that better show how the data points spread.
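The "new axes" idea can be checked numerically. The sketch below assumes NumPy and uses randomly generated, correlated 2D data (the seed and coefficients are arbitrary choices for illustration). It shows that the new axes are perpendicular, that the distances of the points from the centre are unchanged, and that the data is no longer correlated along the new axes.

```python
import numpy as np

# Hypothetical correlated 2D data, generated only for illustration.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.8 * x + rng.normal(scale=0.3, size=200)
data = np.column_stack([x, y])

# Centre the data and find the new axes (eigenvectors of the covariance matrix).
centred = data - data.mean(axis=0)
eigenvalues, axes = np.linalg.eigh(np.cov(centred, rowvar=False))
axes = axes[:, ::-1]  # put the axis with the largest variance first

# Express every point in the new coordinate system.
rotated = centred @ axes
cov_rotated = np.cov(rotated, rowvar=False)

print("Covariance on the original axes:\n", np.cov(centred, rowvar=False))
print("Covariance on the new axes:\n", cov_rotated)
```

The off-diagonal entries of the second matrix are zero up to rounding error, so the first new axis captures the main direction of spread and the second captures what is left.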

The following resources are helpful for a deeper understanding of PCA.

- PCA: the math, step-by-step with a simple example https://youtu.be/S51bTyIwxFs
- StatQuest: Principal Component Analysis (PCA), Step-by-Step https://youtu.be/FgakZw6K1QQ
- Eigenvalues and eigenvectors (Maths Is Fun) https://www.mathsisfun.com/algebra/eigenvalue.html

