Principal Component Analysis (PCA) is a dimensionality reduction technique that replaces a large number of features with a few new ones while retaining as much of the variation in the data as possible. This makes it easier to visualise and interpret high-dimensional data. The data is transformed into a new coordinate system whose axes are called principal components. Each principal component is a linear combination of the original features: every feature value is multiplied by a constant (a weight, or loading) and the results are added together.
Briefly, PCA is applied in the following steps:
Each feature is centred around its mean. In other words, the mean of each feature is subtracted from every data point of that feature.
The covariance matrix of the features is calculated. The covariance matrix is a square matrix with the variance of each feature on the diagonal and the covariance of each pair of features off the diagonal. For the sample covariance of two features, the deviations of corresponding data points from their respective means are multiplied together, the products are summed, and the sum is divided by the number of samples minus 1.
The eigenvalues and eigenvectors of the covariance matrix are calculated. An eigenvalue is a constant which, when multiplied by a given non-zero vector, gives the same vector that is obtained by multiplying the covariance matrix by that vector. The vector is called an eigenvector. Each eigenvector gives the direction of a principal component, and its eigenvalue is the variance of the data along that direction.
The centred data is multiplied by the eigenvector. The resulting values are the scores of each data point on that principal component.
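Generally, principal components with eigenvalues greater than 1 are retained. This rule of thumb, known as the Kaiser criterion, applies when the features have been standardised so that each has a variance of 1. Each principal component explains a share of the total variance, equal to its eigenvalue divided by the sum of all eigenvalues. In many cases the first two or three principal components explain most of the variance, although this varies with the type of data.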
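The steps above can be sketched in a few lines of NumPy. This is only a minimal illustration, not a reference implementation: the function name `pca` and the small data matrix `X` (6 samples, 3 features) are made up for the example, and the code assumes at least two features with samples in rows.

```python
import numpy as np

def pca(X):
    """Illustrative PCA via eigen-decomposition of the covariance matrix."""
    X = np.asarray(X, dtype=float)

    # Step 1: centre each feature (column) by subtracting its mean.
    centred = X - X.mean(axis=0)

    # Step 2: sample covariance matrix of the features (divides by n - 1).
    cov = np.cov(centred, rowvar=False)

    # Step 3: eigenvalues and eigenvectors. eigh is meant for symmetric
    # matrices and returns eigenvalues in ascending order, so we re-sort
    # them from largest to smallest.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]  # one eigenvector per column

    # Step 4: multiply the centred data by the eigenvectors to get the
    # score of every data point on every principal component.
    scores = centred @ eigenvectors

    # Share of the total variance explained by each component.
    explained_ratio = eigenvalues / eigenvalues.sum()
    return eigenvalues, eigenvectors, scores, explained_ratio

# Made-up data: 6 samples (rows) and 3 features (columns).
X = np.array([
    [2.5, 2.4, 0.5],
    [0.5, 0.7, 1.9],
    [2.2, 2.9, 0.4],
    [1.9, 2.2, 0.8],
    [3.1, 3.0, 0.1],
    [2.3, 2.7, 0.6],
])

eigenvalues, eigenvectors, scores, explained_ratio = pca(X)
print("Eigenvalues:", eigenvalues)
print("Explained variance ratio:", explained_ratio)
print("Scores on the first two components:")
print(scores[:, :2])
```

The features here are only centred, not standardised, so the "eigenvalue greater than 1" rule would not be meaningful for this output. To apply that rule, each centred feature would first be divided by its standard deviation.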
The following resources are helpful for a deeper understanding of PCA.
- PCA : the math - step-by-step with a simple example https://youtu.be/S51bTyIwxFs
- StatQuest: Principal Component Analysis (PCA), Step-by-Step https://youtu.be/FgakZw6K1QQ
- https://www.mathsisfun.com/algebra/eigenvalue.html