Principal Component Analysis (PCA) is a dimensionality reduction technique that transforms a large set of variables into a smaller set of orthogonal principal components, capturing the maximum variance in the data while simplifying it.
Loadings are the coefficients or weights assigned to the original variables in the linear combinations that form each principal component. These loadings indicate how much each original variable contributes to a principal component, providing insights into the nature of the components.
A higher absolute loading value suggests a stronger influence of that variable on the principal component. Loadings can be positive or negative, showing the direction of the contribution; since flipping the sign of an entire component leaves the model unchanged, the magnitude matters more than the sign itself.
In summary, loadings tell us which variables weigh heavily in each principal component and help in interpreting the components by revealing the relationship between the original variables and the derived components.
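As a concrete illustration, here is a minimal sketch using scikit-learn on a small synthetic dataset (the data, seed, and variable count are hypothetical, chosen only to make two variables correlated). Each row of `components_` holds one component's loadings across the original variables:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: variables 1 and 2 are noisy copies of a shared
# signal, variable 3 is unrelated noise.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
X = np.column_stack([x + rng.normal(scale=0.1, size=100),
                     x + rng.normal(scale=0.1, size=100),
                     rng.normal(size=100)])

pca = PCA(n_components=2).fit(X)
# Each row of components_ is one principal component's loadings
# (eigenvector weights) across the three original variables.
print(pca.components_)
```

Inspecting the output, the first component loads heavily on the two correlated variables, which is exactly the kind of interpretation loadings are meant to support.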
The mathematical underpinnings of PCA connect the loadings directly to the eigenvectors of the covariance or correlation matrix of the data:
Matrix Representation:
In PCA, the loadings are the elements of the eigenvectors of the covariance or correlation matrix. Each column of the loading matrix corresponds to a principal component, and each row corresponds to one of the original variables. The principal components themselves are linear combinations of the original variables weighted by these loadings.
Eigenvectors and Loadings:
The eigenvectors of the covariance/correlation matrix give the directions (loadings) in the feature space that maximize the variance captured by each component. Mathematically, if W is the matrix whose columns are eigenvectors, then the principal components are given by T = XW, where X is the (centered) original data matrix.
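This relationship can be sketched directly with NumPy (the data here is synthetic and purely illustrative): take the eigenvectors of the covariance matrix as W and project the centered data to get the component scores.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
X = X - X.mean(axis=0)           # center the data

C = np.cov(X, rowvar=False)      # covariance matrix
eigvals, W = np.linalg.eigh(C)   # columns of W are eigenvectors
# Sort by descending eigenvalue so PC1 captures the most variance.
order = np.argsort(eigvals)[::-1]
eigvals, W = eigvals[order], W[:, order]

T = X @ W                        # principal component scores: T = XW
```

A useful sanity check on this construction: the sample variance of each column of T equals the corresponding eigenvalue, which is exactly the "variance captured" by that component.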
Orthogonality:
The principal components are orthogonal (uncorrelated), which means the eigenvectors (and thus the loadings) are orthogonal vectors. This ensures each principal component captures a direction of variance not shared with any other component.
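Both halves of this claim are easy to verify numerically (again on synthetic data): the eigenvector matrix is orthonormal, and the resulting component scores have a diagonal covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
X = X - X.mean(axis=0)

C = np.cov(X, rowvar=False)
_, W = np.linalg.eigh(C)

# Eigenvectors are orthonormal: W^T W is the identity matrix.
ortho = np.allclose(W.T @ W, np.eye(4))

# Consequently the component scores are uncorrelated: their
# covariance matrix has zero off-diagonal entries.
T = X @ W
cov_T = np.cov(T, rowvar=False)
uncorrelated = np.allclose(cov_T - np.diag(np.diag(cov_T)), 0)

print(ortho, uncorrelated)
```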
Scaling and Covariance vs. Correlation Matrix:
If PCA is performed on the correlation matrix, variables are standardized to zero mean and unit variance before analysis, making all variables equally weighted. If PCA is applied to the covariance matrix, variables with larger variance will usually have larger loadings, because variance directly impacts the covariance matrix entries.
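The effect of this choice is easy to demonstrate with two hypothetical variables on very different scales: on the covariance matrix the high-variance variable dominates the first component's loadings, while on the correlation matrix (equivalently, after standardizing) both variables are weighted comparably.

```python
import numpy as np

rng = np.random.default_rng(7)
# Two variables on very different scales (hypothetical units).
small = rng.normal(scale=1.0, size=300)
large = rng.normal(scale=100.0, size=300)
X = np.column_stack([small, large])
X = X - X.mean(axis=0)

# PCA on the covariance matrix: PC1 is dominated by the
# large-variance variable. eigh returns ascending eigenvalues,
# so the last column is the leading eigenvector.
_, W_cov = np.linalg.eigh(np.cov(X, rowvar=False))
pc1_cov = W_cov[:, -1]

# PCA on the correlation matrix: equivalent to standardizing first.
Z = X / X.std(axis=0, ddof=1)
_, W_corr = np.linalg.eigh(np.cov(Z, rowvar=False))
pc1_corr = W_corr[:, -1]

print(np.abs(pc1_cov))   # close to [0, 1]: large-scale variable dominates
print(np.abs(pc1_corr))  # both variables weighted comparably
```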
Contribution to Variance:
The square of a loading for a variable on a principal component indicates the proportion of that variable's variance captured by the component. For example, a loading of 0.8 means 0.8² = 0.64, or 64%, of that variable's variance is explained by that component. (This interpretation assumes loadings scaled by the square roots of the eigenvalues, a common convention under which, for standardized data, each loading equals the correlation between the variable and the component.)
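A sketch of this variance accounting, assuming the eigenvalue-scaled loading convention on standardized (synthetic) data: for each variable, the squared scaled loadings summed across all components partition that variable's unit variance.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=1000)
X = np.column_stack([x + 0.3 * rng.normal(size=1000),
                     x + 0.3 * rng.normal(size=1000),
                     rng.normal(size=1000)])
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize

R = np.cov(Z, rowvar=False)                        # correlation matrix
eigvals, V = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]

# Scaled loadings: eigenvector entries times sqrt(eigenvalue). For
# standardized data these equal variable-component correlations.
L = V * np.sqrt(eigvals)

# Each row of L**2 partitions that variable's unit variance across
# the components, so the row sums are 1.
print(np.sum(L**2, axis=1))
```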
In summary, the eigenvectors of the covariance or correlation matrix form the loading matrix in PCA, defining how original variables are weighted to form each principal component. These loadings are orthogonal, depend on the scaling of input data, and their squared values quantify the variance contribution of variables to components.
