Dimensionality Reduction in Machine Learning | PCA & LDA Explained
In machine learning, handling datasets with a large number of features can lead to high computational cost, overfitting, and difficulty in visualization. Dimensionality reduction helps simplify these datasets by reducing the number of features while retaining important information.
Two of the most popular techniques are Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). In this article, we’ll explore these methods, their differences, applications, and examples in Python.
Dimensionality reduction is the process of reducing the number of input variables in a dataset while preserving as much important information as possible. Benefits include:
Lower computational cost
Reduced risk of overfitting
Easier visualization of high-dimensional data
Improved model performance
Dimensionality reduction can be unsupervised (e.g., PCA) or supervised (e.g., LDA).
PCA is an unsupervised technique.
It transforms features into a new set of orthogonal components called principal components.
The first principal component captures the maximum variance in the data; each subsequent component captures the maximum remaining variance while staying orthogonal to the previous ones.
Standardize the dataset.
Compute the covariance matrix.
Calculate eigenvalues and eigenvectors.
Select top principal components based on explained variance.
Transform the dataset.
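To make these steps concrete, here is a minimal NumPy sketch that follows them one by one on made-up data (the array X and the choice of two components are illustrative assumptions; the scikit-learn version below does the same work internally):
import numpy as np
X = np.random.rand(100, 10)  # Example dataset: 100 samples, 10 features
n_components = 2
# Step 1: standardize the dataset
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
# Step 2: compute the covariance matrix (features as variables)
cov_matrix = np.cov(X_std, rowvar=False)
# Step 3: calculate eigenvalues and eigenvectors (eigh handles symmetric matrices)
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)
# Step 4: select the top components by explained variance (largest eigenvalues first)
order = np.argsort(eigenvalues)[::-1]
top_vectors = eigenvectors[:, order[:n_components]]
explained_ratio = eigenvalues[order[:n_components]] / eigenvalues.sum()
# Step 5: transform the dataset onto the new subspace
X_reduced = X_std @ top_vectors
print("Reduced shape:", X_reduced.shape)
print("Explained variance ratio:", explained_ratio)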
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import numpy as np
X = np.random.rand(100, 10) # Example dataset
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
print("Explained variance ratio:", pca.explained_variance_ratio_)
Image compression
Noise reduction
Feature extraction
Data visualization
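As a quick illustration of the visualization use case, a common pattern is to project a labeled dataset down to two components and plot them. The sketch below uses scikit-learn's bundled Iris dataset and matplotlib, both of which are assumptions on top of the article's examples:
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
# Load a small labeled dataset and standardize it
X, y = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)
# Project onto the first two principal components for plotting
X_2d = PCA(n_components=2).fit_transform(X_scaled)
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y)
plt.xlabel("Principal component 1")
plt.ylabel("Principal component 2")
plt.title("Iris data projected onto two principal components")
plt.show()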
LDA is a supervised dimensionality reduction technique.
It maximizes class separability while reducing dimensionality.
Works well when you have labeled data.
Compute the mean vectors for each class.
Compute the within-class and between-class scatter matrices.
Compute eigenvalues and eigenvectors of the inverse of the within-class scatter matrix multiplied by the between-class scatter matrix.
Select linear discriminants with highest eigenvalues.
Transform dataset onto new feature subspace.
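Here is a rough NumPy sketch of those scatter-matrix steps for a two-class toy dataset; the random data is only for illustration, and the scikit-learn version below is what you would normally use:
import numpy as np
X = np.random.rand(100, 5)
y = np.random.randint(0, 2, 100)  # Two class labels
n_features = X.shape[1]
overall_mean = X.mean(axis=0)
S_W = np.zeros((n_features, n_features))  # Within-class scatter
S_B = np.zeros((n_features, n_features))  # Between-class scatter
for c in np.unique(y):
    X_c = X[y == c]
    mean_c = X_c.mean(axis=0)
    # Within-class scatter: spread of each class around its own mean
    S_W += (X_c - mean_c).T @ (X_c - mean_c)
    # Between-class scatter: spread of the class means around the overall mean
    diff = (mean_c - overall_mean).reshape(-1, 1)
    S_B += X_c.shape[0] * (diff @ diff.T)
# Linear discriminants: eigenvectors of inv(S_W) @ S_B with the largest eigenvalues
eigenvalues, eigenvectors = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
order = np.argsort(eigenvalues.real)[::-1]
w = eigenvectors[:, order[:1]].real  # Keep the top discriminant (at most n_classes - 1)
X_projected = X @ w
print("Projected shape:", X_projected.shape)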
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
import numpy as np
X = np.random.rand(100, 5)
y = np.random.randint(0, 2, 100) # Class labels
lda = LDA(n_components=1)
X_lda = lda.fit_transform(X, y)
print("Transformed dataset shape:", X_lda.shape)
Face recognition
Pattern classification
Text categorization
Medical diagnostics
| Feature | PCA | LDA |
| --- | --- | --- |
| Type | Unsupervised | Supervised |
| Goal | Maximize variance | Maximize class separability |
| Output | Principal components | Linear discriminants |
| Use case | Feature extraction, visualization | Classification problems |
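To see the table's distinction in code, the sketch below applies both methods to the same labeled dataset (again using Iris as an assumed example): PCA fits on the features alone, while LDA also takes the labels.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.preprocessing import StandardScaler
X, y = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)
# PCA is unsupervised: it only sees X and maximizes variance
X_pca = PCA(n_components=2).fit_transform(X_scaled)
# LDA is supervised: it needs y and maximizes class separability
X_lda = LDA(n_components=2).fit_transform(X_scaled, y)
print("PCA output shape:", X_pca.shape)
print("LDA output shape:", X_lda.shape)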
Reduces overfitting
Speeds up training
Improves visualization of complex datasets
Helps with noise reduction
PCA may lose interpretability of original features
LDA requires labeled data and works best when classes are roughly normally distributed with similar covariance structure
Choosing the optimal number of components can be tricky
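For PCA in particular, a common heuristic is to keep enough components to reach a target cumulative explained variance (for example 95%); a minimal sketch on random data:
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
X = np.random.rand(100, 10)
X_scaled = StandardScaler().fit_transform(X)
# Fit PCA with all components and inspect the cumulative explained variance
pca = PCA().fit(X_scaled)
cumulative = np.cumsum(pca.explained_variance_ratio_)
# Smallest number of components whose cumulative explained variance reaches 95%
n_components = int(np.argmax(cumulative >= 0.95)) + 1
print("Components needed for 95% variance:", n_components)
scikit-learn can also do this directly by passing a float, e.g. PCA(n_components=0.95).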
Finance: Risk factor analysis using PCA
Healthcare: Disease classification using LDA
Computer Vision: Image recognition and compression
NLP: Topic modeling and text classification
Dimensionality reduction is a critical step in modern machine learning pipelines. PCA and LDA help simplify high-dimensional datasets, reduce computational cost, and improve model performance.
Choosing the right technique depends on your data type and goal—use PCA for unsupervised analysis and LDA for supervised classification tasks.