Feature Scaling in ML: Data Normalization and Standardization Simplified

9/25/2025



Data Normalization and Standardization in Machine Learning: A Complete Guide for Beginners

Introduction

In machine learning, the quality of your model heavily depends on the quality of your data. Before feeding data into an algorithm, it’s crucial to preprocess it correctly. One of the most important preprocessing steps is feature scaling, which ensures that numerical features are on a similar scale. Two widely used techniques for this are data normalization and data standardization.

In this guide, we’ll explore what these techniques are, why they’re important, how they differ, and how you can implement them with practical examples in Python.


[Figure: standardization formula showing the Z-score transformation]

Why Feature Scaling Matters in Machine Learning

Many machine learning algorithms (such as gradient descent-based models, k-nearest neighbors, and support vector machines) are sensitive to the scale of data. Features with larger ranges can dominate smaller ones, leading to biased models and poor performance.

For example:

  • Feature A (Age): 20 – 60

  • Feature B (Salary): 30,000 – 150,000

If these features are used directly, the model may give more importance to “Salary” simply because its values are larger. Scaling them brings all features to a comparable range, improving both training speed and model accuracy.
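A quick numeric sketch makes this concrete. The people, ranges, and values below are illustrative, not from a real dataset: two individuals differ by 30 years of age and only $2,000 in salary, yet the raw Euclidean distance between them is driven almost entirely by the salary axis.

```python
import numpy as np

# Two hypothetical people: [age, salary]
a = np.array([25.0, 50_000.0])
b = np.array([55.0, 52_000.0])

# Raw Euclidean distance is dominated by the salary feature
raw_dist = np.linalg.norm(a - b)

# Rescale each feature to [0, 1] using the illustrative ranges from above
mins = np.array([20.0, 30_000.0])
maxs = np.array([60.0, 150_000.0])
a_scaled = (a - mins) / (maxs - mins)
b_scaled = (b - mins) / (maxs - mins)
scaled_dist = np.linalg.norm(a_scaled - b_scaled)

print(f"raw distance:    {raw_dist:.2f}")    # ~2000 -- the age gap is invisible
print(f"scaled distance: {scaled_dist:.3f}") # the large age gap now dominates
```

After scaling, the 30-year age difference (0.75 of its range) outweighs the tiny salary difference (about 0.017 of its range), which is the behavior a distance-based model should see.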


What is Data Normalization?

Normalization is the process of scaling features into a specific range — typically [0, 1]. It rescales the data proportionally based on the minimum and maximum values of each feature.

Formula:

X' = \frac{X - X_{min}}{X_{max} - X_{min}}

Where:

  • X: original value

  • X_{min}: minimum value of the feature

  • X_{max}: maximum value of the feature

  • X': normalized value

Example:

Let’s normalize the following dataset:

  • Original values: [20, 25, 30, 35, 40]

X' = \frac{X - 20}{40 - 20}

  • 20 → 0

  • 25 → 0.25

  • 30 → 0.5

  • 35 → 0.75

  • 40 → 1

Now, all values are between 0 and 1.
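The worked example above can be reproduced with a few lines of plain Python. This is a minimal sketch of the min-max formula itself (the helper name `min_max_normalize` is ours, not a library function); the scikit-learn version appears later in the article.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to the [0, 1] range via (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

result = min_max_normalize([20, 25, 30, 35, 40])
print(result)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```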

When to Use Normalization:

  • When you don’t know the data distribution.

  • For algorithms based on distance (e.g., KNN, K-Means).

  • When feature values vary widely and you want them in the same range.


What is Data Standardization?

Standardization (also known as Z-score normalization) scales data so that it has a mean of 0 and a standard deviation of 1. Unlike normalization, standardization doesn’t bound the values to a specific range.

Formula:

Z = \frac{X - \mu}{\sigma}

Where:

  • X: original value

  • \mu: mean of the feature

  • \sigma: standard deviation of the feature

  • Z: standardized value

Example:

Suppose we have: [20, 25, 30, 35, 40]

  • Mean (\mu) = 30

  • Standard Deviation (\sigma) ≈ 7.07

For X = 35:

Z = \frac{35 - 30}{7.07} ≈ 0.71
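The full standardized dataset can be computed the same way. Below is a minimal sketch of the Z-score formula using the population standard deviation (dividing by N, which is also what scikit-learn's StandardScaler uses); the helper name `standardize` is ours.

```python
import math

def standardize(values):
    """Transform values to zero mean and unit (population) standard deviation."""
    mu = sum(values) / len(values)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    return [(v - mu) / sigma for v in values]

z = standardize([20, 25, 30, 35, 40])
print([round(v, 2) for v in z])  # [-1.41, -0.71, 0.0, 0.71, 1.41]
```

Note that the value 35 maps to about 0.71, matching the hand calculation above, and the transformed values sum to zero.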

When to Use Standardization:

  • When features follow a Gaussian (normal) distribution.

  • For algorithms like Logistic Regression, Linear Regression, SVM, PCA.

  • When outliers are present — standardization is less affected than normalization.


Normalization vs. Standardization: Key Differences

| Feature | Normalization | Standardization |
| --- | --- | --- |
| Scale range | 0 to 1 (or -1 to 1) | Mean = 0, Std = 1 |
| Typical uses | KNN, Neural Networks, distance-based models | SVM, PCA, Regression |
| Sensitive to outliers | Yes | Less sensitive |
| Data distribution | Doesn't assume any | Works best with Gaussian |
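The outlier row of the table is easy to demonstrate. In this illustrative experiment, we append a single extreme value (400) to the earlier dataset: min-max scaling squeezes the original five values into a thin sliver near 0, while standardization keeps them more spread out relative to one another.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# The earlier dataset plus one extreme outlier
data = np.array([[20], [25], [30], [35], [40], [400]], dtype=float)

mm = MinMaxScaler().fit_transform(data)
ss = StandardScaler().fit_transform(data)

# The outlier compresses the original values into a tiny slice of [0, 1] ...
print("min-max:", mm[:5].ravel().round(3))  # all below ~0.06
# ... while their relative spread survives standardization better
print("z-score:", ss[:5].ravel().round(3))
```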

Implementation in Python (Using Scikit-learn)

Here’s how you can easily apply both techniques using scikit-learn.

Normalization with MinMaxScaler:

from sklearn.preprocessing import MinMaxScaler
import numpy as np

data = np.array([[20], [25], [30], [35], [40]])
scaler = MinMaxScaler()
normalized_data = scaler.fit_transform(data)

print("Normalized Data:\n", normalized_data)

Standardization with StandardScaler:

from sklearn.preprocessing import StandardScaler
import numpy as np

data = np.array([[20], [25], [30], [35], [40]])
scaler = StandardScaler()
standardized_data = scaler.fit_transform(data)

print("Standardized Data:\n", standardized_data)

Best Practices for Feature Scaling

  • Always fit your scaler on the training data and apply the same transformation to test data.

  • Visualize distributions before choosing between normalization or standardization.

  • Avoid scaling categorical variables — only apply it to numerical features.

  • Combine scaling with other preprocessing steps like encoding and missing value imputation.
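The first best practice, fitting the scaler only on training data, is worth seeing in code. This sketch uses synthetic data (the means, sizes, and random seed are arbitrary): `fit_transform` learns the mean and standard deviation from the training split, and `transform` reuses those statistics on the test split, preventing data leakage.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic feature data for illustration
rng = np.random.default_rng(0)
X_train = rng.normal(loc=50, scale=10, size=(100, 1))
X_test = rng.normal(loc=50, scale=10, size=(20, 1))

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from training data only
X_test_scaled = scaler.transform(X_test)        # reuse them; never re-fit on test data

# Training data ends up with mean ~0 and std ~1; test data will be close but not exact
print(round(float(X_train_scaled.mean()), 3), round(float(X_train_scaled.std()), 3))
```

Calling `fit_transform` on the test set instead would let test statistics leak into preprocessing and make evaluation overly optimistic.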


Conclusion

Data normalization and standardization are foundational steps in data preprocessing. They ensure that features contribute equally to the learning process and prevent bias caused by varying scales. Choosing the right technique depends on your data distribution, the type of algorithm, and the presence of outliers.

By mastering these scaling techniques, you’ll improve model performance, training efficiency, and prediction accuracy — making them indispensable tools in every data scientist’s toolkit.


Key Takeaways:

  • Normalization rescales features to a specific range (usually 0–1).

  • Standardization transforms data to have a mean of 0 and standard deviation of 1.

  • Both techniques improve model training and accuracy by ensuring features are comparable.

  • Always experiment with both to see which works best for your dataset and algorithm.

 

 FAQ – Data Normalization and Standardization

  1. What is the difference between data normalization and standardization?

  2. When should I use normalization instead of standardization?

  3. Why is feature scaling important in machine learning?

  4. Does standardization handle outliers better than normalization?

  5. Can I apply normalization or standardization to categorical data?

  6. Which algorithms require data normalization or standardization?

  7. What happens if I skip feature scaling in my ML model?

  8. Is MinMaxScaler the same as normalization?

  9. Can I use both normalization and standardization together?

  10. How do I decide which scaling technique is best for my dataset?