
Understanding Decision Trees for Regression: Step-by-Step Explanation

January 12, 2025

#decision tree regression, #regression decision tree example, #decision tree in machine learning, #decision tree splitting criteria, #machine learning

How a Decision Tree Works in Regression: A Complete Guide with Example

Decision trees are a popular algorithm for both classification and regression tasks. In this article, we will focus on how a decision tree works for regression, breaking down the process step by step with an example to make it easier to understand. We'll also cover the formula used to calculate the best split by minimizing errors, making this guide both comprehensive and practical.


🧩 What Is a Regression Decision Tree?

A regression decision tree is a model that predicts continuous values (like prices, temperatures, etc.) by splitting the data into smaller and smaller subsets based on feature values. At each split, the tree aims to minimize the error, and the final prediction is the average of the target values in the leaf nodes.

Let’s dive into the step-by-step process of building a decision tree for regression with an example dataset.


📊 Example Dataset

Consider the following dataset with three input features (x1, x2, x3) and a target value:

x1   x2    x3   Target
3    100   70   0.1
5    200   80   0.2

We will use this dataset to understand how the decision tree makes predictions in regression.


Step 1: Splitting the Data

The first step in building a regression decision tree is to split the data into regions based on the feature values. The tree evaluates each feature (x1, x2, x3) and splits the data into ranges.

🔧 Splitting Criteria:

  • x1 is divided into two ranges: (1-6) and (6-10)
  • x2 is divided into three ranges: (1-100), (100-300), and (300-500)
  • x3 is divided into three ranges: (1-30), (30-60), and (60-100)

For each split, the tree calculates the target values within that range.
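The fixed ranges above are only for illustration. In practice, most implementations (CART-style trees) instead try binary splits at thresholds placed between consecutive sorted values of each feature. A minimal Python sketch of how such candidate thresholds could be enumerated for one feature (variable and function names are purely illustrative):

```python
import numpy as np

def candidate_thresholds(feature_values):
    """Candidate split points for one feature: the midpoints between
    consecutive distinct sorted values."""
    values = np.unique(feature_values)        # sorted, distinct values
    return (values[:-1] + values[1:]) / 2.0   # midpoints

x1 = np.array([3, 5])            # feature x1 from the example dataset
print(candidate_thresholds(x1))  # [4.] -> the tree would try the split "x1 <= 4"
```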


Step 2: Calculating the Average (Prediction for Each Region)

After splitting the data, the decision tree calculates the average target value in each region; that average becomes the region's prediction. If a region contains samples with target values y_1, y_2, and y_3, its average \bar{y} is:

\bar{y} = \frac{y_1 + y_2 + y_3}{3}

For simplicity, let's say the average target value \bar{y} for a particular region works out to 3.
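As a quick sketch, the per-region average can be computed with a boolean mask over the training rows. The region below (x1 <= 4) is a hypothetical one applied to the two-row dataset above:

```python
import numpy as np

x1 = np.array([3, 5])            # feature x1 from the example dataset
target = np.array([0.1, 0.2])    # corresponding target values

in_region = x1 <= 4              # hypothetical region: x1 <= 4
print(target[in_region].mean())  # 0.1 -> the prediction for that region
```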


Step 3: Final Prediction

The final prediction for a new input is the average target value of the leaf region that the input falls into. If that leaf contains training samples with targets y_1, y_2, and y_3, the prediction is:

\text{Final Prediction} = \frac{y_1 + y_2 + y_3}{3}

For example, if the target values in that leaf are 2.5, 3.0, and 3.5, the prediction will be:

\text{Final Prediction} = \frac{2.5 + 3.0 + 3.5}{3} = 3.0
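Once each leaf stores its average, predicting for a new input is just a matter of routing it through the splits to a leaf and returning that stored average. A toy sketch with a single split (the threshold and the left-leaf value are made up; the right-leaf value is the 3.0 computed above):

```python
def predict(x_new, threshold=5.0, left_leaf_mean=1.5, right_leaf_mean=3.0):
    """Follow the split, then return the stored average of the leaf reached."""
    return left_leaf_mean if x_new <= threshold else right_leaf_mean

print(predict(2.0))   # 1.5 (left leaf)
print(predict(8.0))   # 3.0 (right leaf: the average computed above)
```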


Step 4: Selecting the Best Split (Minimizing Error)

The decision tree selects the best split by minimizing the Sum of Squared Errors (SSE). The split that results in the smallest error is chosen.

🎯 Formula to Minimize Error:

The error is calculated using the following formula:

SSE = \sum_{i=1}^{n} (y_i - \bar{y})^2

Where:

  • y_i = actual target value
  • \bar{y} = mean of the target values in the subset
  • n = number of samples in the subset

The tree continues splitting each region until further splits no longer reduce the error meaningfully, or until it reaches a predefined stopping criterion (such as maximum depth or a minimum number of samples per leaf).
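These two pieces, the SSE of a subset and the total SSE of a candidate split, translate directly into code. A minimal sketch (the helper names sse and split_sse are illustrative, not from any library):

```python
import numpy as np

def sse(y):
    """Sum of squared errors of y around its own mean."""
    y = np.asarray(y, dtype=float)
    return float(np.sum((y - y.mean()) ** 2))

def split_sse(x, y, threshold):
    """Total SSE of the two subsets produced by the split x <= threshold."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    left = x <= threshold
    if left.all() or not left.any():   # degenerate split: one side is empty
        return float("inf")
    return sse(y[left]) + sse(y[~left])
```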


Illustrative Example for Splitting the Data

Consider a dataset with a feature x and a target variable y:

x     y
2     3.0
4     3.5
6     5.0
8     7.5
10    9.0

To determine the optimal split:

🔍 Potential Split at x = 5:

  • Subset 1: x \leq 5
    • Data: (2, 3.0), (4, 3.5)
    • Mean of y: \bar{y}_1 = 3.25
    • SSE: (3.0 - 3.25)^2 + (3.5 - 3.25)^2 = 0.125
  • Subset 2: x > 5
    • Data: (6, 5.0), (8, 7.5), (10, 9.0)
    • Mean of y: \bar{y}_2 \approx 7.17
    • SSE: (5.0 - 7.17)^2 + (7.5 - 7.17)^2 + (9.0 - 7.17)^2 \approx 8.17
  • Total SSE: 0.125 + 8.17 \approx 8.29

🔍 Potential Split at x = 7:

  • Subset 1: x \leq 7
    • Data: (2, 3.0), (4, 3.5), (6, 5.0)
    • Mean of y: \bar{y}_1 \approx 3.83
    • SSE: (3.0 - 3.83)^2 + (3.5 - 3.83)^2 + (5.0 - 3.83)^2 \approx 2.17
  • Subset 2: x > 7
    • Data: (8, 7.5), (10, 9.0)
    • Mean of y: \bar{y}_2 = 8.25
    • SSE: (7.5 - 8.25)^2 + (9.0 - 8.25)^2 = 1.125
  • Total SSE: 2.17 + 1.125 \approx 3.29

In this example, the split at x = 7 yields a lower total SSE (about 3.29) than the split at x = 5 (about 8.29). Therefore, the decision tree would select x = 7 as the split point, since it minimizes the prediction error.
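The arithmetic above is easy to double-check with a few lines of Python (a standalone verification of the two candidate splits):

```python
import numpy as np

x = np.array([2, 4, 6, 8, 10], dtype=float)
y = np.array([3.0, 3.5, 5.0, 7.5, 9.0])

def sse(v):
    """Sum of squared errors around the mean."""
    return float(np.sum((v - v.mean()) ** 2))

for threshold in (5, 7):
    left = x <= threshold
    total = sse(y[left]) + sse(y[~left])
    print(f"split at x <= {threshold}: total SSE = {total:.3f}")
# split at x <= 5: total SSE = 8.292
# split at x <= 7: total SSE = 3.292
```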


Summary of the Process:

  1. Split the data into regions based on feature values.
  2. Calculate the average target value for each region; this average is the region's prediction.
  3. Predict for a new input by returning the average target value of the region it falls into.
  4. At each node, select the best split by minimizing the sum of squared errors (SSE).
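In practice you would rarely code this by hand; libraries automate every step of the summary above. A minimal sketch using scikit-learn (assuming a recent version, 1.0 or later, is installed) on the five-point dataset from the illustrative example:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.array([[2], [4], [6], [8], [10]])   # the single feature x, one column
y = np.array([3.0, 3.5, 5.0, 7.5, 9.0])

# A depth-1 tree makes exactly one split, chosen by minimizing squared error.
tree = DecisionTreeRegressor(max_depth=1, criterion="squared_error")
tree.fit(X, y)

print(tree.tree_.threshold[0])    # expected 7.0, the split found by hand above
print(tree.predict([[3], [9]]))   # the leaf means, roughly [3.83, 8.25]
```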

Let's look at the split-selection step in a bit more detail. In a regression decision tree, selecting the optimal split at each node is crucial for accurate predictions: the objective is to partition the data into subsets that minimize the Sum of Squared Errors (SSE) within each region.

Steps to Determine the Optimal Split:

  1. Evaluate Potential Splits:

    • For each feature, consider all possible split points.
    • A split point divides the data into two subsets: one where the feature's value is less than or equal to the split point, and another where it's greater.
  2. Calculate the Sum of Squared Errors (SSE):

    • For each potential split, compute the SSE of each resulting subset.
    • Formula for SSE:

      SSE = \sum_{i=1}^{n} (y_i - \bar{y})^2

      Where:
      • y_i = actual target value
      • \bar{y} = mean of the target values in the subset
      • n = number of samples in the subset
  3. Select the Split with the Minimum SSE:

    • Identify the split whose subsets have the lowest total SSE, indicating the most homogeneous subsets.
    • This criterion is often called variance reduction and is the standard approach in regression trees (a code sketch follows this list).
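Putting these three steps together, a brute-force search over every candidate threshold of a single feature could look like the following sketch (the function name best_split is illustrative, not from any library):

```python
import numpy as np

def best_split(x, y):
    """Evaluate every candidate threshold for one feature and return the
    threshold whose two subsets have the lowest total SSE."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)

    def sse(v):
        return float(np.sum((v - v.mean()) ** 2)) if v.size else 0.0

    values = np.unique(x)
    thresholds = (values[:-1] + values[1:]) / 2.0   # midpoints between values
    totals = [sse(y[x <= t]) + sse(y[x > t]) for t in thresholds]
    best = int(np.argmin(totals))
    return float(thresholds[best]), totals[best]

t, total = best_split([2, 4, 6, 8, 10], [3.0, 3.5, 5.0, 7.5, 9.0])
print(t, round(total, 3))   # 7.0 3.292 -> the same winning split as above
```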


By systematically evaluating all possible splits and selecting the one that minimizes SSE, regression decision trees effectively partition the data into regions that lead to the most accurate predictions.
