Unleashing the Power of Polynomial Regression: A Comprehensive Guide

INTRODUCTION

In the realm of data analysis and machine learning, finding patterns and uncovering relationships between variables is of utmost importance. While linear regression serves as a fundamental tool for modeling such relationships, it may fall short when faced with complex, nonlinear data. Enter polynomial regression—a versatile technique that allows us to capture nonlinear trends and achieve greater accuracy in our predictions.

In this blog post, we will delve into the world of polynomial regression and explore its potential to unlock valuable insights from our data. Whether you're a beginner taking your first steps into regression analysis or an experienced data scientist seeking to expand your modeling toolbox, this guide will provide you with a solid foundation to harness the power of polynomial regression effectively.

POLYNOMIAL REGRESSION

Polynomial regression is a form of regression analysis that extends the capabilities of linear regression by allowing for nonlinear relationships between the independent variable(s) and the dependent variable. While linear regression assumes a linear relationship between variables, polynomial regression can capture more complex relationships by introducing polynomial terms of higher degrees.

POLYNOMIAL EQUATION

In polynomial regression, the relationship between the independent variable(s) and the dependent variable is modeled as an nth-degree polynomial function. The polynomial equation takes the form:

$$y = β0 + β1x + β2x² + β3x³ + ... + βnxⁿ$$

where:

  • y represents the dependent variable,

  • x represents the independent variable,

  • β₀, β₁, β₂, ..., βₙ are the coefficients that determine the shape and magnitude of the polynomial terms.

NOTE:

  • The choice of the degree of the polynomial, denoted by n, depends on the complexity of the relationship between the variables and the data at hand.

  • By increasing the degree of the polynomial, the model can fit the data more closely and capture intricate nonlinear patterns.

  • However, a higher degree polynomial may also introduce overfitting if the model becomes too complex and starts capturing noise rather than the underlying signal.

TYPES OF POLYNOMIAL REGRESSION

  1. Linear Regression (Degree 1)

  2. Quadratic Regression (Degree 2)

  3. Higher Degree Polynomial Regression (Degree > 2)

QUADRATIC REGRESSION

Quadratic polynomial regression, also known as quadratic regression, is a specific type of polynomial regression where the relationship between the independent variable and the dependent variable is modeled using a quadratic function.

EQUATION

$$y = β0 + β1x + β2x²$$

STREAMLINING QUADRATIC REGRESSION CALCULATIONS

To solve Quadratic regression we have to solve a system of three equations:

They are:

EQUATION 1:

$$∑y =nβ0+β1∑x+β2∑x^2$$

where,

n-Number of data points in given data

EQUATION 2:

$$∑xy =β0∑x+β1∑x^2+β2∑x^3$$

EQUATION 3:

$$∑x^2y =β0∑x^2+β1∑x^3+β2∑x^4$$

By solving the above equation we will get β0,β1,β2

Substitute β0,β1,β2 in the Quadratic equation.

MATHEMATICAL EXAMPLE FOR QUADRATIC REGRESSION

Obtain the regression equation for degree=2 for the given data points

x34567
y2.53.23.86.512

Solution:

Step 1: Calculating ∑x, ∑y, ∑x^2, ∑x^3, ∑x^4, ∑xy, ∑x^2y

Here, The total number of data points is 5.

hence n=5

xyx^2x^3x^4xyx^2y
32.5927817.522.5
43.2166425612.851.2
53.8251256251995
66.536216129639234
71249343240180.5563.5
∑x=25∑y=27.5∑x^2=135∑x^3=775∑x^4=4659∑xy=158.8∑x^2y=966.2

Step 2: Substituting the values in a system of 3 equations

EQUATION 1:

$$27.5 =(5)β0+β1(25)+β2(135)$$

EQUATION 2:

$$158.8=β0(25)+β1(135)+β2(775)$$

EQUATION 3:

$$966.2 =β0(135)+β1(775)+β2(4659)$$

By solving the above three equations by matrix method we get

  • β0=12.4285714

  • β1=-5.5128571

  • β2=0.7642857

Step 4: Substituting β0, β1, β2 in quadratic equation

w.k.t quadratic equation is

$$y = β0 + β1x + β2x²$$

Obtained equation is

$$y = 12.4285714 -5.5128571x + 0.7642857x²$$

EXAMPLE CODE FOR POLYNOMIAL REGRESSION

The code using the above data points in mathematical calculation

import numpy as np
import pandas as pf
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
#data
x=np.array([3,4,5,6,7])
y=np.array([2.5,3.2,3.8,6.5,12])
#reshaping indpendent variable
x=x.reshape(-1,1)
#how curve fit from degree 1 to 2
for i in range(1,3):
    polynomial=PolynomialFeatures(degree=i)
    x_poly=polynomial.fit_transform(x)
    polynomial.fit(x_poly,y)
    linear=LinearRegression()
    linear.fit(x_poly,y)
    y_pred=linear.predict(x_poly)
    plt.scatter(x,y,color='blue')
    plt.plot(x,y_pred,color='red')
    plt.show()

OUTPUT

Output for degree=1:

Output for degree=2:

See, how increasing the degree from 1 to 2 helps the model to learn more effectively.

Note: For degree=1 the model is underfit.

EXAMPLE OF OVER-FITTING AND UNDERFITTING IN POLYNOMIAL REGRESSION

import numpy as np
import pandas as pf
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
#data
x = np.array([1, 5, 3, 2, 6])
y = np.array([2, 6, 10, 20, 30])
#reshaping indpendent variable
x=x.reshape(-1,1)
#model performance from degree 1 to 4
for i in range(1,5):
    polynomial=PolynomialFeatures(degree=i)
    x_poly=polynomial.fit_transform(x)
    print('Polynomial Features',x_poly)
    polynomial.fit(x_poly,y)
    linear=LinearRegression()
    linear.fit(x_poly,y)
    y_pred=linear.predict(x_poly)
    plt.scatter(x,y,color='blue')
    plt.plot(x,y_pred,color='red')
    plt.show()

OUTPUT

Underfitting by the model for the data points at

degree=1:

degree=2:

Good performance of the model at the

degree=3:

Overfitting by the model at

degree=4:

WHEN TO USE POLYNOMIAL REGRESSION?

SituationPolynomial Regression Usage
Nonlinear RelationshipsWhen the relationship between variables exhibits curvature or nonlinearity, polynomial regression can capture these complex patterns.
U-Shaped or Inverted U-Shaped RelationshipsPolynomial regression is effective in modeling relationships that follow a U-shaped or inverted U-shaped curve.
Flexibility in ModelingPolynomial regression provides greater flexibility than linear regression by allowing for higher-degree polynomial terms, enabling a better fit to the data.
Interaction EffectsPolynomial regression can capture interaction effects between variables, where the relationship between them changes with different values of the variables.
Limited Prior Knowledge of the RelationshipWhen there is limited prior knowledge or theory about the underlying relationship, polynomial regression can be used as a more exploratory modeling approach.
Data Visualization and InterpretabilityPolynomial regression can help visualize and interpret complex relationships, as the higher-degree terms allow for more detailed insights into the data.
The trade-Off between Bias and VarianceBy adjusting the degree of the polynomial, one can find an optimal trade-off between bias and variance in the model, depending on the dataset and desired level of complexity.
***Warning: OverfittingBe cautious when using high-degree polynomials, as they can lead to overfitting, capturing noise rather than true patterns. Regularization techniques may be necessary to mitigate this.

CONCLUSION

In conclusion, polynomial regression offers a powerful tool for modeling nonlinear relationships and capturing complex patterns in the data. By extending beyond linear regression, it provides flexibility in fitting curves, capturing U-shaped relationships, and exploring interaction effects. However, caution should be exercised to avoid overfitting and ensure the appropriate degree of the polynomial is chosen based on the data characteristics.

Refer this to for Linear regression and for solving system of equations

HOPE YOU ENJOYED READING IT!!!

STAY TUNED !!!

SUPPORT MY BLOG !!!

THANKYOU !!!