Mastering Linear Regression: A Mathematical Journey to Predictive Excellence

INTRODUCTION

Linear regression is a powerful statistical technique used to model the relationship between a dependent variable and one or more independent variables. It is widely employed in various fields, including finance, economics, social sciences, and machine learning.

GOAL

  • To find the linear equation that best fits the given data.

  • To use that equation to make predictions and to understand the relationship between the variables.

ASSUMPTION

  • It assumes a linear relationship between the independent variables and the dependent variable.

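  • In simple linear regression, this relationship is represented by a straight line on a scatter plot; with several independent variables it becomes a plane or hyperplane.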

TYPES OF LINEAR REGRESSION

  1. SIMPLE LINEAR REGRESSION

  2. MULTIPLE LINEAR REGRESSION

SIMPLE LINEAR REGRESSION

Simple linear regression involves a single independent variable (predictor variable) and a single dependent variable. It models the linear relationship between the predictor variable and the response variable.

SIMPLE LINEAR REGRESSION EQUATION

$$y = b_1x + b_0$$

where,

y: Dependent variable

x: Independent variable

b1: Coefficient (slope)

b0: y-intercept

TO FIND THE SLOPE:

$$b_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$$

TO FIND THE INTERCEPT:

$$b_0 = \frac{\sum y - b_1 \sum x}{n}$$

NOTE: n is the number of data points in the dataset.
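The two formulas above can be turned into code almost line for line. The following is a minimal sketch in plain Python (no libraries); the function name `fit_simple_linear_regression` is a hypothetical name chosen for illustration, and the small test dataset is made up so the answer is easy to check by hand.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (b1, b0) using the closed-form least-squares formulas."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        # Every x value is identical, so the slope is undefined.
        raise ValueError("x values must not all be equal")

    b1 = (n * sum_xy - sum_x * sum_y) / denominator  # slope
    b0 = (sum_y - b1 * sum_x) / n                    # intercept
    return b1, b0


# Hypothetical points lying exactly on y = 2x + 1
b1, b0 = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(b1, b0)  # → 2.0 1.0
```

The guard on the denominator matters because the slope formula divides by nΣx^2 - (Σx)^2, which is zero when every x value is the same.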

EXAMPLE FOR SIMPLE LINEAR REGRESSION

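Given dataset

| X | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| Y | 1.5 | 3.8 | 6.7 | 9.0 | 11.2 | 13.6 | 16 |

Find the regression line of the dependent variable Y on X for this data.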

SOLUTION:

We know that

$$Y = b_1X + b_0$$

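| X | Y | XY | X^2 |
|---|---|---|---|
| 1 | 1.5 | 1.5 | 1 |
| 2 | 3.8 | 7.6 | 4 |
| 3 | 6.7 | 20.1 | 9 |
| 4 | 9.0 | 36 | 16 |
| 5 | 11.2 | 56 | 25 |
| 6 | 13.6 | 81.6 | 36 |
| 7 | 16 | 112 | 49 |
| ΣX = 28 | ΣY = 61.8 | ΣXY = 314.8 | ΣX^2 = 140 |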
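The column sums in the table above are easy to get wrong by hand. A short plain-Python sketch (variable names are illustrative) recomputes them from the same seven data points:

```python
# Data from the worked example above
X = [1, 2, 3, 4, 5, 6, 7]
Y = [1.5, 3.8, 6.7, 9.0, 11.2, 13.6, 16]

sum_x = sum(X)
sum_y = sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2 = sum(x * x for x in X)

# Floating-point sums can carry tiny rounding errors, so round for display
print(sum_x, round(sum_y, 4), round(sum_xy, 4), sum_x2)  # → 28 61.8 314.8 140
```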

Applying,

THE SLOPE:

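$$b_1 = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}$$

$$b_1 = \frac{(7 \times 314.8) - (28 \times 61.8)}{(7 \times 140) - 28^2}$$

$$b_1 = \frac{2203.6 - 1730.4}{980 - 784} = \frac{473.2}{196} \approx 2.4142857$$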

THE INTERCEPT:

$$b_0 = \frac{\sum Y - b_1 \sum X}{n}$$

$$b_0 = \frac{61.8 - (2.4142857 \times 28)}{7} = \frac{61.8 - 67.6}{7}$$

$$b_0 \approx -0.828571$$

FINAL REGRESSION EQUATION:

$$Y=2.41X-0.83$$
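As a cross-check of the hand calculation, NumPy's `np.polyfit` with degree 1 fits the same least-squares line and returns the slope first, then the intercept. This sketch assumes NumPy is installed (the code sections later in this post already use it):

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
Y = np.array([1.5, 3.8, 6.7, 9.0, 11.2, 13.6, 16.0])

# For deg=1, np.polyfit returns [slope, intercept] of the least-squares line
b1, b0 = np.polyfit(X, Y, 1)
print(round(b1, 4), round(b0, 4))  # → 2.4143 -0.8286
```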

MULTIPLE LINEAR REGRESSION

Multiple linear regression involves two or more independent variables and a single dependent variable. It extends the concept of simple linear regression to model the linear relationship between multiple predictors and the response variable simultaneously.

MULTIPLE LINEAR REGRESSION EQUATION

$$y = b_0 + b_1x_1 + b_2x_2 + \dots + b_nx_n$$

where,

y: Dependent variable

x1, x2, ..., xn: Independent variables

b1, b2, ..., bn: Coefficients (slopes)

b0: y-intercept
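Once the coefficients are known, evaluating the multiple regression equation for a row of inputs is just the intercept plus a dot product. A minimal NumPy sketch; the helper `predict` and the coefficient values are hypothetical, picked only to make the arithmetic easy to follow:

```python
import numpy as np

def predict(b0, b, features):
    """Evaluate y = b0 + b1*x1 + ... + bn*xn for each row of `features`."""
    features = np.atleast_2d(np.asarray(features, dtype=float))
    return b0 + features @ np.asarray(b, dtype=float)

# Hypothetical coefficients: y = 1 + 2*x1 - 0.5*x2
result = predict(1.0, [2.0, -0.5], [[3.0, 4.0], [0.0, 2.0]])
print(result)  # → [5. 0.]
```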

TO FIND THE COEFFICIENTS:

We use the matrix-inversion method, which solves the normal equation:

$$b' = (x^T x)^{-1} x^T y$$

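where,

x: matrix of independent variables, with a column of 1s as its first column (this column multiplies the intercept b0)

y: column vector of dependent-variable values

b': column vector of coefficients, i.e. b0, b1, b2, ..., bn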
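The formula above can be sketched as a small NumPy function. Two assumptions to flag: the helper name `fit_normal_equation` is hypothetical, and instead of forming the inverse explicitly it calls `np.linalg.solve` on (x^T x) b' = x^T y. That gives the same b' whenever x^T x is invertible, but is generally more numerically stable than computing the inverse first.

```python
import numpy as np

def fit_normal_equation(features, y):
    """Return b' = [b0, b1, ..., bn] by solving the normal equation."""
    features = np.asarray(features, dtype=float)
    # Prepend a column of 1s so that b0 acts as the intercept
    x = np.column_stack([np.ones(len(features)), features])
    y = np.asarray(y, dtype=float)
    # Solve (x^T x) b' = x^T y instead of computing (x^T x)^{-1} explicitly
    return np.linalg.solve(x.T @ x, x.T @ y)

# Hypothetical data on the line y = 1 + 2*x1
coefficients = fit_normal_equation([[1], [2], [3]], [3, 5, 7])
print(coefficients)
```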

EXAMPLE FOR MULTIPLE LINEAR REGRESSION

Given data,

| Product-1 sales (x1) | Product-2 sales (x2) | Weekly sales (y) |
|---|---|---|
| 1 | 4 | 1 |
| 2 | 5 | 6 |
| 3 | 8 | 8 |
| 4 | 2 | 12 |

Find the regression equation for weekly sales (the dependent variable), using Product-1 sales and Product-2 sales as the independent variables.

SOLUTION:

The regression equation for this case is

$$y = b_0 + b_1x_1 + b_2x_2$$

Step 1:

INDEPENDENT MATRIX (x), with a first column of 1s:

$$x = \begin{bmatrix} 1 & 1 & 4 \\ 1 & 2 & 5 \\ 1 & 3 & 8 \\ 1 & 4 & 2 \end{bmatrix}$$

Step 2:

DEPENDENT MATRIX (y):

$$y = \begin{bmatrix} 1 \\ 6 \\ 8 \\ 12 \end{bmatrix}$$
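Steps 1 and 2 can be reproduced with NumPy by stacking a column of 1s next to the two predictor columns. A sketch, with illustrative variable names:

```python
import numpy as np

product1 = np.array([1, 2, 3, 4], dtype=float)
product2 = np.array([4, 5, 8, 2], dtype=float)
weekly_sales = np.array([1, 6, 8, 12], dtype=float)

# Step 1: a column of 1s for the intercept, then one column per predictor
x = np.column_stack([np.ones_like(product1), product1, product2])
# Step 2: the dependent variable as a column vector
y = weekly_sales.reshape(-1, 1)

print(x)
print(y)
```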

Step 3:

TRANSPOSE OF x, i.e. x^T:

$$x^T = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \\ 4 & 5 & 8 & 2 \end{bmatrix}$$

Step 4:

FIND x^T x:

$$x^T x = \begin{bmatrix} 4 & 10 & 19 \\ 10 & 30 & 46 \\ 19 & 46 & 109 \end{bmatrix}$$
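Steps 3 and 4 in NumPy: `.T` transposes a matrix and `@` multiplies matrices. A short sketch that rebuilds x^T x from the design matrix of Step 1 (variable names are illustrative):

```python
import numpy as np

# Design matrix from Step 1
x = np.array([[1, 1, 4],
              [1, 2, 5],
              [1, 3, 8],
              [1, 4, 2]], dtype=float)

xt = x.T        # Step 3: transpose, shape (3, 4)
xtx = xt @ x    # Step 4: x^T x, shape (3, 3)
print(xtx)
```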

Step 5:

FIND THE INVERSE (x^T x)^{-1}:

$$(x^T x)^{-1} \approx \begin{bmatrix} 3.153 & -0.590 & -0.301 \\ -0.590 & 0.205 & 0.016 \\ -0.301 & 0.016 & 0.055 \end{bmatrix}$$
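Step 5 can be checked with `np.linalg.inv`. For this matrix the determinant works out to 366, so every entry of the inverse is an integer divided by 366 (for example, 1154/366 ≈ 3.153). A minimal sketch that also confirms the product with the original matrix is the identity:

```python
import numpy as np

# x^T x from Step 4
xtx = np.array([[4, 10, 19],
                [10, 30, 46],
                [19, 46, 109]], dtype=float)

xtx_inv = np.linalg.inv(xtx)  # Step 5
print(np.round(xtx_inv, 3))

# A matrix multiplied by its inverse gives the identity matrix
print(np.allclose(xtx @ xtx_inv, np.eye(3)))  # → True
```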

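Step 6:

FIND (x^T x)^{-1} x^T:

$$(x^T x)^{-1} x^T \approx \begin{bmatrix} 1.361 & 0.470 & -1.022 & 0.191 \\ -0.320 & -0.098 & 0.156 & 0.262 \\ -0.066 & 0.005 & 0.186 & -0.126 \end{bmatrix}$$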
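Step 6 in NumPy. When x has full column rank, (x^T x)^{-1} x^T is exactly the Moore-Penrose pseudoinverse of x, so `np.linalg.pinv(x)` offers an independent check. A sketch, with an illustrative variable name:

```python
import numpy as np

# Design matrix from Step 1 (first column of 1s for the intercept)
x = np.array([[1, 1, 4],
              [1, 2, 5],
              [1, 3, 8],
              [1, 4, 2]], dtype=float)

# Step 6: (x^T x)^{-1} x^T, a 3 x 4 matrix
left_inverse = np.linalg.inv(x.T @ x) @ x.T
print(np.round(left_inverse, 3))

# Matches NumPy's pseudoinverse because x has full column rank
print(np.allclose(left_inverse, np.linalg.pinv(x)))  # → True
# Multiplying back by x recovers the 3 x 3 identity matrix
print(np.allclose(left_inverse @ x, np.eye(3)))  # → True
```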

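Step 7:

Multiplying the above result by y gives the coefficient vector b':

$$b' = \begin{bmatrix} b_0 \\ b_1 \\ b_2 \end{bmatrix} = (x^T x)^{-1} x^T y$$

we get

b0 ≈ -1.70

b1 ≈ 3.48

b2 ≈ -0.05

Step 8:

Substituting b0, b1 and b2 into the regression equation, we get

$$y = -1.70 + 3.48x_1 - 0.05x_2$$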
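Steps 7 and 8 can be verified end to end. This sketch evaluates the formula directly and compares it with `np.linalg.lstsq`, NumPy's general least-squares solver (a different technique, used here purely as a cross-check). Carried out in exact arithmetic, the coefficients are b0 = -311/183 ≈ -1.699, b1 = 425/122 ≈ 3.484 and b2 = -10/183 ≈ -0.055.

```python
import numpy as np

x = np.array([[1, 1, 4],
              [1, 2, 5],
              [1, 3, 8],
              [1, 4, 2]], dtype=float)
y = np.array([1, 6, 8, 12], dtype=float)

# Step 7 via the formula b' = (x^T x)^{-1} x^T y
b_formula = np.linalg.inv(x.T @ x) @ x.T @ y

# Independent cross-check with NumPy's least-squares solver
b_lstsq, *_ = np.linalg.lstsq(x, y, rcond=None)

print(np.round(b_formula, 3))
print(np.allclose(b_formula, b_lstsq))  # → True
```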

CODE FOR SIMPLE LINEAR REGRESSION

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Example dataset
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)  # Independent variable (reshape to a column vector)
y = np.array([2, 4, 6, 8, 10])                # Dependent variable

# Create and fit the linear regression model
model = LinearRegression()
model.fit(X, y)

# Get the slope (coefficient) and intercept of the regression line
slope = model.coef_[0]
intercept = model.intercept_

# Print the results
print("Slope (Coefficient):", slope)
print("Intercept:", intercept)
```

OUTPUT:

```
Slope (Coefficient): 2.0000000000000004
Intercept: -1.7763568394002505e-15
```

Since y = 2x exactly, these values are 2 and 0 up to floating-point round-off.

CODE FOR MULTIPLE LINEAR REGRESSION

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Example dataset
X = np.array([[1, 2], [2, 5], [3, 9], [4, 3], [5, 6]])  # Independent variables (features)
y = np.array([3, 5, 7, 9, 11])                          # Dependent variable

# Create and fit the multiple linear regression model
model = LinearRegression()
model.fit(X, y)

# Get the coefficients (slopes) and intercept of the fitted regression plane
coefficients = model.coef_
intercept = model.intercept_

# Print the results
print("Coefficients:", coefficients)
print("Intercept:", intercept)
```

OUTPUT:

```
Coefficients: [2.0000000e+00 1.7286149e-16]
Intercept: 0.9999999999999991
```

Here y = 2x1 + 1 exactly, so the fitted model is y = 2x1 + 0·x2 + 1; the tiny second coefficient and the small deviation of the intercept from 1 are floating-point round-off.

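NOTE:

  • In simple linear regression, the 1-D array of x values must be reshaped into a single column with reshape(-1, 1), because scikit-learn expects a 2-D feature array. In multiple linear regression the independent data is already 2-D (one column per feature), so no reshaping is necessary.

  • scikit-learn's LinearRegression class supports both simple and multiple linear regression.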
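The reshape note comes down to array shapes: scikit-learn's `LinearRegression.fit` expects a 2-D feature array of shape (n_samples, n_features). The sketch below compares the shapes and, as a cross-check, fits `LinearRegression` to the Product-1/Product-2 example worked by hand earlier; its `intercept_` and `coef_` should agree with b0, b1 and b2 from the matrix method up to rounding. Variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Simple case: a 1-D array must become a single column
x_simple = np.array([1, 2, 3, 4, 5])
print(x_simple.shape, x_simple.reshape(-1, 1).shape)  # → (5,) (5, 1)

# Multiple case: each row already holds one value per feature
X_multi = np.array([[1, 4], [2, 5], [3, 8], [4, 2]])
y_multi = np.array([1, 6, 8, 12])
print(X_multi.shape)  # → (4, 2)

# No reshape needed; the column of 1s is handled by fit_intercept (default True)
model = LinearRegression().fit(X_multi, y_multi)
print(model.intercept_, model.coef_)
```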

USE OF LINEAR REGRESSION

| Situation | Type of Linear Regression |
|---|---|
| Analyzing the relationship between two variables | Simple Linear Regression |
| Predicting a numerical outcome based on a single predictor variable | Simple Linear Regression |
| Examining the impact of multiple predictor variables on a dependent variable | Multiple Linear Regression |
| Forecasting future trends or outcomes | Simple or Multiple Linear Regression |
| Making predictions in machine learning | Simple or Multiple Linear Regression |
| Identifying the strength and direction of the relationship between variables | Simple Linear Regression (with correlation coefficient calculation) |

CONCLUSION

Linear regression provides a powerful framework for analyzing and modeling the relationships between variables. Its mathematical calculations, types, and applications make it an indispensable tool for data analysis and prediction. By mastering linear regression techniques, we unlock the ability to uncover patterns, make informed predictions, and gain deeper insights into the underlying dynamics of the data.

Hope you enjoyed reading!!!

See you in the next blog!!!

Stay tuned!!!

Follow the blog!!!

Thank you!!!