Lesson 14: Linear Regression — Predicting Numbers
Explain linear regression intuitively; fit a line to data using scikit-learn.
Linear Regression: Predicting Numbers
Linear regression is one of the simplest and most widely used machine learning algorithms. It's used when we want to predict a continuous number, such as the price of a house or tomorrow's temperature.
The Best Fit Line
Imagine a scatter plot of house sizes versus house prices. Linear regression tries to draw a straight line through these points that best captures the trend. The equation of this line is y = mx + b, where x is the house size, y is the predicted price, m is the slope (how much the price increases per square foot), and b is the intercept (the value of y where the line crosses x = 0).
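To make the equation concrete, here is a minimal sketch that applies y = mx + b to a few house sizes. The slope, intercept, and sizes are made-up values chosen only for illustration; they do not come from real housing data.

```python
import numpy as np

# Hypothetical slope and intercept, chosen only for illustration
m = 150.0      # price increase in dollars per extra square foot
b = 50_000.0   # price the line gives at 0 square feet

sizes = np.array([1000, 1500, 2000])  # house sizes in square feet (made up)
prices = m * sizes + b                # y = mx + b applied to every size at once

for size, price in zip(sizes, prices):
    print(f"{size} sq ft -> ${price:,.0f}")
```

A steeper slope m would make price grow faster with size, while changing b shifts the whole line up or down.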
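The model "learns" by choosing the m and b that minimize the vertical distances between its line and the actual data points. These distances are summarized by a loss function such as Mean Squared Error (MSE), the average of the squared differences between the actual and predicted values.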
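As a rough illustration of how MSE ranks candidate lines, the sketch below computes it by hand with NumPy for two lines on a tiny made-up dataset. The helper name `mse`, the data, and both candidate lines are assumptions for this example only.

```python
import numpy as np

def mse(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    return np.mean((y_true - y_pred) ** 2)

# Made-up data that lies exactly on the line y = 2x + 1
x = np.array([1, 2, 3, 4])
y = np.array([3, 5, 7, 9])

line_a = 2 * x + 1   # candidate line that passes through every point
line_b = 1 * x + 3   # candidate line with the wrong slope and intercept

mse_a = mse(y, line_a)
mse_b = mse(y, line_b)
print(f"MSE of line A: {mse_a}")  # 0.0
print(f"MSE of line B: {mse_b}")  # 1.5
```

The line with the lower MSE fits the data better, so a training procedure that minimizes MSE would prefer line A here.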
Python Challenge: Fit the Line!
Use scikit-learn to train a linear regression model and predict a value.
from sklearn.linear_model import LinearRegression
import numpy as np
# Study hours (features) and test scores (labels)
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([50, 60, 70, 80, 90])
# TODO: Initialize and train the Linear Regression model
# model = ???
# model.???
# TODO: Predict the score for a student who studies 6 hours
# prediction = ???
# print(f"Predicted score: {prediction[0]}")
When you run the completed code, you'll see that the model figures out that each hour of study adds 10 points to the score!
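For checking your work, here is one possible way to complete the challenge. It is a sketch rather than the only valid answer; it uses scikit-learn's `fit` and `predict` methods and also prints the learned slope and intercept from the `coef_` and `intercept_` attributes, which the challenge itself does not ask for.

```python
from sklearn.linear_model import LinearRegression
import numpy as np

# Study hours (features) and test scores (labels), as in the challenge
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([50, 60, 70, 80, 90])

# Initialize the model and train it on the data
model = LinearRegression()
model.fit(X, y)

# Predict the score for 6 hours of study; predict expects a 2D array
prediction = model.predict(np.array([[6]]))
print(f"Predicted score: {prediction[0]:.1f}")  # Predicted score: 100.0

# Inspect the learned line y = mx + b
print(f"Slope: {model.coef_[0]:.1f}, intercept: {model.intercept_:.1f}")  # Slope: 10.0, intercept: 40.0
```

The slope of 10 is the "10 points per hour of study" pattern, and the intercept of 40 is the score the line gives for zero hours.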