Lesson 1: Activation Functions & Backpropagation

How does a neural network turn simple mathematical operations into sophisticated behavior? The answer lies in two ingredients: non-linear activation functions, and weight adjustments guided by backpropagation.

Why Do We Need Activation Functions?

Without activation functions, a neural network is just a giant stack of linear transformations (y = wx + b). No matter how many layers you stack, composing linear functions always yields another linear function: for example, w₂(w₁x + b₁) + b₂ = (w₂w₁)x + (w₂b₁ + b₂), which is a single layer with weight w₂w₁ and bias w₂b₁ + b₂. Activation functions introduce non-linearity, allowing neural networks to learn complex, non-linear decision boundaries.
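The collapse can be checked numerically. This is a minimal sketch with scalar layers and arbitrary, hypothetical weights: it confirms that two stacked linear units behave exactly like one linear unit with w = w₂w₁ and b = w₂b₁ + b₂, and that inserting ReLU between them breaks that equivalence.

```python
# Two linear units with no activation in between collapse into one.
# All weights and biases here are hypothetical, chosen only for illustration.

def layer(w: float, b: float, x: float) -> float:
    """A single linear unit: y = w*x + b."""
    return w * x + b


def relu(z: float) -> float:
    return max(0.0, z)


w1, b1 = 2.0, -1.0   # hypothetical first-layer parameters
w2, b2 = -3.0, 0.5   # hypothetical second-layer parameters

# w2*(w1*x + b1) + b2 = (w2*w1)*x + (w2*b1 + b2)
w_eq = w2 * w1
b_eq = w2 * b1 + b2

for x in [-2.0, 0.0, 1.5, 4.0]:
    stacked = layer(w2, b2, layer(w1, b1, x))
    single = layer(w_eq, b_eq, x)
    assert abs(stacked - single) < 1e-12  # same result for every input

# With ReLU between the layers, the single-layer shortcut no longer matches.
x = 0.0
with_relu = layer(w2, b2, relu(layer(w1, b1, x)))
print(with_relu, layer(w_eq, b_eq, x))  # prints: 0.5 3.5
```

At x = 0 the first unit outputs −1, ReLU clips it to 0, and the stacked result (0.5) differs from the collapsed linear layer (3.5).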

Popular Activation Functions

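Sigmoid: Maps input values to a range between 0 and 1. Formula: σ(z) = 1 / (1 + e⁻ᶻ). While historically popular, it suffers from the vanishing gradient problem: its derivative σ'(z) = σ(z)(1 − σ(z)) is at most 0.25 and approaches 0 for large |z|, so gradients shrink as they are multiplied back through many layers, and weights in early layers barely update during training.
ReLU (Rectified Linear Unit): The default choice in most modern networks. Formula: f(x) = max(0, x). It is extremely cheap to compute, and because its derivative is exactly 1 for positive inputs, it does not shrink gradients flowing through those inputs.
Tanh: Maps input values to a range between −1 and 1. Formula: tanh(z) = (eᶻ − e⁻ᶻ) / (eᶻ + e⁻ᶻ). Its outputs are zero-centered, which can aid convergence compared with sigmoid.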
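The three functions and their derivatives translate directly into code. The sketch below is a minimal pure-Python version for scalar inputs; the function names are our own, and using 0 as ReLU's derivative at x = 0 is a common convention rather than something the lesson specifies. The final lines illustrate the vanishing gradient claim under a simplifying assumption: each layer contributes only its activation derivative (the weight factors are ignored), evaluated at sigmoid's best case, z = 0.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable form: avoids overflow in math.exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sigmoid_grad(z: float) -> float:
    s = sigmoid(z)
    return s * (1.0 - s)  # peaks at 0.25 when z = 0


def relu(x: float) -> float:
    return max(0.0, x)


def relu_grad(x: float) -> float:
    # ReLU is not differentiable at 0; 0 is used there by convention.
    return 1.0 if x > 0 else 0.0


def tanh(z: float) -> float:
    return math.tanh(z)


def tanh_grad(z: float) -> float:
    return 1.0 - math.tanh(z) ** 2  # peaks at 1.0 when z = 0


# Simplified vanishing-gradient illustration (weights ignored):
# multiply one activation derivative per layer.
depth = 10
chain = 1.0
for _ in range(depth):
    chain *= sigmoid_grad(0.0)  # sigmoid's largest possible derivative
print(f"sigmoid chain over {depth} layers: {chain:.2e}")  # 9.54e-07
print(f"relu chain over {depth} layers: {relu_grad(1.0) ** depth}")  # 1.0
```

Even in sigmoid's best case the signal shrinks by a factor of 4 per layer, while active ReLU units pass it through unchanged.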

Activation Function Explorer

(Interactive figure: output and derivative of the selected activation function for inputs x from −5.0 to 5.0. Readout shown for ReLU at x = 1.00: output 1.0000, derivative 1.0000, where f(x) = max(0, x) and f'(x) = 1 if x > 0, else 0.)
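The explorer's readout can be reproduced without the widget. In this sketch, `FUNCTIONS`, `readout`, and `numerical_grad` are hypothetical helper names, not part of any library. It returns the output and analytic derivative for a chosen function and input, and cross-checks each derivative against a central finite difference, a standard way to catch mistakes in hand-derived gradients.

```python
import math
from typing import Callable, Dict, Tuple


def relu(x: float) -> float:
    return max(0.0, x)


def relu_grad(x: float) -> float:
    # ReLU has no derivative at x = 0; 0 is used there by convention.
    return 1.0 if x > 0 else 0.0


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))  # safe for the slider range [-5, 5]


def sigmoid_grad(x: float) -> float:
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh_grad(x: float) -> float:
    return 1.0 - math.tanh(x) ** 2


# Hypothetical table mirroring the explorer's toggle: name -> (f, f').
Fn = Callable[[float], float]
FUNCTIONS: Dict[str, Tuple[Fn, Fn]] = {
    "relu": (relu, relu_grad),
    "sigmoid": (sigmoid, sigmoid_grad),
    "tanh": (math.tanh, tanh_grad),
}


def readout(name: str, x: float) -> Tuple[float, float]:
    """Return (output, derivative), like the explorer's readout panel."""
    f, f_prime = FUNCTIONS[name]
    return f(x), f_prime(x)


def numerical_grad(f: Fn, x: float, h: float = 1e-5) -> float:
    """Central finite difference: (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)


# Sweep the slider range, skipping x = 0 where ReLU has a kink.
for name, (f, _) in FUNCTIONS.items():
    for x in [-5.0, -2.5, -0.5, 0.5, 1.0, 2.5, 5.0]:
        _, deriv = readout(name, x)
        assert abs(deriv - numerical_grad(f, x)) < 1e-6
    print(name, "at x = 1.00:", "output %.4f, derivative %.4f" % readout(name, 1.0))
```

The first printed line, `relu at x = 1.00: output 1.0000, derivative 1.0000`, matches the explorer's ReLU readout.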

The Mechanics of Backpropagation

Backpropagation is how a neural network learns. After making a prediction (the forward pass), the network measures the prediction error with a loss function. During the backward pass, we apply the chain rule from calculus to compute the gradient of the loss with respect to each individual weight in the network. Each gradient tells us how the loss changes as that weight changes, so nudging every weight a small step in the direction opposite its gradient reduces the error.
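For a single neuron, the chain rule splits the gradient into one factor per step of the forward pass. The squared-error loss with its ½ factor and the target symbol t are assumptions chosen for illustration; the lesson does not fix a particular loss function.

```latex
z = w x + b, \qquad \hat{y} = f(z), \qquad L = \tfrac{1}{2}\,(\hat{y} - t)^2

\frac{\partial L}{\partial w}
  = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z} \cdot \frac{\partial z}{\partial w}
  = (\hat{y} - t)\, f'(z)\, x,
\qquad
\frac{\partial L}{\partial b}
  = (\hat{y} - t)\, f'(z)
```

The factor (ŷ − t) f'(z) is shared by every parameter of the neuron, which is why backpropagation computes it once and reuses it for each weight.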

Neural Network Forward & Backward Pass

Watch data flow forward through the network, then see gradients propagate backward via the chain rule. Inputs: x₁ = 1, x₂ = 0.5 | Target: 0.8 | Activation: ReLU
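The widget's exact architecture and weights are not listed in the lesson, so the sketch below assumes a tiny 2-2-1 network: two inputs, two ReLU hidden units, and one linear output unit, scored with the squared-error loss ½(y − t)². Only the inputs x₁ = 1, x₂ = 0.5 and the target 0.8 come from the lesson; every weight and bias is a hypothetical value. The second hidden unit's bias was picked so its weighted sum is negative, to show ReLU blocking the gradient.

```python
# Hypothetical 2-2-1 network: only x and target come from the lesson.

def relu(z: float) -> float:
    return max(0.0, z)


def relu_grad(z: float) -> float:
    return 1.0 if z > 0 else 0.0  # convention: 0 at z == 0


x = [1.0, 0.5]  # inputs x1, x2
target = 0.8

# Hypothetical parameters (row j of W1 feeds hidden unit j).
W1 = [[0.5, -0.4],
      [0.3, 0.8]]
b1 = [0.1, -0.8]
W2 = [0.7, -0.6]
b2 = 0.05


def forward(W1, b1, W2, b2, x):
    """Return hidden pre-activations, hidden activations, and the output."""
    z1 = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    a1 = [relu(z) for z in z1]
    y = sum(w * a for w, a in zip(W2, a1)) + b2  # linear output unit
    return z1, a1, y


def loss(y, t):
    return 0.5 * (y - t) ** 2  # assumed squared-error loss


# Forward pass
z1, a1, y = forward(W1, b1, W2, b2, x)
loss_value = loss(y, target)

# Backward pass: chain rule from the loss back to the first layer.
dL_dy = y - target                                       # dL/dy
dL_dW2 = [dL_dy * a for a in a1]                         # dy/dW2[j] = a1[j]
dL_db2 = dL_dy                                           # dy/db2 = 1
dL_da1 = [dL_dy * w for w in W2]                         # dy/da1[j] = W2[j]
dL_dz1 = [g * relu_grad(z) for g, z in zip(dL_da1, z1)]  # ReLU gate
dL_dW1 = [[g * xi for xi in x] for g in dL_dz1]          # dz1[j]/dW1[j][i] = x[i]
dL_db1 = list(dL_dz1)                                    # dz1[j]/db1[j] = 1

print(f"z1 = {[round(z, 4) for z in z1]}, y = {y:.4f}, loss = {loss_value:.5f}")
print("dL/dW1 =", dL_dW1, " dL/db1 =", dL_db1)
print("dL/dW2 =", dL_dW2, " dL/db2 =", dL_db2)

# Sanity check one gradient with a central finite difference.
h = 1e-6
W1_plus = [row[:] for row in W1]
W1_plus[0][1] += h
W1_minus = [row[:] for row in W1]
W1_minus[0][1] -= h
numeric = (loss(forward(W1_plus, b1, W2, b2, x)[2], target)
           - loss(forward(W1_minus, b1, W2, b2, x)[2], target)) / (2 * h)
assert abs(numeric - dL_dW1[0][1]) < 1e-6
```

Hidden unit 1 ends up with z = −0.1, so ReLU outputs 0 and every gradient reaching W1[1] and b1[1] is 0, while the finite-difference check agrees with the hand-derived gradient for W1[0][1].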


Exercise: Manual Forward Pass

Calculate the forward pass for a single neuron with a ReLU activation function.

Input (x): 2.0

Weight (w): -1.5

Bias (b): 1.0

Which of the following is correct?
Weighted sum z = (2.0 × -1.5) + 1.0 = -2.0, and ReLU(-2.0) = -2.0
Weighted sum z = (2.0 × -1.5) + 1.0 = -2.0, and ReLU(-2.0) = 0.0 (since ReLU outputs 0 for any negative input)
Weighted sum z = (2.0 × -1.5) + 1.0 = 4.0, and ReLU(4.0) = 4.0
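To check the arithmetic, here is the same computation in a few lines of Python (the variable names are illustrative). It also evaluates ReLU's derivative at z, using the common convention of 0 for z ≤ 0: because that derivative is 0 here, this neuron would pass no gradient back to w or b for this input during backpropagation.

```python
def relu(z: float) -> float:
    return max(0.0, z)


x, w, b = 2.0, -1.5, 1.0      # values from the exercise
z = x * w + b                 # weighted sum: -3.0 + 1.0
a = relu(z)                   # activation output
grad = 1.0 if z > 0 else 0.0  # ReLU derivative at z
print(f"z = {z}, ReLU(z) = {a}, ReLU'(z) = {grad}")
# prints: z = -2.0, ReLU(z) = 0.0, ReLU'(z) = 0.0
```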

In the next lesson, we will see how these gradients are used to update the weights with gradient descent and its optimized variants!