Lesson 1: Activation Functions & Backpropagation
Understand how neural networks learn by calculating gradients and updating weights.
How does a neural network convert simple mathematical operations into high-level intelligence? The secret lies in non-linear activation functions and weight adjustments through backpropagation.
Why Do We Need Activation Functions?
Without activation functions, a neural network is just a stack of linear transformations (y = wx + b). No matter how many layers you stack, a composition of linear functions is always just another linear function, so the extra layers add no expressive power. Activation functions introduce **non-linearity**, allowing neural networks to learn complex, non-linear decision boundaries.
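To make the collapse concrete, here is a minimal sketch in plain Python for the scalar case. The parameter values (`w1`, `b1`, `w2`, `b2`) and the function names are hypothetical, chosen only for illustration. It shows that two stacked linear layers equal one linear layer, and that inserting a ReLU between them breaks that equivalence.

```python
def linear(w, b, x):
    """One scalar linear layer: y = w * x + b."""
    return w * x + b

# Hypothetical parameters, chosen only for illustration.
w1, b1 = 2.0, 1.0
w2, b2 = -3.0, 0.5

def two_layers(x):
    # Two linear layers with no activation in between.
    return linear(w2, b2, linear(w1, b1, x))

# Algebraically: w2 * (w1 * x + b1) + b2 = (w2 * w1) * x + (w2 * b1 + b2)
w_collapsed = w2 * w1
b_collapsed = w2 * b1 + b2

def one_layer(x):
    # A single linear layer that computes the same function.
    return linear(w_collapsed, b_collapsed, x)

def relu(x):
    return max(0.0, x)

def two_layers_with_relu(x):
    # Same layers, but with a non-linearity between them.
    return linear(w2, b2, relu(linear(w1, b1, x)))

for x in (-2.0, 0.0, 1.5):
    print(x, two_layers(x), one_layer(x), two_layers_with_relu(x))
```

For every input the first two results agree, while the ReLU version differs whenever the first layer's output is negative.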
Popular Activation Functions
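- Sigmoid: Maps input values to a range between 0 and 1. Formula: s(z) = 1 / (1 + e^(-z)). While historically popular, it suffers from the **vanishing gradient** problem: its derivative is at most 0.25 and approaches 0 for large |z|, so gradients shrink as they are multiplied back through many layers and the weights in early layers barely update during training.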
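- ReLU (Rectified Linear Unit): A common default in modern networks. Formula: f(x) = max(0, x). It is very cheap to compute, and its gradient is exactly 1 for positive inputs, so it does not shrink the gradient there. For negative inputs both the output and the gradient are 0.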
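The two functions above and their derivatives can be sketched in a few lines of standard-library Python. The function names are illustrative, and treating the ReLU gradient at exactly 0 as 0 is a convention assumed here, not something the lesson specifies.

```python
import math

def sigmoid(z):
    # s(z) = 1 / (1 + e^(-z)); fine for moderate z, but math.exp can
    # overflow for very large negative z in this simple form.
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    # Derivative of sigmoid: s(z) * (1 - s(z)), which peaks at 0.25 when z = 0.
    s = sigmoid(z)
    return s * (1.0 - s)

def relu(x):
    return max(0.0, x)

def relu_grad(x):
    # 1 for positive inputs, 0 otherwise (value at x == 0 is an assumed convention).
    return 1.0 if x > 0 else 0.0

for z in (-10.0, 0.0, 10.0):
    print(z, sigmoid(z), sigmoid_grad(z), relu(z), relu_grad(z))

# Upper bound on the product of ten sigmoid derivatives along a backward path:
# each factor is at most 0.25, so the product is at most 0.25 ** 10.
print(0.25 ** 10)
```

The sigmoid derivative is tiny at both z = -10 and z = 10, which is the saturation behind vanishing gradients, while the ReLU derivative stays at 1 for any positive input.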
The Mechanics of Backpropagation
Backpropagation is how a neural network learns. After making a prediction (the **forward pass**), the network calculates the prediction error using a loss function. During the **backward pass**, we use the mathematical **Chain Rule** to compute the gradient of the loss function with respect to each individual weight in the network. These gradients tell us how much to adjust each weight to minimize error.
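As a rough illustration of the forward and backward pass, the sketch below applies the chain rule to a single ReLU neuron. The squared-error loss, the parameter values, and all function names are assumptions made for this example; the lesson does not prescribe a specific loss. A finite-difference estimate is included as a sanity check on the analytic gradient.

```python
def relu(z):
    return max(0.0, z)

def forward(w, b, x):
    # Forward pass: weighted sum, then activation.
    z = w * x + b
    a = relu(z)
    return z, a

def loss(a, y):
    # Assumed loss for this sketch: half squared error.
    return 0.5 * (a - y) ** 2

def backward(w, b, x, y):
    # Chain rule: dL/dw = dL/da * da/dz * dz/dw (and likewise for b).
    z, a = forward(w, b, x)
    dL_da = a - y                     # derivative of 0.5 * (a - y)^2
    da_dz = 1.0 if z > 0 else 0.0     # derivative of ReLU
    dz_dw = x                         # derivative of w * x + b w.r.t. w
    dz_db = 1.0                       # derivative of w * x + b w.r.t. b
    return dL_da * da_dz * dz_dw, dL_da * da_dz * dz_db

# Hypothetical values, chosen only for illustration.
w, b, x, y = 0.5, 0.1, 2.0, 2.0
grad_w, grad_b = backward(w, b, x, y)

# Numerical check with a central finite difference on w.
eps = 1e-6
numeric_grad_w = (loss(forward(w + eps, b, x)[1], y)
                  - loss(forward(w - eps, b, x)[1], y)) / (2 * eps)

print(grad_w, grad_b, numeric_grad_w)
```

Here the gradient with respect to `w` is negative, meaning that increasing `w` slightly would reduce the loss for this example.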
Exercise: Manual Forward Pass
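Calculate the forward pass for a single neuron with a ReLU activation function, where the weighted sum is z = (2.0 * -1.5) + 1.0. Select the option with the correct weighted sum and activation output.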
- [ ] Weighted sum z = (2.0 * -1.5) + 1.0 = -2.0, and ReLU(-2.0) = -2.0
- [x] Weighted sum z = (2.0 * -1.5) + 1.0 = -2.0, and ReLU(-2.0) = 0.0 (since ReLU outputs 0 for any negative input)
- [ ] Weighted sum z = (2.0 * -1.5) + 1.0 = 4.0, and ReLU(4.0) = 4.0
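The arithmetic in the exercise can be checked with a short script. Treating 2.0 as the input, -1.5 as the weight, and 1.0 as the bias is an assumption here, since the exercise does not label the values; the result is the same if the input and weight are swapped.

```python
def relu(z):
    return max(0.0, z)

# Assumed roles for the exercise values (not labeled in the exercise itself).
x, w, b = 2.0, -1.5, 1.0

z = w * x + b   # weighted sum
a = relu(z)     # activation output

print(z, a)  # prints: -2.0 0.0
```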
In the next lesson, we will see how these gradients are used to update the weights with gradient descent and its optimized variants.