Decision Trees: AI That Explains Itself
Imagine playing a game of 20 Questions. You try to guess what someone is thinking of by asking yes/no questions, each of which narrows down the remaining possibilities. A decision tree algorithm works in much the same way!
A decision tree examines your data and finds the best questions to ask. At each step (or "node"), it splits the data based on a feature (e.g., "Is age > 30?"). It keeps splitting until the resulting groups are as "pure" as possible (meaning they mostly contain one category).
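To make "pure" concrete, here is a minimal sketch of one common way to score a question: Gini impurity, which is the default criterion in scikit-learn's `DecisionTreeClassifier`. The helper names (`gini`, `split_impurity`) and the toy age data are made up for illustration; a real library searches over every feature and threshold far more efficiently.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure group, larger for more mixed groups."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(values, labels, threshold):
    """Weighted impurity of the two groups produced by asking 'value > threshold?'."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Hypothetical toy data: ages, and whether each person bought a product.
ages = [22, 25, 28, 35, 40, 52]
bought = ["no", "no", "no", "yes", "yes", "yes"]

print(gini(bought))                      # impurity before asking anything
print(split_impurity(ages, bought, 30))  # "Is age > 30?" separates the two classes
print(split_impurity(ages, bought, 24))  # "Is age > 24?" leaves one group mixed
```

The lower the weighted impurity after a split, the better the question. In this toy data, splitting at 30 gives two perfectly pure groups, so the tree would prefer it over splitting at 24.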
The biggest danger with decision trees is overfitting. If you let the tree ask as many questions as it likes, it will memorize every single example in the training set instead of learning general patterns. We can control this by setting a maximum depth (`max_depth`).
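One way to see overfitting is to compare training accuracy with test accuracy as the depth limit is relaxed. The sketch below uses a synthetic, deliberately noisy dataset from `make_classification` rather than real data; the dataset settings and the depths tried are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data for illustration; flip_y randomly mislabels a fraction of samples.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=3, flip_y=0.2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

results = {}
for depth in (2, 5, None):  # None removes the limit, so the tree grows until leaves are pure
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    results[depth] = (model.score(X_train, y_train), model.score(X_test, y_test))
    train_acc, test_acc = results[depth]
    print(f"max_depth={depth}: train={train_acc:.2f}, test={test_acc:.2f}")
```

On noisy data like this, the unrestricted tree typically fits the training set perfectly while scoring noticeably lower on the test set. That gap between the two numbers is the sign of memorization.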
Train a Decision Tree Classifier and try changing the `max_depth` parameter.
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# TODO: Initialize DecisionTreeClassifier with max_depth=2
# tree = ???
# TODO: Fit the model and check test accuracy
# tree.???
# preds = tree.predict(X_test)
# print(f"Accuracy: {accuracy_score(y_test, preds)}")
Unlike deep neural networks, which are often "black boxes", decision trees are highly interpretable. You can print the tree and see exactly what logic it used to make a prediction!
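As a rough illustration of printing a tree, scikit-learn provides `export_text`, which renders a fitted tree as indented if/else rules. This sketch fits on the full iris dataset for simplicity rather than on a train/test split, and the `max_depth=2` and `random_state=42` values are just example settings.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree keeps the printed rules short enough to read at a glance.
tree = DecisionTreeClassifier(max_depth=2, random_state=42)
tree.fit(iris.data, iris.target)

# export_text returns the learned rules as a plain string.
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```

Each indented line is one question the tree asks, and each `class:` line is the prediction for samples that reach that leaf. Deeper trees produce longer rule lists, so the printout becomes harder to read as `max_depth` grows.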