In the world of AI, there is a golden rule: "Garbage in, garbage out."
You can have the most sophisticated algorithms in the world, but if the data you feed them is flawed, incomplete, or biased, your AI will produce unreliable results. Data is the fuel that powers the engine of AI.
Data comes in many forms, and AI can learn from all of them:
Before we start building models, we need to speak the language of data scientists:
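Imagine we are building an AI to predict how quickly a pet will be adopted. In the table below, each column we feed into the model is a feature, and the column we want the model to predict is the label.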
| Animal Type (Feature) | Age in Months (Feature) | Health Status (Feature) | Days to Adopt (Label) |
|---|---|---|---|
| Dog | 3 | Healthy | 4 |
| Cat | 24 | Needs Meds | 21 |
| Dog | 84 | Healthy | 15 |
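
To make the feature/label split concrete, here is a minimal Python sketch that stores the three rows from the table above and separates the inputs from the value we want to predict. The variable names, the snake_case keys, and the helper `split_features_and_label` are illustrative assumptions, not part of any particular library.

```python
# Each row of the adoption table as a dictionary.
# Keys are assumed snake_case versions of the column headers.
pets = [
    {"animal_type": "Dog", "age_months": 3, "health_status": "Healthy", "days_to_adopt": 4},
    {"animal_type": "Cat", "age_months": 24, "health_status": "Needs Meds", "days_to_adopt": 21},
    {"animal_type": "Dog", "age_months": 84, "health_status": "Healthy", "days_to_adopt": 15},
]

LABEL = "days_to_adopt"  # the column the model should learn to predict


def split_features_and_label(rows, label_key):
    """Separate each row into its input features and its label (hypothetical helper)."""
    features = [{k: v for k, v in row.items() if k != label_key} for row in rows]
    labels = [row[label_key] for row in rows]
    return features, labels


features, labels = split_features_and_label(pets, LABEL)

print(features[0])  # {'animal_type': 'Dog', 'age_months': 3, 'health_status': 'Healthy'}
print(labels)       # [4, 21, 15]
```

A model would be trained on `features` as its inputs and `labels` as the answers it should learn to reproduce.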
If your training data is biased, your AI will be biased. For example, if we train an AI to screen resumes using data from a company that historically hired only men, the AI will "learn" that being male is a desirable feature. Understanding our data isn't just a technical necessity; it's an ethical obligation. We'll explore this more in the next lesson.
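
One simple way to spot this kind of skew before training is to compare outcomes across groups in the historical data. The sketch below uses a small, made-up hiring history (the records, field names, and the helper `hire_rate_by_group` are all hypothetical) to show how a pattern in past decisions would be visible to any model trained on them.

```python
from collections import defaultdict

# Hypothetical, made-up hiring history for illustration only.
history = [
    {"gender": "male", "hired": True},
    {"gender": "male", "hired": True},
    {"gender": "male", "hired": False},
    {"gender": "female", "hired": False},
    {"gender": "female", "hired": False},
]


def hire_rate_by_group(records, group_key):
    """Return the fraction of records marked as hired for each group (hypothetical helper)."""
    counts = defaultdict(lambda: [0, 0])  # group -> [hired count, total count]
    for record in records:
        group = record[group_key]
        counts[group][1] += 1
        if record["hired"]:
            counts[group][0] += 1
    return {group: hired / total for group, (hired, total) in counts.items()}


rates = hire_rate_by_group(history, "gender")
print(rates)
```

In this toy data the hire rate for one group is zero, so a model trained on it could pick up gender as a predictor of being hired, even though that reflects past decisions rather than candidate quality.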