Lesson 4: RAG (Retrieval-Augmented Generation) Basics

A Large Language Model's knowledge is frozen at the point its training ends, and the model has no access to your private files. Retrieval-Augmented Generation (RAG) addresses both limits by retrieving relevant documents and inserting them into the prompt at runtime.

The RAG Pipeline

A standard RAG system consists of three key steps:

Ingestion: Break down large PDF or text files into small passages (chunks), convert each chunk into a numerical representation called a Vector Embedding, and store the embeddings in a Vector Database.
Retrieval: When a user asks a question, convert the query into a vector embedding with the same model and search the Vector Database for the chunks whose embeddings are closest to the query embedding.
Generation: Feed the retrieved chunks as "context" along with the user's question into the LLM, which writes an answer grounded in the private documents rather than in its training data alone.
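
The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pipeline: the sample document, the function names (chunk_text, retrieve, build_prompt) and the fixed-size word chunking are all hypothetical, and retrieval here ranks chunks by shared words as a stand-in for a real embedding search. No LLM is called; the final prompt is what you would send to one.

```python
import re

def chunk_text(text, max_words=7):
    """Ingestion: split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, chunks, top_k=2):
    """Retrieval stand-in: rank chunks by how many words they share
    with the query. A real system would compare embedding vectors."""
    query_words = tokenize(query)
    ranked = sorted(chunks,
                    key=lambda c: len(query_words & tokenize(c)),
                    reverse=True)
    return ranked[:top_k]

def build_prompt(question, context_chunks):
    """Generation: place the retrieved chunks in front of the question."""
    context = "\n\n".join(f"[{i + 1}] {c}"
                          for i, c in enumerate(context_chunks))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Hypothetical private document, invented for this example.
document = (
    "Employees accrue 20 vacation days per year. "
    "Unused vacation days expire on March 31. "
    "The office kitchen is cleaned every Friday. "
    "Expense reports are due within 30 days of purchase."
)

question = "When do unused vacation days expire?"
chunks = chunk_text(document)
top = retrieve(question, chunks)
prompt = build_prompt(question, top)
print(prompt)  # this string would be sent to the LLM
```

Fixed-size word windows are the simplest chunking strategy; splitting on sentences or paragraphs usually keeps each chunk more self-contained.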

Vector Embeddings and Cosine Similarity

An embedding model maps text to a point in a high-dimensional vector space (e.g. 1536 dimensions), so that texts with similar meaning land close together. To estimate how similar two texts are, we compare the directions of their vectors. A common measure is Cosine Similarity, the cosine of the angle between the two vectors, which ranges from -1 (opposite directions) to 1 (same direction). The higher the score, the closer the two texts are in meaning.

cosine_sim(A, B) = (A · B) / (‖A‖ × ‖B‖) → result ∈ [-1, 1]
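
The formula translates almost directly into code. The sketch below uses only the Python standard library; the three example vectors are made up to show the two ends and the midpoint of the range, and raising an error for a zero-length vector is a choice made here (the formula is undefined in that case).

```python
import math

def cosine_sim(a, b):
    """Return the cosine of the angle between vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))       # A · B
    norm_a = math.sqrt(sum(x * x for x in a))    # ‖A‖
    norm_b = math.sqrt(sum(y * y for y in b))    # ‖B‖
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

same = cosine_sim([1.0, 2.0], [2.0, 4.0])        # same direction: about 1
orthogonal = cosine_sim([1.0, 0.0], [0.0, 1.0])  # perpendicular: 0
opposite = cosine_sim([1.0, 2.0], [-1.0, -2.0])  # opposite direction: about -1
print(same, orthogonal, opposite)
```

Note that [1, 2] and [2, 4] score about 1 even though their lengths differ: cosine similarity depends only on direction, not on magnitude.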

Vector Space Search Visualizer

[Interactive figure: a 2D scatter plot of document embeddings in vector space. Selecting a query shows the top-3 most similar documents, as a real vector database search would.]
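
A top-3 search over a 2D vector space can be reproduced with a brute-force scan. In this sketch the document titles and coordinates are invented for illustration, and the names normalize and search are hypothetical. It scales every vector to length 1 at indexing time, so the dot product of two stored vectors equals their cosine similarity and the query only needs one multiplication pass.

```python
import math

def normalize(v):
    """Scale a 2D vector to unit length."""
    length = math.hypot(*v)
    return tuple(x / length for x in v)

# Hypothetical 2D "embeddings"; real ones have hundreds of dimensions.
docs = {
    "Intro to neural networks": (0.9, 0.2),
    "Deep learning with tensors": (0.8, 0.3),
    "Chocolate cookie recipe": (0.1, 0.9),
    "Stock market report": (-0.7, 0.4),
    "Training a classifier": (0.7, 0.5),
}

# Indexing: store unit vectors so dot product == cosine similarity.
index = {title: normalize(vec) for title, vec in docs.items()}

def search(query_vec, index, top_k=3):
    """Return the top_k (score, title) pairs, highest score first."""
    q = normalize(query_vec)
    scored = [(sum(a * b for a, b in zip(q, vec)), title)
              for title, vec in index.items()]
    scored.sort(reverse=True)
    return scored[:top_k]

results = search((1.0, 0.25), index)
for score, title in results:
    print(f"{score:.3f}  {title}")
```

Scanning every document is fine for a handful of points; vector databases often use approximate nearest-neighbor indexes to avoid comparing the query against every stored vector.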

Exercise: Understanding Vector Spaces

Which pair of phrases will have the highest Cosine Similarity score in an embedding vector space?

Phrase A: "The stock market went up today." & Phrase B: "I like baking chocolate cookies."
Phrase A: "Neural networks process vector data." & Phrase B: "Deep learning models calculate tensor matrices."
Phrase A: "Python is a programming language." & Phrase B: "Apples are delicious fruits."

Now that you understand vector databases and prompting, it's time to build a custom chatbot project using these concepts!