Large Language Models are frozen in time when their training ends, and they do not have access to your private files. **Retrieval-Augmented Generation (RAG)** addresses both limitations by retrieving relevant documents and inserting them into the prompt at runtime.
The RAG Pipeline
A standard RAG system consists of three key steps:
- Ingestion: Break down large PDF or text files into small paragraphs (chunks) and convert them into numerical representations called **Vector Embeddings**.
- Retrieval: When a user asks a question, convert their query into a vector embedding and search a **Vector Database** to retrieve the document chunks that are mathematically closest to the query.
- Generation: Feed the retrieved chunks as "context" along with the user's question into the LLM, which generates a response grounded in the private documents rather than in its training data alone.
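The three steps above can be sketched end to end in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pipeline: `toy_embed` is a hypothetical stand-in that counts words instead of calling a real embedding model, the "vector database" is just an in-memory list, and the generation step stops at building the prompt, since the LLM call depends on whichever provider you use. All function names and the sample documents are made up for this example.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Ingestion, part 1: split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_embed(text: str) -> Counter:
    """Ingestion, part 2: a stand-in 'embedding' (word counts), NOT a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, index: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Retrieval: embed the query and return the k most similar chunks."""
    q = toy_embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Generation (input side): combine retrieved context with the question."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical sample documents standing in for private files.
documents = [
    "The refund policy allows returns within 30 days of purchase.",
    "Our office is closed on public holidays.",
    "Shipping usually takes five business days.",
]

# Ingestion: chunk every document and store (chunk, vector) pairs.
index = [(chunk, toy_embed(chunk)) for doc in documents for chunk in chunk_text(doc)]

# Retrieval and prompt construction for one question.
question = "What is the refund policy for returns?"
top_chunks = retrieve(question, index, k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

In a real system, `toy_embed` would be replaced by an embedding model and the list by a vector database, but the flow of data between the three steps stays the same.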
Vector Embeddings and Cosine Similarity
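An embedding model maps text to a point in a high-dimensional vector space (e.g. 1536 dimensions), so that texts with similar meaning land close together. To estimate how similar two texts are, we measure the angle between their embedding vectors. A common metric for this is **Cosine Similarity**, which ranges from -1 (vectors pointing in opposite directions) to 1 (vectors pointing in the same direction). Scores near 1 indicate closely related meaning, while scores near 0 indicate unrelated texts.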
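For two vectors u and v, cosine similarity is the dot product divided by the product of their lengths: cos θ = (u · v) / (‖u‖ ‖v‖). The sketch below computes it with the standard library only. The function name `cosine_similarity` and the tiny 2-dimensional vectors are illustrative assumptions; real embeddings have hundreds or thousands of dimensions, and numerical libraries usually provide an optimized equivalent.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy 2-dimensional vectors (illustrative, not real embeddings).
same = cosine_similarity([1.0, 2.0], [2.0, 4.0])        # same direction, close to 1
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])  # perpendicular, 0
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])  # opposite direction, close to -1
print(same, orthogonal, opposite)
```

Note that the first pair scores close to 1 even though one vector is twice as long as the other: cosine similarity depends only on direction, not on magnitude.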
Exercise: Understanding Vector Spaces
Which pair of phrases will have the highest Cosine Similarity score in an embedding vector space?
- [ ] Phrase A: "The stock market went up today." & Phrase B: "I like baking chocolate cookies."
- [x] Phrase A: "Neural networks process vector data." & Phrase B: "Deep learning models calculate tensor matrices."
- [ ] Phrase A: "Python is a programming language." & Phrase B: "Apples are delicious fruits."
Now that you understand vector databases and prompting, it's time to build a custom chatbot project using these concepts!