Retrieval-Augmented Generation (RAG) is a powerful technique that combines retrieval-based search with a generative AI model. It allows an AI assistant to fetch relevant information from a knowledge base before generating a response, making it more accurate and informative. In this post, we'll walk through building a simple RAG pipeline using LangChain and Hugging Face models.
What is RAG?
RAG improves on a standalone language model by first retrieving relevant documents from a knowledge base and then generating a response grounded in them. This helps the model give more contextually accurate answers, especially for specialized knowledge it may not have seen during training.
Key Components of a RAG System
Document Storage: A collection of text documents containing useful information.
Text Splitting: Divides large documents into smaller, manageable chunks.
Embedding Model: Converts text chunks into numerical vectors for efficient retrieval.
Vector Database: Stores embeddings and allows similarity searches.
Retriever: Fetches the most relevant document chunks for a query.
LLM (Large Language Model): Generates answers using the retrieved context.
Step-by-Step Implementation
Step 1: Install Required Libraries
Before we begin, ensure you have the required Python packages installed:
pip install langchain langchain-community langchain-huggingface sentence-transformers faiss-cpu
Step 2: Load and Prepare Documents
We start by defining a set of documents containing useful information:
documents = [" blah blah blah blah blah blah. blah blah blahblah blah blah. Vitamin C helps boost immunity.blah blah blah blah blah blah. blah blah blahblah blah blah.blah blah blah blah blah blah. blah blah blahblah blah blah.","blah blah blah blah blah blah. blah blah blahblah blah blah.blah blah blah blah blah blah. blah blah blahblah blah blah.blah blah blah blah blah blah. blah blah blahblah blah blah. Exercise improves mental and physical health. blah blah blah blah blah blah. blah blah blahblah blah blah.blah blah blah blah blah blah. blah blah blahblah blah blah.blah blah blah blah blah blah. blah blah blahblah blah blah.blah blah blah blah blah blah. blah blah blahblah blah blah.","blah blah blah blah blah blah. blah blah blahblah blah blah.blah blah blah blah blah blah. blah blah blahblah blah blah.blah blah blah blah blah blah. blah blah blahblah blah blah.blah blah blah blah blah blah. blah blah blahblah blah blah.blah blah blah blah blah blah. blah blah blahblah blah blah. Drinking enough water keeps you hydrated and improves focus. blah blah blah blah blah blah. blah blah blahblah blah blah.blah blah blah blah blah blah. blah blah blahblah blah blah.blah blah blah blah blah blah. blah blah blahblah blah blah.blah blah blah blah blah blah. blah blah blahblah blah blah.blah blah blah blah blah blah. blah blah blahblah blah blah.blah blah blah blah blah blah. blah blah blahblah blah blah.blah blah blah blah blah blah. blah blah blahblah blah blah.blah blah blah blah blah blah. blah blah blahblah blah blah.blah blah blah blah blah blah. blah blah blahblah blah blah."]
Since documents are usually long, we need to split them into smaller chunks.
Step 3: Convert Documents into Chunks
from langchain.text_splitter import CharacterTextSplitter

text_splitter = CharacterTextSplitter(chunk_size=60, chunk_overlap=10, separator=".")
chunks = text_splitter.create_documents(documents)
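CharacterTextSplitter first splits each document on the separator (".") and then merges the pieces back into chunks, aiming for at most 60 characters per chunk (chunk_size) and letting neighboring chunks share up to 10 characters (chunk_overlap). create_documents wraps each resulting chunk in a LangChain Document object. This keeps every chunk small enough to retrieve precisely while preserving a little context across chunk boundaries.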
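To make the chunk_size / chunk_overlap interaction concrete, here is a simplified, stdlib-only approximation of separator-based chunking. The helper split_with_overlap is hypothetical and is not LangChain's implementation; it only sketches the idea. In this sketch the overlap is built from whole separator-delimited pieces, so when the pieces are longer than chunk_overlap, no overlap is carried over at all.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int, separator: str = ".") -> list[str]:
    """Hypothetical helper: split on `separator`, then greedily merge pieces into chunks.

    A chunk grows until adding the next piece would exceed `chunk_size` characters.
    When a chunk is emitted, trailing pieces whose combined length fits within
    `chunk_overlap` are carried over to start the next chunk.
    """
    # Split on the separator, trim whitespace, and drop empty pieces.
    pieces = [p.strip() for p in text.split(separator) if p.strip()]
    joiner = separator + " "
    chunks: list[str] = []
    current: list[str] = []

    def length(parts: list[str]) -> int:
        # Length of the parts once joined back together.
        return len(joiner.join(parts))

    for piece in pieces:
        if current and length(current + [piece]) > chunk_size:
            chunks.append(joiner.join(current))
            # Drop leading pieces until what remains fits the overlap budget
            # and still leaves room for the incoming piece.
            while current and (length(current) > chunk_overlap
                               or length(current + [piece]) > chunk_size):
                current = current[1:]
        current.append(piece)
    if current:
        chunks.append(joiner.join(current))
    return chunks


sample = "Alpha beta. Gamma delta. Epsilon zeta. Eta theta"
print(split_with_overlap(sample, chunk_size=30, chunk_overlap=12))
# ['Alpha beta. Gamma delta', 'Gamma delta. Epsilon zeta', 'Epsilon zeta. Eta theta']
print(split_with_overlap(sample, chunk_size=30, chunk_overlap=5))
# ['Alpha beta. Gamma delta', 'Epsilon zeta. Eta theta']
```

With an overlap of 12 characters, "Gamma delta" (11 characters) fits and is repeated at the start of the next chunk; with 5 characters, no whole piece fits, so consecutive chunks share nothing.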
Step 4: Create Embeddings and Store in Vector Database
We now convert these text chunks into embeddings and store them in FAISS (Facebook AI Similarity Search, a library for efficient similarity search over dense vectors).
The Hugging Face embedding model converts each chunk into a vector, so that chunks can later be compared with a query by vector similarity.
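In LangChain this step typically means creating a HuggingFaceEmbeddings instance and calling FAISS.from_documents(chunks, embeddings), but the exact model and code used here are not shown. As a stand-in, the stdlib-only sketch below illustrates the same index-then-search idea with a toy "embedding" (word counts) and cosine similarity. ToyVectorStore, embed, and cosine are hypothetical names; real embedding models produce dense learned vectors, and FAISS uses optimized indexes rather than this brute-force scan.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding' (assumption): a sparse bag-of-words vector of lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory store: embeds every chunk once, then scans all of them per query."""

    def __init__(self, chunks: list[str]):
        # Index step: keep each chunk alongside its vector.
        self.entries = [(embed(chunk), chunk) for chunk in chunks]

    def search(self, query: str, k: int = 4) -> list[str]:
        # Query step: embed the query and rank chunks by cosine similarity.
        q = embed(query)
        ranked = sorted(self.entries, key=lambda entry: cosine(q, entry[0]), reverse=True)
        return [chunk for _, chunk in ranked[:k]]


store = ToyVectorStore([
    "Vitamin C helps boost immunity",
    "Exercise improves mental and physical health",
    "Drinking enough water keeps you hydrated and improves focus",
])
print(store.search("How does exercise affect health?", k=1))
# ['Exercise improves mental and physical health']
```

The query shares "exercise" and "health" with the second chunk and no words with the others, so that chunk ranks first.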
Step 5: Set Up the Chat Model and Retriever
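The retriever embeds each query, looks up the chunks whose vectors are closest to it in the vector store, and returns them. In this post it is limited to the single most relevant chunk.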
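A LangChain retriever is usually obtained from a vector store with something like vectorstore.as_retriever(search_kwargs={"k": 1}), where k controls how many chunks come back; the post's exact code for this step is not shown. The stdlib-only sketch below illustrates that top-k contract. make_retriever is a hypothetical helper, and it scores relevance with Jaccard word overlap as a simple stand-in for embedding similarity.

```python
import heapq
import re


def words(text: str) -> set[str]:
    """Lowercase word set of a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def make_retriever(chunks: list[str], k: int = 1):
    """Hypothetical helper: returns a function mapping a query to its top-k chunks.

    Relevance is Jaccard word overlap (an assumption standing in for vector similarity).
    """
    chunk_words = [words(c) for c in chunks]

    def retrieve(query: str) -> list[str]:
        q = words(query)

        def score(i: int) -> float:
            union = q | chunk_words[i]
            return len(q & chunk_words[i]) / len(union) if union else 0.0

        # Keep only the k highest-scoring chunks (ties keep original order).
        best = heapq.nlargest(k, range(len(chunks)), key=score)
        return [chunks[i] for i in best]

    return retrieve


chunks = [
    "Vitamin C helps boost immunity",
    "Exercise improves mental and physical health",
    "Drinking enough water keeps you hydrated and improves focus",
]
retrieve_one = make_retriever(chunks, k=1)
print(retrieve_one("How does exercise affect health?"))
# ['Exercise improves mental and physical health']
```

Setting k=1 keeps the prompt short; a larger k gives the LLM more context at the risk of including less relevant chunks.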
Step 6: Define the RAG Chain
We define a prompt template to instruct the LLM on how to use the retrieved context.
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import PromptTemplate

# {context} receives the retrieved chunks; {input} receives the user's question
prompt = PromptTemplate.from_template(
    "You are a helpful AI assistant. Based on the following retrieved context, answer the question concisely.\n\n"
    "Context:\n{context}\n\n"
    "Question: {input}\n"
    "Answer:"
)

# "Stuff" all retrieved documents into the prompt, then send it to the LLM
stuff_chain = create_stuff_documents_chain(llm, prompt)
# Run the retriever first and pass its documents to stuff_chain as the context
rag_chain = create_retrieval_chain(retriever, stuff_chain)
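Here, create_stuff_documents_chain "stuffs" the retrieved documents into the prompt: their text is concatenated and inserted in place of {context} before the prompt is sent to the LLM. create_retrieval_chain then places the retriever in front of that chain, so the user's question (passed as input) is used both to retrieve chunks and to fill {input}.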
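The "stuff" strategy itself is just string assembly. The sketch below reproduces it without LangChain, reusing the prompt text from this step. Doc and stuff_prompt are hypothetical names, and joining documents with a blank line ("\n\n") mirrors what we understand to be LangChain's default document separator; treat that detail as an assumption.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Minimal stand-in for a retrieved document (hypothetical, not LangChain's Document)."""
    page_content: str


TEMPLATE = (
    "You are a helpful AI assistant. Based on the following retrieved context, "
    "answer the question concisely.\n\n"
    "Context:\n{context}\n\n"
    "Question: {input}\n"
    "Answer:"
)


def stuff_prompt(docs: list[Doc], question: str, separator: str = "\n\n") -> str:
    """Concatenate every retrieved document into {context} and fill in the question."""
    context = separator.join(d.page_content for d in docs)
    return TEMPLATE.format(context=context, input=question)


print(stuff_prompt(
    [Doc("Exercise improves mental and physical health")],
    "How does exercise affect health?",
))
```

Because every document goes into a single prompt, this approach is simple but only works while the combined text fits in the model's context window.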
Step 7: Query and Get a Response
query = "How does exercise affect health?"
response = rag_chain.invoke({"input": query})
print("Final Answer:", response["answer"])
Now, when you ask a question, the retriever fetches the most relevant document chunk, and the LLM generates an informed answer based on that context.
Debugging: What is Being Sent to the LLM?
To inspect what is being sent to the LLM, we can rebuild the final prompt from the retrieved chunks and print it:
def debug_rag_chain(input_query):
    # Retrieve the same chunks the chain would retrieve
    # (retriever.invoke replaces the deprecated get_relevant_documents)
    retrieved_docs = retriever.invoke(input_query)
    # create_stuff_documents_chain joins documents with "\n\n" by default
    retrieved_text = "\n\n".join(doc.page_content for doc in retrieved_docs)
    formatted_prompt = prompt.format(context=retrieved_text, input=input_query)
    print("\n===== DEBUG: FINAL PROMPT SENT TO LLM =====\n")
    print(formatted_prompt)
    print("\n==========================================\n")

# Test debugging
debug_rag_chain("How does exercise affect health?")
This helps us understand how the retrieved documents influence the final answer.
Conclusion
By following these steps, we've built a simple RAG (Retrieval-Augmented Generation) system using LangChain and Hugging Face. It retrieves relevant information before generating an answer, so the LLM's response is grounded in the retrieved text rather than relying only on what the model learned during training.
I put this code on GitHub here.
Key Takeaways:
✅ RAG improves AI-generated responses by retrieving relevant documents.
✅ We used FAISS as our vector store for efficient document retrieval.
✅ We limited retrieval to the single most relevant chunk to keep the context small and focused.
✅ Debugging helps understand what the LLM is processing.
Contact me (rajamanickam.a@gmail.com) for AI development or one-on-one coaching to learn AI (RAG, Computer Vision, etc) for affordable hourly charges.