LlamaIndex Tutorial: Building RAG Applications

Introduction

LlamaIndex is a powerful data framework that connects Large Language Models (LLMs) with your private data through Retrieval-Augmented Generation (RAG). This tutorial covers the essential concepts for building intelligent applications that can query your documents.

What is RAG?

RAG solves the problem that LLMs aren't trained on your specific data. It works by:

  1. Loading your documents
  2. Indexing them into searchable vectors
  3. Retrieving relevant chunks for user queries
  4. Generating responses using retrieved context
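
In LlamaIndex, those four steps collapse into a handful of calls. A minimal sketch (assuming an OpenAI key is configured and a data/ folder with documents exists; each step is unpacked in the sections below):

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("data").load_data()  # 1. load
index = VectorStoreIndex.from_documents(documents)     # 2. index (chunk + embed)
query_engine = index.as_query_engine()                 # 3. retrieve...
print(query_engine.query("What do these documents say?"))  # 4. ...and generate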

Key Components

  • Documents: Your data (PDFs, text files, web pages)
  • Nodes: Chunks of documents for processing
  • Index: Searchable structure (usually vector-based)
  • Query Engine: Interface for asking questions
  • Embeddings: Numerical representations of text meaning

1. Setup and Installation

# Uncomment to install required packages
# !pip install llama-index openai python-dotenv -q
import os
from dotenv import load_dotenv
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    Document,
    Settings
)
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
# Load environment variables from .env file
load_dotenv()

# Load API keys from .env file
openai_api_key = os.getenv("OPENAI_API_KEY")

# Verify API key is loaded
if openai_api_key:
    print("✅ OpenAI API key loaded from .env file")
    os.environ["OPENAI_API_KEY"] = openai_api_key
else:
    print("❌ OpenAI API key not found in .env file")
    print("Please add OPENAI_API_KEY=your-actual-key to your .env file")
✅ OpenAI API key loaded from .env file
# Configure global settings
Settings.llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")
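
It's worth sanity-checking the configuration before building anything on top of it. A quick sketch (the dimension in the comment assumes text-embedding-ada-002):

# Verify the embed model responds; ada-002 returns 1536-dimensional vectors
embedding = Settings.embed_model.get_text_embedding("hello world")
print(f"Embedding dimension: {len(embedding)}")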

2. Loading Documents

# Method 1: Create sample documents
sample_text = """
LlamaIndex is a data framework for building LLM applications. 
It provides tools to ingest, structure, and access private data for LLMs.
The framework supports various data sources including PDFs, databases, and APIs.
RAG is the core technique that allows LLMs to answer questions about your data.
"""

documents = [Document(text=sample_text)]

# Method 2: Load from directory (uncomment to use)
# !mkdir -p data
# documents = SimpleDirectoryReader("data").load_data()

print(f"Loaded {len(documents)} documents")
print(f"First document preview: {documents[0].text[:100]}...")
Loaded 1 documents
First document preview: 
LlamaIndex is a data framework for building LLM applications. 
It provides tools to ingest, structu...
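
Documents can also carry metadata, which is inherited by the nodes parsed from them and enables filtered retrieval later; the empty metadata in the next section's output reflects that the sample document above has none. A sketch with illustrative, made-up values:

doc_with_meta = Document(
    text=sample_text,
    metadata={"source": "tutorial.txt", "topic": "llamaindex"},  # illustrative values
)
print(f"Document metadata: {doc_with_meta.metadata}")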

3. Document Chunking (Node Parsing)

Chunking splits documents into smaller pieces for better retrieval. Key considerations:

  • Chunk Size: Smaller chunks give more precise retrieval; larger chunks preserve more surrounding context
  • Overlap: Repeats text across chunk boundaries so sentences aren't cut off mid-thought
  • Defaults: SentenceSplitter uses 1024-token chunks with a 200-token overlap
# Configure text splitter
text_splitter = SentenceSplitter(
    chunk_size=512,  # Smaller chunks for demo
    chunk_overlap=50
)

# Parse documents into nodes
nodes = text_splitter.get_nodes_from_documents(documents)

print(f"Created {len(nodes)} nodes")
print(f"First node: {nodes[0].text}")
print(f"Node metadata: {nodes[0].metadata}")
Created 1 nodes
First node: LlamaIndex is a data framework for building LLM applications. 
It provides tools to ingest, structure, and access private data for LLMs.
The framework supports various data sources including PDFs, databases, and APIs.
RAG is the core technique that allows LLMs to answer questions about your data.
Node metadata: {}
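
To see the chunk-size trade-off concretely, you can parse the same text with different splitters and compare node counts. A short sketch using an artificially lengthened copy of the sample document:

# Repeat the sample text so it actually exceeds the chunk sizes being compared
long_doc = Document(text=sample_text * 30)
for size in (128, 512, 1024):
    splitter = SentenceSplitter(chunk_size=size, chunk_overlap=20)
    node_count = len(splitter.get_nodes_from_documents([long_doc]))
    print(f"chunk_size={size}: {node_count} nodes")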

4. Creating Vector Index

VectorStoreIndex converts text into embeddings (numerical representations) for semantic search.

# Method 1: Direct from documents
index = VectorStoreIndex.from_documents(documents, show_progress=True)

# Method 2: From nodes (more control)
# index = VectorStoreIndex(nodes, show_progress=True)

print("Vector index created successfully!")
Parsing nodes:   0%|          | 0/1 [00:00<?, ?it/s]
Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]
Vector index created successfully!
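
Under the hood, semantic search compares embedding vectors, typically with cosine similarity. A small sketch using the configured embed model (numpy is only used for the arithmetic; exact scores will vary):

import numpy as np

def cosine_similarity(a, b):
    a, b = np.array(a), np.array(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

emb_query = Settings.embed_model.get_text_embedding("What is LlamaIndex?")
emb_related = Settings.embed_model.get_text_embedding("LlamaIndex is a data framework")
emb_unrelated = Settings.embed_model.get_text_embedding("The weather is sunny today")
print(f"related:   {cosine_similarity(emb_query, emb_related):.3f}")
print(f"unrelated: {cosine_similarity(emb_query, emb_unrelated):.3f}")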

5. Building Query Engine

The query engine wraps retrieval and response generation behind a single query() call.

# Create query engine
query_engine = index.as_query_engine(
    similarity_top_k=2,  # Retrieve top 2 most similar chunks
    streaming=False
)

# Test query
response = query_engine.query("What is LlamaIndex?")
print("Answer:", response)
print("\nSource nodes:")
for node in response.source_nodes:
    print(f"- Score: {node.score:.3f}")
    print(f"  Text: {node.text[:100]}...\n")
Answer: LlamaIndex is a data framework for building LLM applications that provides tools for ingesting, structuring, and accessing private data for LLMs.

Source nodes:
- Score: 0.894
  Text: LlamaIndex is a data framework for building LLM applications. 
It provides tools to ingest, structur...
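
Synthesis is tunable as well: the response_mode argument controls how retrieved chunks are combined into an answer. A sketch of one alternative ("compact" is the default; "tree_summarize" builds the answer by recursive summarization and suits summary-style questions):

summary_engine = index.as_query_engine(
    similarity_top_k=2,
    response_mode="tree_summarize",  # recursively summarize retrieved chunks
)
print(summary_engine.query("Summarize what LlamaIndex does."))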

6. Advanced: Custom Retrieval

Retrievers expose the retrieval step directly, letting you tune what gets fetched independently of answer synthesis.

# Get retriever for more control
retriever = index.as_retriever(
    similarity_top_k=3,
    # filters=MetadataFilters(...) # Add metadata filters if needed
)

# Test retrieval
retrieved_nodes = retriever.retrieve("How does RAG work?")
print(f"Retrieved {len(retrieved_nodes)} nodes:")
for i, node in enumerate(retrieved_nodes):
    print(f"\nNode {i+1} (Score: {node.score:.3f}):")
    print(node.text[:150] + "...")
Retrieved 1 nodes:

Node 1 (Score: 0.794):
LlamaIndex is a data framework for building LLM applications. 
It provides tools to ingest, structure, and access private data for LLMs.
The framework...
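
A standalone retriever can be wrapped back into a full query engine when you want custom retrieval plus answer synthesis. A minimal sketch via RetrieverQueryEngine:

from llama_index.core.query_engine import RetrieverQueryEngine

custom_engine = RetrieverQueryEngine.from_args(retriever)
print(custom_engine.query("How does RAG work?"))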

7. Persistence and Storage

Persist the index to disk so documents don't have to be re-embedded in every session.

# Save index
index.storage_context.persist(persist_dir="./storage")
print("Index saved to ./storage")

# Load index (for future sessions)
from llama_index.core import StorageContext, load_index_from_storage

# storage_context = StorageContext.from_defaults(persist_dir="./storage")
# loaded_index = load_index_from_storage(storage_context)
# query_engine = loaded_index.as_query_engine()
Index saved to ./storage
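
In practice, a common pattern is to rebuild the index only when no persisted copy exists. A sketch of that load-or-build check (./storage matches the persist_dir used above):

import os
from llama_index.core import StorageContext, load_index_from_storage

if os.path.exists("./storage"):
    storage_context = StorageContext.from_defaults(persist_dir="./storage")
    index = load_index_from_storage(storage_context)
    print("Loaded existing index from ./storage")
else:
    index = VectorStoreIndex.from_documents(documents)
    index.storage_context.persist(persist_dir="./storage")
    print("Built and persisted a new index")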

8. Interactive Demo

# Interactive query function
def ask_question(question):
    response = query_engine.query(question)
    print(f"Question: {question}")
    print(f"Answer: {response}")
    print("-" * 50)

# Test different questions
questions = [
    "What is LlamaIndex?",
    "How does RAG work?",
    "What data sources does LlamaIndex support?"
]

for q in questions:
    ask_question(q)
Question: What is LlamaIndex?
Answer: LlamaIndex is a data framework designed for creating LLM applications, offering tools for managing private data from different sources like PDFs, databases, and APIs. It utilizes the RAG technique to enable LLMs to analyze and respond to inquiries about the data.
--------------------------------------------------
Question: How does RAG work?
Answer: RAG works as the core technique that enables LLMs to respond to inquiries regarding the data by utilizing the tools provided within the LlamaIndex framework.
--------------------------------------------------
Question: What data sources does LlamaIndex support?
Answer: LlamaIndex supports various data sources including PDFs, databases, and APIs.
--------------------------------------------------

Key Takeaways

Best Practices

  1. Chunk Size: Start with 1024 tokens, adjust based on your data
  2. Embeddings: Choose models suited to your domain
  3. Retrieval: Experiment with similarity_top_k values
  4. Persistence: Always save indexes for production use

Next Steps

  • Explore different node parsers (Semantic, Hierarchical)
  • Try hybrid search combining semantic and keyword search
  • Implement metadata filtering for precise retrieval
  • Build chat engines for conversational interfaces (see the sketch after this list)
  • Use agents for multi-step reasoning
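
For the chat-engine bullet above, converting an index into a conversational interface is a single call. A hedged sketch ("condense_question" is one of several chat modes; it rewrites follow-up questions into standalone queries using the chat history):

chat_engine = index.as_chat_engine(chat_mode="condense_question")
print(chat_engine.chat("What is LlamaIndex?"))
print(chat_engine.chat("What data sources does it support?"))  # follow-up resolved via history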

Performance Tips

  • Use smaller chunks for precise answers
  • Use larger chunks for comprehensive context
  • Consider vector databases (Pinecone, Chroma) for scaling (see the Chroma sketch below)
  • Implement evaluation metrics to measure quality
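
For the vector-database tip above, here is a sketch of backing the index with Chroma instead of the default in-memory store. It assumes the chromadb and llama-index-vector-stores-chroma packages are installed; the path and collection name are illustrative:

# !pip install llama-index-vector-stores-chroma chromadb -q
import chromadb
from llama_index.core import StorageContext
from llama_index.vector_stores.chroma import ChromaVectorStore

client = chromadb.PersistentClient(path="./chroma_db")           # illustrative path
collection = client.get_or_create_collection("llamaindex_demo")  # illustrative name
vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

chroma_index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
print(chroma_index.as_query_engine().query("What is LlamaIndex?"))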

LlamaIndex makes building RAG applications straightforward while providing flexibility for advanced use cases. Start simple and gradually add complexity as needed!