LlamaIndex Tutorial: Building RAG Applications

Introduction

LlamaIndex is a powerful data framework that connects Large Language Models (LLMs) with your private data through Retrieval-Augmented Generation (RAG). This tutorial covers the essential concepts for building intelligent applications that can query your documents.

What is RAG?

RAG solves the problem that LLMs aren't trained on your specific data. It works by:

  1. Loading your documents
  2. Indexing them into searchable vectors
  3. Retrieving relevant chunks for user queries
  4. Generating responses using retrieved context
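
In LlamaIndex, those four steps collapse into a handful of calls. A minimal sketch (assuming an OpenAI key is configured and a data/ folder with documents exists; each step is unpacked in the sections below):

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("data").load_data()  # 1. load
index = VectorStoreIndex.from_documents(documents)     # 2. index (chunk + embed)
query_engine = index.as_query_engine()                 # 3. retrieve...
print(query_engine.query("What do these documents say?"))  # 4. ...and generate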

Key Components

  • Documents: Your data (PDFs, text files, web pages)
  • Nodes: Chunks of documents for processing
  • Index: Searchable structure (usually vector-based)
  • Query Engine: Interface for asking questions
  • Embeddings: Numerical representations of text meaning

1. Setup and Installation

# Uncomment to install required packages
# !pip install llama-index openai python-dotenv -q
import os
from dotenv import load_dotenv
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    Document,
    Settings
)
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
# Load environment variables from .env file
load_dotenv()

# Load API keys from .env file
openai_api_key = os.getenv("OPENAI_API_KEY")

# Verify API key is loaded
if openai_api_key:
    print("✅ OpenAI API key loaded from .env file")
    os.environ["OPENAI_API_KEY"] = openai_api_key
else:
    print("❌ OpenAI API key not found in .env file")
    print("Please add OPENAI_API_KEY=your-actual-key to your .env file")
✅ OpenAI API key loaded from .env file
# Configure global settings
Settings.llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")
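
It's worth sanity-checking the configuration before building anything on top of it. A quick sketch (the dimension in the comment assumes text-embedding-ada-002):

# Verify the embed model responds; ada-002 returns 1536-dimensional vectors
embedding = Settings.embed_model.get_text_embedding("hello world")
print(f"Embedding dimension: {len(embedding)}")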

2. Loading Documents

# Method 1: Create sample documents
sample_text = """
LlamaIndex is a data framework for building LLM applications. 
It provides tools to ingest, structure, and access private data for LLMs.
The framework supports various data sources including PDFs, databases, and APIs.
RAG is the core technique that allows LLMs to answer questions about your data.
"""

documents = [Document(text=sample_text)]

# Method 2: Load from directory (uncomment to use)
# !mkdir -p data
# documents = SimpleDirectoryReader("data").load_data()

print(f"Loaded {len(documents)} documents")
print(f"First document preview: {documents[0].text[:100]}...")
Loaded 1 documents
First document preview: 
LlamaIndex is a data framework for building LLM applications. 
It provides tools to ingest, structu...
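
Documents can also carry metadata, which is inherited by the nodes parsed from them and enables filtered retrieval later; the empty metadata in the next section's output reflects that the sample document above has none. A sketch with illustrative, made-up values:

doc_with_meta = Document(
    text=sample_text,
    metadata={"source": "tutorial.txt", "topic": "llamaindex"},  # illustrative values
)
print(f"Document metadata: {doc_with_meta.metadata}")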

3. Document Chunking (Node Parsing)

Chunking splits documents into smaller pieces for better retrieval. Key considerations:

  • Chunk Size: Smaller chunks give more precise retrieval; larger chunks preserve more surrounding context
  • Overlap: Repeats text across chunk boundaries so sentences aren't cut off mid-thought
  • Defaults: SentenceSplitter uses 1024-token chunks with a 200-token overlap
# Configure text splitter
text_splitter = SentenceSplitter(
    chunk_size=512,  # Smaller chunks for demo
    chunk_overlap=50
)

# Parse documents into nodes
nodes = text_splitter.get_nodes_from_documents(documents)

print(f"Created {len(nodes)} nodes")
print(f"First node: {nodes[0].text}")
print(f"Node metadata: {nodes[0].metadata}")
Created 1 nodes
First node: LlamaIndex is a data framework for building LLM applications. 
It provides tools to ingest, structure, and access private data for LLMs.
The framework supports various data sources including PDFs, databases, and APIs.
RAG is the core technique that allows LLMs to answer questions about your data.
Node metadata: {}
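
To see the chunk-size trade-off concretely, you can parse the same text with different splitters and compare node counts. A short sketch using an artificially lengthened copy of the sample document:

# Repeat the sample text so it actually exceeds the chunk sizes being compared
long_doc = Document(text=sample_text * 30)
for size in (128, 512, 1024):
    splitter = SentenceSplitter(chunk_size=size, chunk_overlap=20)
    node_count = len(splitter.get_nodes_from_documents([long_doc]))
    print(f"chunk_size={size}: {node_count} nodes")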

4. Creating Vector Index

VectorStoreIndex converts text into embeddings (numerical representations) for semantic search.

# Method 1: Direct from documents
index = VectorStoreIndex.from_documents(documents, show_progress=True)

# Method 2: From nodes (more control)
# index = VectorStoreIndex(nodes, show_progress=True)

print("Vector index created successfully!")
Parsing nodes:   0%|          | 0/1 [00:00<?, ?it/s]
Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]
Vector index created successfully!
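
Under the hood, semantic search compares embedding vectors, typically with cosine similarity. A small sketch using the configured embed model (numpy is only used for the arithmetic; exact scores will vary):

import numpy as np

def cosine_similarity(a, b):
    a, b = np.array(a), np.array(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

emb_query = Settings.embed_model.get_text_embedding("What is LlamaIndex?")
emb_related = Settings.embed_model.get_text_embedding("LlamaIndex is a data framework")
emb_unrelated = Settings.embed_model.get_text_embedding("The weather is sunny today")
print(f"related:   {cosine_similarity(emb_query, emb_related):.3f}")
print(f"unrelated: {cosine_similarity(emb_query, emb_unrelated):.3f}")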

5. Building Query Engine

The query engine wraps retrieval and response generation behind a single query() call.

# Create query engine
query_engine = index.as_query_engine(
    similarity_top_k=2,  # Retrieve top 2 most similar chunks
    streaming=False
)

# Test query
response = query_engine.query("What is LlamaIndex?")
print("Answer:", response)
print("\nSource nodes:")
for node in response.source_nodes:
    print(f"- Score: {node.score:.3f}")
    print(f"  Text: {node.text[:100]}...\n")
Answer: LlamaIndex is a data framework for building LLM applications that provides tools for ingesting, structuring, and accessing private data for LLMs.

Source nodes:
- Score: 0.894
  Text: LlamaIndex is a data framework for building LLM applications. 
It provides tools to ingest, structur...
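
Synthesis is tunable as well: the response_mode argument controls how retrieved chunks are combined into an answer. A sketch of one alternative ("compact" is the default; "tree_summarize" builds the answer by recursive summarization and suits summary-style questions):

summary_engine = index.as_query_engine(
    similarity_top_k=2,
    response_mode="tree_summarize",  # recursively summarize retrieved chunks
)
print(summary_engine.query("Summarize what LlamaIndex does."))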

6. Advanced: Custom Retrieval

Retrievers expose the retrieval step directly, letting you tune what gets fetched independently of answer synthesis.

# Get retriever for more control
retriever = index.as_retriever(
    similarity_top_k=3,
    # filters=MetadataFilters(...) # Add metadata filters if needed
)

# Test retrieval
retrieved_nodes = retriever.retrieve("How does RAG work?")
print(f"Retrieved {len(retrieved_nodes)} nodes:")
for i, node in enumerate(retrieved_nodes):
    print(f"\nNode {i+1} (Score: {node.score:.3f}):")
    print(node.text[:150] + "...")
Retrieved 1 nodes:

Node 1 (Score: 0.794):
LlamaIndex is a data framework for building LLM applications. 
It provides tools to ingest, structure, and access private data for LLMs.
The framework...
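
A standalone retriever can be wrapped back into a full query engine when you want custom retrieval plus answer synthesis. A minimal sketch via RetrieverQueryEngine:

from llama_index.core.query_engine import RetrieverQueryEngine

custom_engine = RetrieverQueryEngine.from_args(retriever)
print(custom_engine.query("How does RAG work?"))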

7. Persistence and Storage

Persist the index to disk so documents don't have to be re-embedded in every session.

# Save index
index.storage_context.persist(persist_dir="./storage")
print("Index saved to ./storage")

# Load index (for future sessions)
from llama_index.core import StorageContext, load_index_from_storage

# storage_context = StorageContext.from_defaults(persist_dir="./storage")
# loaded_index = load_index_from_storage(storage_context)
# query_engine = loaded_index.as_query_engine()
Index saved to ./storage
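
In practice, a common pattern is to rebuild the index only when no persisted copy exists. A sketch of that load-or-build check (./storage matches the persist_dir used above):

import os
from llama_index.core import StorageContext, load_index_from_storage

if os.path.exists("./storage"):
    storage_context = StorageContext.from_defaults(persist_dir="./storage")
    index = load_index_from_storage(storage_context)
    print("Loaded existing index from ./storage")
else:
    index = VectorStoreIndex.from_documents(documents)
    index.storage_context.persist(persist_dir="./storage")
    print("Built and persisted a new index")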

8. Interactive Demo

# Interactive query function
def ask_question(question):
    response = query_engine.query(question)
    print(f"Question: {question}")
    print(f"Answer: {response}")
    print("-" * 50)

# Test different questions
questions = [
    "What is LlamaIndex?",
    "How does RAG work?",
    "What data sources does LlamaIndex support?"
]

for q in questions:
    ask_question(q)
Question: What is LlamaIndex?
Answer: LlamaIndex is a data framework designed for creating LLM applications, offering tools for managing private data from different sources like PDFs, databases, and APIs. It utilizes the RAG technique to enable LLMs to analyze and respond to inquiries about the data.
--------------------------------------------------
Question: How does RAG work?
Answer: RAG works as the core technique that enables LLMs to respond to inquiries regarding the data by utilizing the tools provided within the LlamaIndex framework.
--------------------------------------------------
Question: What data sources does LlamaIndex support?
Answer: LlamaIndex supports various data sources including PDFs, databases, and APIs.
--------------------------------------------------

Key Takeaways

Best Practices

  1. Chunk Size: Start with 1024 tokens, adjust based on your data
  2. Embeddings: Choose models suited to your domain
  3. Retrieval: Experiment with similarity_top_k values
  4. Persistence: Always save indexes for production use

Next Steps

  • Explore different node parsers (Semantic, Hierarchical)
  • Try hybrid search combining semantic and keyword search
  • Implement metadata filtering for precise retrieval
  • Build chat engines for conversational interfaces (see the sketch after this list)
  • Use agents for multi-step reasoning
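
For the chat-engine bullet above, converting an index into a conversational interface is a single call. A hedged sketch ("condense_question" is one of several chat modes; it rewrites follow-up questions into standalone queries using the chat history):

chat_engine = index.as_chat_engine(chat_mode="condense_question")
print(chat_engine.chat("What is LlamaIndex?"))
print(chat_engine.chat("What data sources does it support?"))  # follow-up resolved via history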

Performance Tips

  • Use smaller chunks for precise answers
  • Use larger chunks for comprehensive context
  • Consider vector databases (Pinecone, Chroma) for scaling (see the Chroma sketch below)
  • Implement evaluation metrics to measure quality
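
For the vector-database tip above, here is a sketch of backing the index with Chroma instead of the default in-memory store. It assumes the chromadb and llama-index-vector-stores-chroma packages are installed; the path and collection name are illustrative:

# !pip install llama-index-vector-stores-chroma chromadb -q
import chromadb
from llama_index.core import StorageContext
from llama_index.vector_stores.chroma import ChromaVectorStore

client = chromadb.PersistentClient(path="./chroma_db")           # illustrative path
collection = client.get_or_create_collection("llamaindex_demo")  # illustrative name
vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

chroma_index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
print(chroma_index.as_query_engine().query("What is LlamaIndex?"))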

LlamaIndex makes building RAG applications straightforward while providing flexibility for advanced use cases. Start simple and gradually add complexity as needed!