LiteLLM Tutorial: Unified API for 100+ LLMs
LiteLLM is a Python library that provides a unified interface for calling 100+ Language Model APIs using the OpenAI format. It simplifies integration across providers like OpenAI, Anthropic, Google, Azure, AWS Bedrock, and more.
Key Features
- Unified API: Same interface for all providers
- Load Balancing: Router with retry/fallback logic
- Cost Tracking: Built-in spend monitoring
- Streaming Support: Real-time response streaming
- Error Handling: Consistent exception handling
- Async Support: Full async/await compatibility
1. Installation & Setup
# Uncomment to install LiteLLM, python-dotenv, and a compatible Pydantic version
#!pip install "litellm>=1.0.0" "python-dotenv>=1.0.0" "pydantic>=2.0.0,<3.0.0" -q
# Import required modules
import litellm
from litellm import completion, Router
import os
import asyncio
from dotenv import load_dotenv
# Load environment variables from .env file
load_dotenv()
# Load API keys from .env file and create explicit variables
openai_api_key = os.getenv("OPENAI_API_KEY")
anthropic_api_key = os.getenv("ANTHROPIC_API_KEY")
huggingface_api_key = os.getenv("HUGGINGFACE_API_KEY")
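As a quick sanity check before making any calls, you can report which keys were actually loaded. A minimal sketch, reusing the variable names from the setup cell above:
# Report which provider keys were loaded (sketch; names match the setup cell)
for provider, key in [
    ("OpenAI", openai_api_key),
    ("Anthropic", anthropic_api_key),
    ("HuggingFace", huggingface_api_key),
]:
    print(f"{provider}: {'configured' if key else 'missing'}")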
2. Basic Completion Calls
LiteLLM uses the same completion() function for all providers. Just change the model name:
# OpenAI GPT-4o-mini
if openai_api_key:
    response = completion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello, how are you?"}]
    )
    print("OpenAI:", response.choices[0].message.content)
else:
    print("❌ OpenAI API key not found")
OpenAI: Hello! I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?
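Because LiteLLM returns OpenAI-format response objects, token usage is available on every response regardless of provider, and litellm.completion_cost() can estimate spend from its built-in price map. A minimal sketch, reusing the response from the call above:
# Inspect token usage and estimated cost (sketch; reuses the response above)
from litellm import completion_cost

if openai_api_key:
    print("Total tokens:", response.usage.total_tokens)
    # completion_cost() estimates USD spend based on the model's known pricing
    print("Estimated cost: $", completion_cost(completion_response=response))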
3. Multi-Provider Examples
LiteLLM supports 100+ providers with consistent formatting:
from IPython.display import display, Markdown
import warnings
# Suppress Pydantic warnings for cleaner output
warnings.filterwarnings("ignore", category=UserWarning, module="pydantic")
warnings.filterwarnings("ignore", message=".*Expected.*fields but got.*")
user_message = [{"role": "user", "content": "Explain what AI is in 30 words"}]
# Initialize results list
results = []
results.append("### ๐ Testing Multiple LLM Providers\n")
# OpenAI GPT-4o-mini
def get_openai_result():
    if openai_api_key:
        try:
            response = completion(
                model="gpt-4o-mini",
                messages=user_message
            )
            return f"#### 🟢 OpenAI GPT-4o-mini\n**Response:** {response.choices[0].message.content}\n"
        except Exception as e:
            return f"#### 🔴 OpenAI Error\n**Error Type:** {type(e).__name__}\n**Details:** {str(e)}\n"
    else:
        return "#### 🟡 OpenAI\n**Status:** API key not found\n"

results.append(get_openai_result())
# Anthropic Claude
def get_claude_result():
    if anthropic_api_key:
        try:
            response = completion(
                model="claude-3-haiku-20240307",
                messages=user_message
            )
            return f"#### 🟢 Anthropic Claude-3-Haiku\n**Response:** {response.choices[0].message.content}\n"
        except Exception as e:
            return f"#### 🔴 Claude Error\n**Error Type:** {type(e).__name__}\n**Details:** {str(e)}\n"
    else:
        return "#### 🟡 Anthropic Claude\n**Status:** API key not found\n"

results.append(get_claude_result())
# HuggingFace
def get_hf_result():
    if huggingface_api_key:
        try:
            hf_response = completion(
                model="huggingface/HuggingFaceTB/SmolLM3-3B",
                messages=user_message,
                api_key=huggingface_api_key
            )
            return f"#### 🟢 HuggingFace SmolLM3-3B\n**Response:** {hf_response.choices[0].message.content}\n"
        except Exception as e:
            return f"#### 🔴 HuggingFace Error\n**Error Type:** {type(e).__name__}\n**Details:** {str(e)}\n"
    else:
        return "#### 🟡 HuggingFace\n**Status:** API key not found\n"

results.append(get_hf_result())
# Local Ollama (no API key needed)
def get_ollama_result():
    try:
        ollama_response = completion(
            model="ollama/gemma:2b",
            messages=user_message,
            api_base="http://localhost:11434"
        )
        return f"#### 🟢 Ollama (Local) - Gemma 2B\n**Response:** {ollama_response.choices[0].message.content}\n"
    except Exception as e:
        return f"#### 🔴 Ollama (Local)\n**Status:** Not available\n**Details:** {str(e)}\n"

results.append(get_ollama_result())
# Display all results as markdown
markdown_output = "\n".join(results)
display(Markdown(markdown_output))
🚀 Testing Multiple LLM Providers
🟢 OpenAI GPT-4o-mini
Response: Artificial Intelligence (AI) is the simulation of human intelligence in machines, enabling them to perform tasks like learning, reasoning, problem-solving, and understanding language, aimed at mimicking cognitive functions.
🟢 Anthropic Claude-3-Haiku
Response: AI (Artificial Intelligence) is the simulation of human intelligence in machines, capable of performing tasks that typically require human intelligence, such as learning, problem-solving, and decision-making.
🟢 HuggingFace SmolLM3-3B
Response: Okay, the user wants an explanation of AI in 30 words. Let me start by defining what AI is. It's short for Artificial Intelligence. I need to mention that it's the simulation of human intelligence in machines. But how to condense that into 30 words?
First, the key points: AI is a field of computer science, involves machines mimicking human thought processes. Applications like language translation, decision-making, problem-solving. Maybe mention machine learning and neural networks as technologies used in AI.
Wait, the user might not need the technical jargon. So keep it simple. Maybe start with "Artificial Intelligence (AI) is technology that enables machines to perform tasks requiring human-like intelligence." Then add applications: "like speech recognition, decision-making, and problem-solving." But that's 10 words. Need to expand to 30.
How about: "Artificial Intelligence (AI) is technology that allows machines to think and learn like humans, using algorithms and data to process information, recognize patterns, and make decisions. It powers applications in healthcare, finance, and customer service." That's 23 words. Still need 7 more. Maybe add something about machine learning and neural networks. "AI uses machine learning and neural networks to analyze data, improve performance over time, and automate complex tasks." Now it's 30 words. Let me check the count again. "Artificial Intelligence (AI) is technology that allows machines to think and learn like humans, using algorithms and data to process information, recognize patterns, and make decisions. It powers applications in healthcare, finance, and customer service, utilizing machine learning and neural networks to analyze data, improve performance over time, and automate complex tasks." Yes, that's 30 words. It covers the core aspects: definition, how it works, applications, and key technologies. I think that works.
Artificial Intelligence (AI) is technology that enables machines to think and learn like humans, using algorithms and data to process information, recognize patterns, and make decisions. It powers applications in healthcare, finance, and customer service, utilizing machine learning and neural networks to analyze data, improve performance over time, and automate complex tasks. (30 words)
🟢 Ollama (Local) - Gemma 2B
Response: Artificial Intelligence (AI) is a computer system that can perform tasks that require human intelligence, such as learning, problem-solving, and decision-making.
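Because every provider is reached through the same OpenAI-format interface, common parameters such as temperature and max_tokens also work unchanged across providers. A minimal sketch, reusing the Claude model from above:
# OpenAI-style parameters pass through to any provider (sketch)
if anthropic_api_key:
    response = completion(
        model="claude-3-haiku-20240307",
        messages=user_message,
        temperature=0.2,  # lower temperature for more deterministic output
        max_tokens=60     # cap the response length
    )
    print(response.choices[0].message.content)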
4. Streaming Support
Get real-time streaming responses by setting stream=True:
# Streaming response
response = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Write a short poem about AI within 20 words"}],
    stream=True,
    max_tokens=30
)

print("Streaming response:")
for chunk in response:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
print("\n")
Streaming response:
AI, a marvel of modern day,
Endless possibilities in its way.
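If you need the assembled text (or usage stats) after streaming, LiteLLM can rebuild a full response object from the collected chunks with litellm.stream_chunk_builder(). A minimal sketch:
# Collect chunks while streaming, then rebuild a full response object (sketch)
from litellm import stream_chunk_builder

messages = [{"role": "user", "content": "Write a short poem about AI within 20 words"}]
chunks = []
for chunk in completion(model="gpt-3.5-turbo", messages=messages, stream=True, max_tokens=30):
    chunks.append(chunk)

full_response = stream_chunk_builder(chunks, messages=messages)
print(full_response.choices[0].message.content)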
5. Router: Load Balancing & Fallbacks
The Router enables load balancing, retries, and fallbacks across multiple deployments:
# Configure multiple model deployments
model_list = [
    {
        "model_name": "gpt-3.5-turbo",
        "litellm_params": {
            "model": "azure/gpt-35-turbo",
            "api_key": os.getenv("AZURE_API_KEY"),
            "api_base": os.getenv("AZURE_API_BASE"),
            "rpm": 100  # requests per minute
        }
    },
    {
        "model_name": "gpt-3.5-turbo",
        "litellm_params": {
            "model": "gpt-3.5-turbo",
            "api_key": os.getenv("OPENAI_API_KEY"),
            "rpm": 200
        }
    },
    {
        # Fallback deployment: fallback targets must also be defined in model_list
        "model_name": "gpt-4o-mini",
        "litellm_params": {
            "model": "gpt-4o-mini",
            "api_key": os.getenv("OPENAI_API_KEY")
        }
    }
]
# Create router with fallbacks
router = Router(
    model_list=model_list,
    num_retries=3,
    timeout=30,
    fallbacks=[{"gpt-3.5-turbo": ["gpt-4o-mini"]}]  # fall back to GPT-4o-mini if gpt-3.5-turbo deployments fail
)

# Make request through router
response = router.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello from router!"}]
)
print("Router response:", response.choices[0].message.content)
Router response: Hello, how can I assist you today?
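The Router also exposes an async interface, so the same deployments can serve concurrent traffic. A minimal sketch using router.acompletion(), assuming the router defined above:
# Async request through the same router (sketch)
async def router_async_example():
    response = await router.acompletion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hello from the async router!"}]
    )
    return response.choices[0].message.content

print(await router_async_example())  # top-level await works in Jupyter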
6. Error Handling & Retries
LiteLLM standardizes error handling across all providers and supports automatic retries:
from litellm import completion  # ensure completion is imported in this cell
from openai import OpenAIError

try:
    # This will retry 3 times on failure
    response = completion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Test message"}],
        num_retries=3
    )
    print("Success:", response.choices[0].message.content)
except OpenAIError as e:
    print(f"LiteLLM Error: {e}")
except Exception as e:
    print(f"General Error: {e}")
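LiteLLM also maps every provider's errors onto OpenAI-style exception types, so you can branch on the failure mode rather than the provider. A minimal sketch using exception classes exported from litellm.exceptions:
# Branch on standardized exception types (sketch)
from litellm import completion
from litellm.exceptions import AuthenticationError, RateLimitError, Timeout

try:
    response = completion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Test message"}]
    )
    print("Success:", response.choices[0].message.content)
except AuthenticationError:
    print("Check your API key")
except RateLimitError:
    print("Rate limited - back off and retry")
except Timeout:
    print("Request timed out")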
7. Async Support
LiteLLM supports async/await for concurrent operations:
import asyncio
import warnings
from litellm import acompletion

# Suppress Pydantic warnings for cleaner output
warnings.filterwarnings("ignore", category=UserWarning, module="pydantic")

async def async_completion_example():
    """Example of async completion"""
    response = await acompletion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hello async world!"}]
    )
    return response.choices[0].message.content

async def multiple_async_calls():
    """Make multiple concurrent API calls"""
    tasks = [
        acompletion(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": f"What is {topic}?"}]
        )
        for topic in ["AI", "Machine Learning", "Deep Learning"]
    ]
    responses = await asyncio.gather(*tasks)
    for i, response in enumerate(responses):
        print(f"Response {i+1}: {response.choices[0].message.content[:50]}...")

# In Jupyter notebooks, use top-level await instead of asyncio.run()
print("Running async example...")
result = await async_completion_example()
print("Async result:", result)
print("\nRunning multiple async calls...")
await multiple_async_calls()
Running async example...
Async result: Hello! How can I assist you today?

Running multiple async calls...
Response 1: AI, which stands for artificial intelligence, refe...
Response 2: Machine learning is a subset of artificial intelli...
Response 3: Deep learning is a subset of machine learning that...
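When fanning out many concurrent calls, it is worth bounding concurrency so you stay under provider rate limits. A minimal sketch using a plain asyncio.Semaphore (the limit of 2 is an arbitrary example value):
# Bound concurrent requests with a semaphore (sketch; limit is illustrative)
semaphore = asyncio.Semaphore(2)

async def bounded_call(topic):
    async with semaphore:  # at most 2 requests in flight at once
        return await acompletion(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": f"What is {topic}?"}]
        )

topics = ["AI", "Machine Learning", "Deep Learning", "NLP"]
responses = await asyncio.gather(*(bounded_call(t) for t in topics))
print(f"Completed {len(responses)} bounded calls")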
Summary
LiteLLM simplifies LLM integration by providing:
- Unified API - Same interface for 100+ providers
- Reliability - Built-in retries, fallbacks, and load balancing
- Observability - Cost tracking, logging, and monitoring
- Performance - Streaming and async support
- Production-ready - Error handling and advanced routing