Zeabur AI Hub: Unified API Gateway for All AI Models
1. What is Zeabur AI Hub?
Zeabur AI Hub is a unified AI model access platform that provides developers with a single API to access multiple leading AI models including Claude, GPT, Gemini, DeepSeek, and more. Instead of managing separate API keys and accounts for different providers, you get one API key to access all models through a standardized OpenAI-compatible interface.

Key Features
- Unified API: Single API key for all models (Claude, GPT, Gemini, DeepSeek, Llama, etc.)
- OpenAI-Compatible: Works with existing OpenAI SDK code - no learning curve
- Global Endpoints: Multiple regional endpoints for optimal latency
- Competitive Pricing: Pay-as-you-go with transparent pricing
- Advanced Features: Function calling, structured outputs, streaming, vision, and more
- Model Flexibility: Switch models instantly by changing one parameter
Prerequisites
Before starting, you'll need:
- A Zeabur AI Hub API key (get it from zeabur.com/ai-hub)
- Python 3.7+ installed
- Basic knowledge of Python and API usage
Let's get started!
2. Setup and Installation
First, install the required packages and set up your environment variables.
# Install required packages
!pip install openai python-dotenv requests pillow -q

Create a .env file in your project directory with the following format:
ZEABUR_API_KEY=sk-your-api-key-here
import os
from dotenv import load_dotenv
from openai import OpenAI
import json
# Load environment variables from .env file
load_dotenv()
# Get API key from environment
ZEABUR_API_KEY = os.getenv("ZEABUR_API_KEY")
if not ZEABUR_API_KEY:
    raise ValueError("ZEABUR_API_KEY not found in .env file")
# Available endpoints
ENDPOINTS = {
    "tokyo": "https://hnd1.aihub.zeabur.ai/",
    "san_francisco": "https://sfo1.aihub.zeabur.ai/"
}
# Initialize client (using Tokyo endpoint by default)
client = OpenAI(
    base_url=ENDPOINTS["tokyo"],
    api_key=ZEABUR_API_KEY
)
print("Setup complete! Client initialized with Tokyo endpoint.")

Setup complete! Client initialized with Tokyo endpoint.
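If the San Francisco endpoint is closer to you or your users, you can point a client at it instead. Here is a minimal sketch that reuses the ENDPOINTS dictionary and API key defined above (the variable name client_sf is just for illustration):

# Optional: initialize a second client against the San Francisco endpoint.
# Only the base URL changes; the API key stays the same.
client_sf = OpenAI(
    base_url=ENDPOINTS["san_francisco"],
    api_key=ZEABUR_API_KEY
)
print("Client initialized with San Francisco endpoint.")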
3. Basic Chat Completions
Zeabur AI Hub supports multiple models. Here are the available models (you can also list them programmatically, as sketched after this list):

Claude Models:
- claude-sonnet-4-5: Advanced reasoning and analysis
- claude-haiku-4-5: Fast and efficient responses

GPT Models:
- gpt-5: Latest OpenAI model
- gpt-5-mini: Smaller, faster version
- gpt-4.1, gpt-4.1-mini: Previous generation
- gpt-4o, gpt-4o-mini: Optimized variants

Gemini Models:
- gemini-2.5-pro, gemini-2.5-flash: Google's multimodal models

Other Models:
- deepseek-v3.2-exp: DeepSeek's efficient model
- llama-3.3-70b: Meta's open-source model
- qwen-3-32: Alibaba's reasoning model
- kimi-k2-thinking: Moonshot AI's thinking model
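Because the hub speaks the OpenAI API, you may also be able to list the models your key can reach programmatically. The sketch below assumes Zeabur AI Hub exposes the standard OpenAI-compatible model-listing route; if it does not, the call simply fails and is caught.

# List models exposed by the gateway (assumes the OpenAI-compatible
# models endpoint is available on Zeabur AI Hub).
try:
    for m in client.models.list().data:
        print(m.id)
except Exception as e:
    print(f"Model listing not available: {e}")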
# Basic non-streaming completion
def simple_chat(model="claude-haiku-4-5", message="Hello! Tell me about AI in 20 words."):
    """Send a simple chat message and get a complete response."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": message}],
        max_tokens=500
    )
    return response.choices[0].message.content

# Test with Claude Haiku
print("Using Claude Haiku 4.5:")
print("-" * 50)
result = simple_chat(model="claude-haiku-4-5")
print(result)
print("\n")

# Test with GPT-4o
print("Using GPT-4o:")
print("-" * 50)
result = simple_chat(model="gpt-4o")
print(result)

Using Claude Haiku 4.5:
--------------------------------------------------
AI enables machines to learn from data and perform tasks intelligently, mimicking human cognition through algorithms and neural networks.

Using GPT-4o:
--------------------------------------------------
AI, or Artificial Intelligence, is the development of computer systems to perform tasks requiring human-like intelligence, such as learning, reasoning, and problem-solving.
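The messages parameter is not limited to a single user turn: you can include a system prompt and earlier turns to give the model context. A short sketch (the conversation content is illustrative):

# Multi-turn request: system prompt plus prior turns for context.
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Name one use case for an AI gateway."},
    {"role": "assistant", "content": "Routing requests to multiple AI providers."},
    {"role": "user", "content": "Summarize that in five words."},
]
reply = client.chat.completions.create(
    model="claude-haiku-4-5",
    messages=messages,
    max_tokens=100
)
print(reply.choices[0].message.content)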
4. Streaming Responses
Streaming allows you to receive AI responses token-by-token as they're generated, providing a better user experience for interactive applications.
def streaming_chat(model="claude-haiku-4-5", message="Tell me a short story about a robot in 30 words."):
    """Stream chat responses token by token."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": message}],
        stream=True,
        max_tokens=300
    )
    print(f"Streaming response from {model}:")
    print("-" * 50)
    full_response = ""
    for chunk in stream:
        content = chunk.choices[0].delta.content or ''
        if content:
            print(content, end='', flush=True)
            full_response += content
    print("\n" + "-" * 50)
    return full_response

# Test streaming with Claude
response = streaming_chat(
    model="claude-sonnet-4-5"
)

# Test streaming with GPT-4o
response = streaming_chat(
    model="gpt-4o"
)

Streaming response from claude-sonnet-4-5:
--------------------------------------------------
The lonely robot tended Earth's last garden for centuries after humans left. One day, a seedling sprouted. "Welcome," it whispered to the tiny plant, "I'm not alone anymore."
--------------------------------------------------
Streaming response from gpt-4o:
--------------------------------------------------
A lonely robot discovered a flower in the wasteland. Carefully, it nurtured it, learning love. When the flower bloomed, the robot felt alive, no longer just a machine.
--------------------------------------------------
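In a real application you usually forward chunks as they arrive instead of printing them. One way to do that, sketched below, is to wrap the stream in a Python generator that yields each token; the function name stream_tokens is just for illustration.

def stream_tokens(model, message, max_tokens=300):
    """Yield response tokens one at a time (e.g. for a web or chat UI loop)."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": message}],
        stream=True,
        max_tokens=max_tokens
    )
    for chunk in stream:
        content = chunk.choices[0].delta.content or ""
        if content:
            yield content

# Consume the generator the same way streaming_chat printed tokens above.
for token in stream_tokens("claude-haiku-4-5", "Say hello in five words."):
    print(token, end="", flush=True)
print()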
5. Model Comparison and Selection
Different models have different strengths. Let's compare models across multiple providers to see their performance and response quality.
Key Features of Our Comparison:
- Provider Mapping: Uses a clean dictionary to identify providers by model prefix
- Thinking Tag Removal: Automatically strips <think> and <thinking> tags from reasoning models
- Performance Metrics: Tracks response time and token usage for each model
- Error Handling: Gracefully handles model errors and incompatibilities
We'll test 7 models from 7 different providers: Anthropic, OpenAI, xAI, DeepSeek, Meta, Alibaba, and Moonshot AI.
import time
import re
# Provider mapping
PROVIDERS = {
    "claude": "Anthropic",
    "gpt": "OpenAI",
    "gemini": "Google",
    "grok": "xAI",
    "deepseek": "DeepSeek",
    "llama": "Meta",
    "qwen": "Alibaba",
    "kimi": "Moonshot AI"
}

def get_provider_name(model):
    """Get provider name based on model prefix."""
    for prefix, provider in PROVIDERS.items():
        if model.startswith(prefix):
            return provider
    return "Unknown"

def clean_response(content):
    """Remove thinking tags from model responses."""
    if not content:
        return content
    original = content
    # Remove thinking content for tags: <think>, <thinking>
    for tag in ['think', 'thinking']:
        # Extract content after closing tag if it exists
        closing = f'</{tag}>'
        if closing in content.lower():
            parts = re.split(closing, content, flags=re.IGNORECASE | re.DOTALL)
            content = parts[-1].strip()
        # Remove from opening tag onwards if no closing tag
        opening = f'<{tag}>'
        if opening in content.lower():
            content = re.split(opening, content, flags=re.IGNORECASE | re.DOTALL)[0].strip()
    return content if content.strip() else original
def compare_models(prompt, models, max_tokens_map=None):
    """Compare response quality and speed across different models."""
    max_tokens_map = max_tokens_map or {}
    results = []
    for model in models:
        print(f"Testing {model}...")
        start_time = time.time()
        try:
            response = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
                max_tokens=max_tokens_map.get(model, 200)
            )
            content = clean_response(response.choices[0].message.content)
            results.append({
                "model": model,
                "provider": get_provider_name(model),
                "response": content or "[Empty response]",
                "time_seconds": round(time.time() - start_time, 2),
                "tokens": getattr(response.usage, 'total_tokens', "N/A")
            })
        except Exception as e:
            results.append({
                "model": model,
                "provider": get_provider_name(model),
                "response": f"Error: {str(e)[:100]}",
                "time_seconds": 0,
                "tokens": 0
            })
    return results
# Test configuration
test_prompt = "Tell me a joke in 20 words."
models_to_test = [
    "claude-haiku-4-5",
    "gpt-4o-mini",
    "grok-4-fast-non-reasoning",
    "deepseek-v3.2-exp",
    "llama-3.3-70b",
    "qwen-3-32",
    "kimi-k2-thinking"
]

# Thinking models need more tokens
max_tokens_map = {
    "kimi-k2-thinking": 800,
    "qwen-3-32": 800,
}

# Run comparison
print("Comparing Models Across Providers")
print("=" * 80)
print(f"Prompt: {test_prompt}\n")
results = compare_models(test_prompt, models_to_test, max_tokens_map)

# Display results
for r in results:
    print(f"\n{'=' * 80}")
    print(f"Provider: {r['provider']} | Model: {r['model']}")
    print(f"Time: {r['time_seconds']}s | Tokens: {r['tokens']}")
    print(f"Response: {r['response'][:500]}")

# Summary
print(f"\n{'=' * 80}")
print("SUMMARY")
print("=" * 80)
successful = [r for r in results if not r['response'].startswith('Error') and not r['response'].startswith('[')]
if successful:
    print(f"Successful: {len(successful)}/{len(results)}")
    print(f"Average time: {sum(r['time_seconds'] for r in successful) / len(successful):.2f}s")
    fastest = min(successful, key=lambda x: x['time_seconds'])
    slowest = max(successful, key=lambda x: x['time_seconds'])
    print(f"Fastest: {fastest['model']} ({fastest['provider']}) - {fastest['time_seconds']}s")
    print(f"Slowest: {slowest['model']} ({slowest['provider']}) - {slowest['time_seconds']}s")

    # Provider breakdown
    print(f"\nProviders tested:")
    providers = sorted(set(r['provider'] for r in successful))
    for provider in providers:
        provider_models = [r['model'] for r in successful if r['provider'] == provider]
        print(f"  - {provider}: {', '.join(provider_models)}")
else:
    print("No successful tests.")

Comparing Models Across Providers
================================================================================
Prompt: Tell me a joke in 20 words.

Testing claude-haiku-4-5...
Testing gpt-4o-mini...
Testing grok-4-fast-non-reasoning...
Testing deepseek-v3.2-exp...
Testing llama-3.3-70b...
Testing qwen-3-32...
Testing kimi-k2-thinking...

================================================================================
Provider: Anthropic | Model: claude-haiku-4-5
Time: 1.31s | Tokens: 38
Response: Why did the scarecrow win an award? Because he was outstanding in his field!

================================================================================
Provider: OpenAI | Model: gpt-4o-mini
Time: 0.09s | Tokens: 42
Response: Why did the scarecrow win an award? Because he was outstanding in his field, keeping crows away while standing still!

================================================================================
Provider: xAI | Model: grok-4-fast-non-reasoning
Time: 0.49s | Tokens: 197
Response: Why did the scarecrow win an award? He was outstanding in his field! (9 words)

================================================================================
Provider: DeepSeek | Model: deepseek-v3.2-exp
Time: 0.11s | Tokens: 31
Response: Why did the chicken cross the playground? To get to the other slide.

================================================================================
Provider: Meta | Model: llama-3.3-70b
Time: 0.51s | Tokens: 65
Response: Why was the math book sad? Because it had too many problems to solve every single day always.

================================================================================
Provider: Alibaba | Model: qwen-3-32
Time: 0.62s | Tokens: 341
Response: "Why did the duck leave the pond? Too many quackers around, it said. Swam to a quieter lake." (20 words)

================================================================================
Provider: Moonshot AI | Model: kimi-k2-thinking
Time: 13.2s | Tokens: 791
Response: A man walked into a library asking for a book on Pavlov's dogs. The librarian said, "It rings a bell."

================================================================================
SUMMARY
================================================================================
Successful: 7/7
Average time: 2.33s
Fastest: gpt-4o-mini (OpenAI) - 0.09s
Slowest: kimi-k2-thinking (Moonshot AI) - 13.2s

Providers tested:
  - Alibaba: qwen-3-32
  - Anthropic: claude-haiku-4-5
  - DeepSeek: deepseek-v3.2-exp
  - Meta: llama-3.3-70b
  - Moonshot AI: kimi-k2-thinking
  - OpenAI: gpt-4o-mini
  - xAI: grok-4-fast-non-reasoning
Summary and Next Steps
You've learned the key concepts of Zeabur AI Hub:
- Setup: Loading API keys from .env and initializing the OpenAI-compatible client
- Basic Completions: Making simple chat requests across different models
- Streaming: Receiving real-time token-by-token responses for better UX
- Model Comparison: Testing multiple providers with automated performance tracking and thinking tag removal

Key Takeaways:
- Zeabur AI Hub provides a unified API for 7+ AI providers
- Switch models by changing a single parameter - no other code changes needed
- Thinking models (Qwen, Kimi) use <think> tags that can be automatically filtered
- Performance varies significantly: in this test run, gpt-4o-mini was fastest (0.09s) and kimi-k2-thinking slowest (13.2s)
Beta Tester Promotion
Zeabur AI Hub is in Beta! Join now and get $10 credits when you register with an invite code. Each tester receives 3 invite codes to share; invite 3 friends and earn another $10 credit!
Get your invite code: Visit Discord, Twitter/X, or Threads and mention @zeaburapp to claim your code!
Additional Resources
- Zeabur AI Hub: https://zeabur.com/ai-hub
- Model Catalog: https://zeabur.com/models
- Discord Community: https://zeabur.com/dc
- GitHub Examples: https://github.com/zeabur/ai-hub-examples