Zeabur AI Hub: Unified API Gateway for All AI Models
1. What is Zeabur AI Hub?
Zeabur AI Hub is a unified AI model access platform that provides developers with a single API to access multiple leading AI models including Claude, GPT, Gemini, DeepSeek, and more. Instead of managing separate API keys and accounts for different providers, you get one API key to access all models through a standardized OpenAI-compatible interface.

Key Features
- Unified API: Single API key for all models (Claude, GPT, Gemini, DeepSeek, Llama, etc.)
- OpenAI-Compatible: Works with existing OpenAI SDK code - no learning curve
- Global Endpoints: Multiple regional endpoints for optimal latency
- Competitive Pricing: Pay-as-you-go with transparent pricing
- Advanced Features: Function calling, structured outputs, streaming, vision, and more
- Model Flexibility: Switch models instantly by changing one parameter
Prerequisites
Before starting, you'll need:
- A Zeabur AI Hub API key (get it from zeabur.com/ai-hub)
- Python 3.7+ installed
- Basic knowledge of Python and API usage
Let's get started!
2. Setup and Installation
First, install the required packages and set up your environment variables.
# Install required packages
!pip install openai python-dotenv requests pillow -q

Create a .env file in your project directory with the following format:
ZEABUR_API_KEY=sk-your-api-key-here
import os
from dotenv import load_dotenv
from openai import OpenAI
import json
# Load environment variables from .env file
load_dotenv()
# Get API key from environment
ZEABUR_API_KEY = os.getenv("ZEABUR_API_KEY")
if not ZEABUR_API_KEY:
    raise ValueError("ZEABUR_API_KEY not found in .env file")
# Available endpoints
ENDPOINTS = {
    "tokyo": "https://hnd1.aihub.zeabur.ai/",
    "san_francisco": "https://sfo1.aihub.zeabur.ai/"
}
# Initialize client (using Tokyo endpoint by default)
client = OpenAI(
    base_url=ENDPOINTS["tokyo"],
    api_key=ZEABUR_API_KEY
)
print("Setup complete! Client initialized with Tokyo endpoint.")

Setup complete! Client initialized with Tokyo endpoint.
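If the San Francisco endpoint is closer to you or your users, you can point a client at it instead. Here is a minimal sketch that reuses the ENDPOINTS dictionary and API key defined above (the variable name client_sf is just for illustration):

# Optional: initialize a second client against the San Francisco endpoint.
# Only the base URL changes; the API key stays the same.
client_sf = OpenAI(
    base_url=ENDPOINTS["san_francisco"],
    api_key=ZEABUR_API_KEY
)
print("Client initialized with San Francisco endpoint.")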
3. Basic Chat Completions
Zeabur AI Hub supports multiple models. Here are the available models (you can also list them programmatically, as sketched after this list):

Claude Models:
- claude-sonnet-4-5: Advanced reasoning and analysis
- claude-haiku-4-5: Fast and efficient responses

GPT Models:
- gpt-5: Latest OpenAI model
- gpt-5-mini: Smaller, faster version
- gpt-4.1, gpt-4.1-mini: Previous generation
- gpt-4o, gpt-4o-mini: Optimized variants

Gemini Models:
- gemini-2.5-pro, gemini-2.5-flash: Google's multimodal models

Other Models:
- deepseek-v3.2-exp: DeepSeek's efficient model
- llama-3.3-70b: Meta's open-source model
- qwen-3-32: Alibaba's reasoning model
- kimi-k2-thinking: Moonshot AI's thinking model
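Because the hub speaks the OpenAI API, you may also be able to list the models your key can reach programmatically. The sketch below assumes Zeabur AI Hub exposes the standard OpenAI-compatible model-listing route; if it does not, the call simply fails and is caught.

# List models exposed by the gateway (assumes the OpenAI-compatible
# models endpoint is available on Zeabur AI Hub).
try:
    for m in client.models.list().data:
        print(m.id)
except Exception as e:
    print(f"Model listing not available: {e}")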
# Basic non-streaming completion
def simple_chat(model="claude-haiku-4-5", message="Hello! Tell me about AI in 20 words."):
    """Send a simple chat message and get a complete response."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": message}],
        max_tokens=500
    )
    return response.choices[0].message.content

# Test with Claude Haiku
print("Using Claude Haiku 4.5:")
print("-" * 50)
result = simple_chat(model="claude-haiku-4-5")
print(result)
print("\n")

# Test with GPT-4o
print("Using GPT-4o:")
print("-" * 50)
result = simple_chat(model="gpt-4o")
print(result)

Using Claude Haiku 4.5:
--------------------------------------------------
AI enables machines to learn from data and perform tasks intelligently, mimicking human cognition through algorithms and neural networks.

Using GPT-4o:
--------------------------------------------------
AI, or Artificial Intelligence, is the development of computer systems to perform tasks requiring human-like intelligence, such as learning, reasoning, and problem-solving.
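The messages parameter is not limited to a single user turn: you can include a system prompt and earlier turns to give the model context. A short sketch (the conversation content is illustrative):

# Multi-turn request: system prompt plus prior turns for context.
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Name one use case for an AI gateway."},
    {"role": "assistant", "content": "Routing requests to multiple AI providers."},
    {"role": "user", "content": "Summarize that in five words."},
]
reply = client.chat.completions.create(
    model="claude-haiku-4-5",
    messages=messages,
    max_tokens=100
)
print(reply.choices[0].message.content)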
4. Streaming Responses
Streaming allows you to receive AI responses token-by-token as they're generated, providing a better user experience for interactive applications.
def streaming_chat(model="claude-haiku-4-5", message="Tell me a short story about a robot in 30 words."):
    """Stream chat responses token by token."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": message}],
        stream=True,
        max_tokens=300
    )
    print(f"Streaming response from {model}:")
    print("-" * 50)
    full_response = ""
    for chunk in stream:
        content = chunk.choices[0].delta.content or ''
        if content:
            print(content, end='', flush=True)
            full_response += content
    print("\n" + "-" * 50)
    return full_response

# Test streaming with Claude
response = streaming_chat(
    model="claude-sonnet-4-5"
)

# Test streaming with GPT-4o
response = streaming_chat(
    model="gpt-4o"
)

Streaming response from claude-sonnet-4-5:
--------------------------------------------------
The lonely robot tended Earth's last garden for centuries after humans left. One day, a seedling sprouted. "Welcome," it whispered to the tiny plant, "I'm not alone anymore."
--------------------------------------------------
Streaming response from gpt-4o:
--------------------------------------------------
A lonely robot discovered a flower in the wasteland. Carefully, it nurtured it, learning love. When the flower bloomed, the robot felt alive, no longer just a machine.
--------------------------------------------------
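In a real application you usually forward chunks as they arrive instead of printing them. One way to do that, sketched below, is to wrap the stream in a Python generator that yields each token; the function name stream_tokens is just for illustration.

def stream_tokens(model, message, max_tokens=300):
    """Yield response tokens one at a time (e.g. for a web or chat UI loop)."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": message}],
        stream=True,
        max_tokens=max_tokens
    )
    for chunk in stream:
        content = chunk.choices[0].delta.content or ""
        if content:
            yield content

# Consume the generator the same way streaming_chat printed tokens above.
for token in stream_tokens("claude-haiku-4-5", "Say hello in five words."):
    print(token, end="", flush=True)
print()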
5. Model Comparison and Selection
Different models have different strengths. Let's compare models across multiple providers to see their performance and response quality.
Key Features of Our Comparison:
- Provider Mapping: Uses a clean dictionary to identify providers by model prefix
- Thinking Tag Removal: Automatically strips <think> and <thinking> tags from reasoning models
- Performance Metrics: Tracks response time and token usage for each model
- Error Handling: Gracefully handles model errors and incompatibilities
We'll test 7 models from 7 different providers: Anthropic, OpenAI, xAI, DeepSeek, Meta, Alibaba, and Moonshot AI.
import time
import re
# Provider mapping
PROVIDERS = {
    "claude": "Anthropic",
    "gpt": "OpenAI",
    "gemini": "Google",
    "grok": "xAI",
    "deepseek": "DeepSeek",
    "llama": "Meta",
    "qwen": "Alibaba",
    "kimi": "Moonshot AI"
}

def get_provider_name(model):
    """Get provider name based on model prefix."""
    for prefix, provider in PROVIDERS.items():
        if model.startswith(prefix):
            return provider
    return "Unknown"

def clean_response(content):
    """Remove thinking tags from model responses."""
    if not content:
        return content
    original = content
    # Remove thinking content for tags: <think>, <thinking>
    for tag in ['think', 'thinking']:
        # Extract content after closing tag if it exists
        closing = f'</{tag}>'
        if closing in content.lower():
            parts = re.split(closing, content, flags=re.IGNORECASE | re.DOTALL)
            content = parts[-1].strip()
        # Remove from opening tag onwards if no closing tag
        opening = f'<{tag}>'
        if opening in content.lower():
            content = re.split(opening, content, flags=re.IGNORECASE | re.DOTALL)[0].strip()
    return content if content.strip() else original
def compare_models(prompt, models, max_tokens_map=None):
    """Compare response quality and speed across different models."""
    max_tokens_map = max_tokens_map or {}
    results = []
    for model in models:
        print(f"Testing {model}...")
        start_time = time.time()
        try:
            response = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
                max_tokens=max_tokens_map.get(model, 200)
            )
            content = clean_response(response.choices[0].message.content)
            results.append({
                "model": model,
                "provider": get_provider_name(model),
                "response": content or "[Empty response]",
                "time_seconds": round(time.time() - start_time, 2),
                "tokens": getattr(response.usage, 'total_tokens', "N/A")
            })
        except Exception as e:
            results.append({
                "model": model,
                "provider": get_provider_name(model),
                "response": f"Error: {str(e)[:100]}",
                "time_seconds": 0,
                "tokens": 0
            })
    return results
# Test configuration
test_prompt = "Tell me a joke in 20 words."
models_to_test = [
    "claude-haiku-4-5",
    "gpt-4o-mini",
    "grok-4-fast-non-reasoning",
    "deepseek-v3.2-exp",
    "llama-3.3-70b",
    "qwen-3-32",
    "kimi-k2-thinking"
]

# Thinking models need more tokens
max_tokens_map = {
    "kimi-k2-thinking": 800,
    "qwen-3-32": 800,
}

# Run comparison
print("Comparing Models Across Providers")
print("=" * 80)
print(f"Prompt: {test_prompt}\n")
results = compare_models(test_prompt, models_to_test, max_tokens_map)

# Display results
for r in results:
    print(f"\n{'=' * 80}")
    print(f"Provider: {r['provider']} | Model: {r['model']}")
    print(f"Time: {r['time_seconds']}s | Tokens: {r['tokens']}")
    print(f"Response: {r['response'][:500]}")

# Summary
print(f"\n{'=' * 80}")
print("SUMMARY")
print("=" * 80)
successful = [r for r in results if not r['response'].startswith('Error') and not r['response'].startswith('[')]
if successful:
    print(f"Successful: {len(successful)}/{len(results)}")
    print(f"Average time: {sum(r['time_seconds'] for r in successful) / len(successful):.2f}s")
    fastest = min(successful, key=lambda x: x['time_seconds'])
    slowest = max(successful, key=lambda x: x['time_seconds'])
    print(f"Fastest: {fastest['model']} ({fastest['provider']}) - {fastest['time_seconds']}s")
    print(f"Slowest: {slowest['model']} ({slowest['provider']}) - {slowest['time_seconds']}s")

    # Provider breakdown
    print(f"\nProviders tested:")
    providers = sorted(set(r['provider'] for r in successful))
    for provider in providers:
        provider_models = [r['model'] for r in successful if r['provider'] == provider]
        print(f"  - {provider}: {', '.join(provider_models)}")
else:
    print("No successful tests.")

Comparing Models Across Providers
================================================================================
Prompt: Tell me a joke in 20 words.

Testing claude-haiku-4-5...
Testing gpt-4o-mini...
Testing grok-4-fast-non-reasoning...
Testing deepseek-v3.2-exp...
Testing llama-3.3-70b...
Testing qwen-3-32...
Testing kimi-k2-thinking...

================================================================================
Provider: Anthropic | Model: claude-haiku-4-5
Time: 1.31s | Tokens: 38
Response: Why did the scarecrow win an award? Because he was outstanding in his field!

================================================================================
Provider: OpenAI | Model: gpt-4o-mini
Time: 0.09s | Tokens: 42
Response: Why did the scarecrow win an award? Because he was outstanding in his field, keeping crows away while standing still!

================================================================================
Provider: xAI | Model: grok-4-fast-non-reasoning
Time: 0.49s | Tokens: 197
Response: Why did the scarecrow win an award? He was outstanding in his field! (9 words)

================================================================================
Provider: DeepSeek | Model: deepseek-v3.2-exp
Time: 0.11s | Tokens: 31
Response: Why did the chicken cross the playground? To get to the other slide.

================================================================================
Provider: Meta | Model: llama-3.3-70b
Time: 0.51s | Tokens: 65
Response: Why was the math book sad? Because it had too many problems to solve every single day always.

================================================================================
Provider: Alibaba | Model: qwen-3-32
Time: 0.62s | Tokens: 341
Response: "Why did the duck leave the pond? Too many quackers around, it said. Swam to a quieter lake." (20 words)

================================================================================
Provider: Moonshot AI | Model: kimi-k2-thinking
Time: 13.2s | Tokens: 791
Response: A man walked into a library asking for a book on Pavlov's dogs. The librarian said, "It rings a bell."

================================================================================
SUMMARY
================================================================================
Successful: 7/7
Average time: 2.33s
Fastest: gpt-4o-mini (OpenAI) - 0.09s
Slowest: kimi-k2-thinking (Moonshot AI) - 13.2s

Providers tested:
  - Alibaba: qwen-3-32
  - Anthropic: claude-haiku-4-5
  - DeepSeek: deepseek-v3.2-exp
  - Meta: llama-3.3-70b
  - Moonshot AI: kimi-k2-thinking
  - OpenAI: gpt-4o-mini
  - xAI: grok-4-fast-non-reasoning
Summary and Next Steps
You've learned the key concepts of Zeabur AI Hub:
- Setup: Loading API keys from .env and initializing the OpenAI-compatible client
- Basic Completions: Making simple chat requests across different models
- Streaming: Receiving real-time token-by-token responses for better UX
- Model Comparison: Testing multiple providers with automated performance tracking and thinking tag removal

Key Takeaways:
- Zeabur AI Hub provides a unified API for 7+ AI providers
- Switch models by changing a single parameter - no other code changes needed
- Thinking models (Qwen, Kimi) use <think> tags that can be automatically filtered
- Performance varies significantly: in this test run, gpt-4o-mini was fastest (0.09s) and kimi-k2-thinking slowest (13.2s)
Beta Tester Promotion
Zeabur AI Hub is in Beta! Join now and get $10 credits when you register with an invite code. Each tester receives 3 invite codes to share; invite 3 friends and earn another $10 credit!
Get your invite code: Visit Discord, Twitter/X, or Threads and mention @zeaburapp to claim your code!
Additional Resources
- Zeabur AI Hub: https://zeabur.com/ai-hub
- Model Catalog: https://zeabur.com/models
- Discord Community: https://zeabur.com/dc
- GitHub Examples: https://github.com/zeabur/ai-hub-examples