Lightning-fast AI requests without the complexity
Speed matters. When your users are waiting for AI responses, every millisecond counts. That’s why we built AnyAPI with performance as our obsession, not an afterthought.
The Speed You Actually Get
~40ms of added latency – that’s it.
We’re talking about the time between when your request hits our servers and when we forward it to your chosen AI provider. For context, that’s faster than you can blink.
Here’s how we keep it blazing fast:
⚡ Edge-first architecture – We run on Cloudflare Workers worldwide, so we’re always close to your users
🧠 Smart caching – User data and API keys are cached at the edge for instant access
🎯 Optimized routing – Our request processing is streamlined to the essentials
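Want to sanity-check those numbers from your own network? Here is a minimal timing sketch (synchronous, using the `requests` library against the chat completions endpoint; `API_KEY` is a placeholder). Keep in mind that what you measure includes the provider's generation time on top of our ~40ms overhead.

```python
import time
import requests

API_KEY = "your-anyapi-key"  # Placeholder: use your real key

start = time.time()
resp = requests.post(
    "https://api.anyapi.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 5,
    },
    timeout=30,
)
elapsed_ms = (time.time() - start) * 1000
print(f"Round trip: {elapsed_ms:.0f}ms (status {resp.status_code})")
```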
Cold Start Delays
The “first request” phenomenon
When we haven’t seen traffic in a particular region for a while (typically 1-2 minutes), the first few requests might take a bit longer as our edge caches warm up. Think of it like starting a car on a cold morning – it needs a moment to get going.
What this means for you: The first request to a new region might add an extra 50-100ms while we get our caches populated. After that? Smooth sailing.
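If those extra milliseconds matter to you, fire a tiny throwaway request from your deploy pipeline so the caches are already warm when real traffic arrives. A minimal, best-effort sketch (`requests`-based; the endpoint and model name match the examples below, and the key is a placeholder):

```python
import requests

def warm_up(api_key: str) -> None:
    """Best-effort warm-up request so edge caches are hot before real traffic."""
    try:
        requests.post(
            "https://api.anyapi.ai/api/v1/chat/completions",
            headers={"Authorization": f"Bearer {api_key}"},
            json={
                "model": "gpt-4o-mini",
                "messages": [{"role": "user", "content": "warm-up"}],
                "max_tokens": 1,
            },
            timeout=10,
        )
    except requests.RequestException:
        pass  # Never block a deploy on a warm-up request

# Call this once from your deploy hook or container start-up:
# warm_up("your-anyapi-key")
```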
Model Fallback Scenarios
When Plan A doesn’t work out
Sometimes AI providers have hiccups – it’s just the nature of the beast. When your primary model fails, we automatically try your next configured option. This failover protection keeps your app running, but that initial failure does add some latency to that specific request.
What happens:
- Request goes to primary provider → fails (adds ~2-5 seconds)
- We instantly retry with backup provider → succeeds
- Your app gets the response (with some delay, but it works)
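The failover happens on our side, but if you want the same safety net in your own code (or just want to see the pattern end to end), here is a minimal client-side sketch. The model names are illustrative, not a recommendation:

```python
import requests

def completion_with_fallback(api_key, messages,
                             models=("gpt-4o", "gpt-4o-mini")):
    """Try each model in order until one succeeds: a client-side mirror of the
    server-side failover described above."""
    last_error = None
    for model in models:
        try:
            resp = requests.post(
                "https://api.anyapi.ai/api/v1/chat/completions",
                headers={"Authorization": f"Bearer {api_key}"},
                json={"model": model, "messages": messages},
                timeout=30,
            )
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException as err:
            last_error = err  # Provider hiccup: fall through to the next model
    raise RuntimeError(f"All fallback models failed: {last_error}")
```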
🎯 Smart Request Patterns
How you send requests matters
```python
import time
import asyncio
import aiohttp

API_KEY = "your-anyapi-key"  # Placeholder: swap in your real key

class HighPerformanceAPIClient:
    def __init__(self, api_key, region="auto"):
        self.api_key = api_key
        self.base_url = "https://api.anyapi.ai/api/v1"
        self.session = None
        self.region = region

    async def __aenter__(self):
        # 🚀 Reuse connections for better performance
        connector = aiohttp.TCPConnector(
            limit=100,            # Connection pool size
            limit_per_host=10,    # Per-host limit
            keepalive_timeout=30  # Keep connections alive
        )
        self.session = aiohttp.ClientSession(
            connector=connector,
            timeout=aiohttp.ClientTimeout(total=60)
        )
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        if self.session:
            await self.session.close()

    async def fast_completion(self, messages, model="gpt-4o", **kwargs):
        """Optimized for minimal latency"""
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json",
            "X-AnyAPI-Region-Preference": self.region,  # Stick to one region
            "X-AnyAPI-Cache-Preference": "speed"        # Prefer speed over cost
        }
        payload = {
            "model": model,
            "messages": messages,
            "stream": kwargs.get("stream", False),
            **kwargs
        }

        start_time = time.time()
        async with self.session.post(
            f"{self.base_url}/chat/completions",
            headers=headers,
            json=payload
        ) as response:
            result = await response.json()

        latency = (time.time() - start_time) * 1000
        # 📊 Track your performance
        print(f"⚡ Request completed in {latency:.0f}ms")
        return result

# 🏃 Usage for speed demons
async def main():
    async with HighPerformanceAPIClient(API_KEY, region="us-east") as client:
        # Multiple concurrent requests for maximum throughput
        tasks = []
        for i in range(5):
            task = client.fast_completion([
                {"role": "user", "content": f"Quick task #{i+1}"}
            ])
            tasks.append(task)

        # Execute all requests concurrently
        results = await asyncio.gather(*tasks)
        print(f"✅ Completed {len(results)} requests concurrently")

# Run it
asyncio.run(main())
```
💾 Smart Caching Strategies
Don’t repeat expensive work
```python
import time
import hashlib
import json
import aiohttp

class CachedAPIClient:
    def __init__(self, api_key):
        self.api_key = api_key
        self.response_cache = {}

    def _generate_cache_key(self, messages, model, **kwargs):
        """Create consistent cache keys"""
        cache_data = {
            "messages": messages,
            "model": model,
            "temperature": kwargs.get("temperature", 1.0),
            "max_tokens": kwargs.get("max_tokens")
        }
        cache_string = json.dumps(cache_data, sort_keys=True)
        return hashlib.md5(cache_string.encode()).hexdigest()

    async def cached_completion(self, messages, model="gpt-4o",
                                cache_ttl_minutes=5, **kwargs):
        """Lightning-fast responses for repeated requests"""
        cache_key = self._generate_cache_key(messages, model, **kwargs)

        # 🚀 Cache hit = instant response
        if cache_key in self.response_cache:
            cached_data = self.response_cache[cache_key]
            if time.time() - cached_data["timestamp"] < cache_ttl_minutes * 60:
                print("⚡ Cache hit! Instant response")
                return cached_data["response"]

        # 📡 Cache miss = API request + cache for next time
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json"
        }
        payload = {"model": model, "messages": messages, **kwargs}

        async with aiohttp.ClientSession() as session:
            async with session.post(
                "https://api.anyapi.ai/api/v1/chat/completions",
                headers=headers,
                json=payload
            ) as response:
                result = await response.json()

        # 💾 Cache for future requests
        self.response_cache[cache_key] = {
            "response": result,
            "timestamp": time.time()
        }
        print("📡 Fresh API response cached")
        return result

# Perfect for FAQ bots, repeated analyses, etc.
client = CachedAPIClient(API_KEY)
```
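A quick usage sketch, assuming the `client` above and an `API_KEY` placeholder: the second call returns straight from the in-memory cache.

```python
import asyncio

async def demo_cache():
    question = [{"role": "user", "content": "What are your support hours?"}]
    await client.cached_completion(question)  # 📡 First call: fresh API request, then cached
    await client.cached_completion(question)  # ⚡ Second call: served from the cache

asyncio.run(demo_cache())
```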
Track Your Real-World Latency
Because you can’t optimize what you don’t measure
```python
import time
from collections import defaultdict

class PerformanceTracker:
    def __init__(self):
        self.metrics = defaultdict(list)

    def track_request(self, operation_name):
        """Context manager for tracking request performance"""
        return RequestTimer(self.metrics, operation_name)

    def get_performance_report(self):
        """Generate performance insights"""
        report = {}
        for operation, times in self.metrics.items():
            if times:
                report[operation] = {
                    "requests": len(times),
                    "avg_latency_ms": sum(times) / len(times),
                    "p50_latency_ms": sorted(times)[len(times) // 2],
                    "p95_latency_ms": sorted(times)[int(len(times) * 0.95)],
                    "max_latency_ms": max(times)
                }
        return report

class RequestTimer:
    def __init__(self, metrics, operation_name):
        self.metrics = metrics
        self.operation_name = operation_name
        self.start_time = None

    def __enter__(self):
        self.start_time = time.time()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        if self.start_time:
            duration_ms = (time.time() - self.start_time) * 1000
            self.metrics[self.operation_name].append(duration_ms)

# 📊 Usage
tracker = PerformanceTracker()

async def monitored_request(messages):
    with tracker.track_request("chat_completion"):
        # Your API request here
        response = await client.fast_completion(messages)
    return response

# After some requests, see how you're doing
performance_report = tracker.get_performance_report()
print("🎯 Performance Report:")
for operation, stats in performance_report.items():
    print(f"  {operation}:")
    print(f"    📊 {stats['requests']} requests")
    print(f"    ⚡ {stats['avg_latency_ms']:.0f}ms average")
    print(f"    🎯 {stats['p95_latency_ms']:.0f}ms 95th percentile")
```
When Speed Really Matters
Real-Time Applications
Sub-100ms total latency targets
```python
# Configuration for real-time apps (chat, live support, etc.)
real_time_config = {
    "model": "gpt-4o-mini",  # Faster model for real-time use
    "max_tokens": 150,       # Shorter responses = faster delivery
    "temperature": 0.7,      # Consistent but not robotic
    "stream": True,          # Start showing results immediately

    # 🚀 Performance optimizations
    "provider_preferences": {
        "primary": "openai",
        "fallback": "anthropic"
    },
    "timeout": 15,           # Fail fast for real-time apps
    "priority": "speed"      # Choose speed over cost
}
```
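With streaming enabled, you can start rendering tokens the moment they arrive instead of waiting for the full completion. The sketch below assumes an OpenAI-style server-sent-events stream (`data: {...}` chunks ending in `data: [DONE]`); check the streaming docs for the exact wire format.

```python
import json
import aiohttp

async def stream_reply(session: aiohttp.ClientSession, api_key: str, messages):
    """Print tokens as they arrive, reusing the real_time_config above."""
    async with session.post(
        "https://api.anyapi.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {api_key}"},
        json={**real_time_config, "messages": messages},
    ) as response:
        async for raw_line in response.content:
            line = raw_line.decode().strip()
            if not line.startswith("data: "):
                continue
            data = line[len("data: "):]
            if data == "[DONE]":
                break
            chunk = json.loads(data)
            delta = chunk["choices"][0].get("delta", {}).get("content", "")
            print(delta, end="", flush=True)

# Usage (inside an async context):
# async with aiohttp.ClientSession() as session:
#     await stream_reply(session, "your-anyapi-key", [{"role": "user", "content": "Hi!"}])
```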
Batch Processing
When throughput beats individual request speed
```python
async def batch_process_optimized(requests_batch, concurrency=10):
    """Process many requests with optimal throughput"""
    semaphore = asyncio.Semaphore(concurrency)

    async def process_single_request(request_data):
        async with semaphore:  # Limit concurrent requests
            # Uses the HighPerformanceAPIClient's fast_completion from earlier
            return await client.fast_completion(**request_data)

    # 🚀 Process all requests concurrently (up to limit)
    tasks = [process_single_request(req) for req in requests_batch]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return results
```
🔍 Common Issues & Quick Fixes
Issue: First request is slow in new regions
Fix: Warm up your caches with a dummy request when deploying (see the warm-up sketch in the Cold Start Delays section above)
Issue: Inconsistent latency throughout the day
Fix: Check your credit balance and set up auto-topup
Issue: High latency for specific models
Fix: Test different providers for that model family (a quick comparison sketch follows below)
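For that last issue, the quickest way to pick a winner is to time the same prompt against a few candidates. A minimal comparison sketch (model names are illustrative; swap in the ones you actually use):

```python
import time
import requests

def compare_model_latency(api_key, candidates=("gpt-4o", "gpt-4o-mini")):
    """Time an identical short prompt against several models and print the results."""
    prompt = [{"role": "user", "content": "Say hello in five words."}]
    for model in candidates:
        start = time.time()
        resp = requests.post(
            "https://api.anyapi.ai/api/v1/chat/completions",
            headers={"Authorization": f"Bearer {api_key}"},
            json={"model": model, "messages": prompt, "max_tokens": 20},
            timeout=30,
        )
        elapsed_ms = (time.time() - start) * 1000
        status = "✅" if resp.ok else f"❌ {resp.status_code}"
        print(f"{model}: {elapsed_ms:.0f}ms {status}")
```

And when latency still looks off, a structured walkthrough helps isolate where the time is going: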
```python
def debug_slow_request():
    """Debug performance issues step by step"""
    start_time = time.time()

    # 1. Check account status
    print("1️⃣ Checking account status...")
    account_check_start = time.time()
    # Account status check would go here
    print(f"   ✅ Account check: {(time.time() - account_check_start)*1000:.0f}ms")

    # 2. Test simple request
    print("2️⃣ Testing basic request...")
    request_start = time.time()
    # Basic API request would go here
    print(f"   ✅ API request: {(time.time() - request_start)*1000:.0f}ms")

    # 3. Check network connectivity
    print("3️⃣ Network diagnostics...")
    # Network checks would go here

    total_time = (time.time() - start_time) * 1000
    print(f"🏁 Total debug time: {total_time:.0f}ms")
```
The Bottom Line
AnyAPI is built for speed. With proper configuration and best practices, you’ll consistently see:
- ~40ms added latency for most requests
- Sub-100ms total response times for simple operations
- Predictable performance that scales with your application
Quick wins for better performance:
- 🌍 Choose providers strategically for your regions
- 💾 Cache responses when possible
- 📊 Monitor performance and optimize continuously
Ready to make your AI app lightning-fast? These optimizations will get you there. ⚡
Speed is a feature – make it yours! 🚀