Lightning-fast AI requests without the complexity
Speed matters. When your users are waiting for AI responses, every millisecond counts. That’s why we built AnyAPI with performance as our obsession, not an afterthought.
The Speed You Actually Get
~40ms of added latency – that’s it.
We’re talking about the time between when your request hits our servers and when we forward it to your chosen AI provider. For context, that’s faster than you can blink.
Here’s how we keep it blazing fast:
⚡ Edge-first architecture – We run on Cloudflare Workers worldwide, so we’re always close to your users
🧠 Smart caching – User data and API keys are cached at the edge for instant access
🎯 Optimized routing – Our request processing is streamlined to the essentials
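Want to sanity-check those numbers from your own network? Here is a minimal timing sketch (synchronous, using the `requests` library against the chat completions endpoint; `API_KEY` is a placeholder). Keep in mind that what you measure includes the provider's generation time on top of our ~40ms overhead.

```python
import time
import requests

API_KEY = "your-anyapi-key"  # Placeholder: use your real key

start = time.time()
resp = requests.post(
    "https://api.anyapi.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 5,
    },
    timeout=30,
)
elapsed_ms = (time.time() - start) * 1000
print(f"Round trip: {elapsed_ms:.0f}ms (status {resp.status_code})")
```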
Cold Start Delays
The “first request” phenomenon
When we haven’t seen traffic in a particular region for a while (typically 1-2 minutes), the first few requests might take a bit longer as our edge caches warm up. Think of it like starting a car on a cold morning – it needs a moment to get going.
What this means for you: The first request to a new region might add an extra 50-100ms while we get our caches populated. After that? Smooth sailing.
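If those extra milliseconds matter to you, fire a tiny throwaway request from your deploy pipeline so the caches are already warm when real traffic arrives. A minimal, best-effort sketch (`requests`-based; the endpoint and model name match the examples below, and the key is a placeholder):

```python
import requests

def warm_up(api_key: str) -> None:
    """Best-effort warm-up request so edge caches are hot before real traffic."""
    try:
        requests.post(
            "https://api.anyapi.ai/api/v1/chat/completions",
            headers={"Authorization": f"Bearer {api_key}"},
            json={
                "model": "gpt-4o-mini",
                "messages": [{"role": "user", "content": "warm-up"}],
                "max_tokens": 1,
            },
            timeout=10,
        )
    except requests.RequestException:
        pass  # Never block a deploy on a warm-up request

# Call this once from your deploy hook or container start-up:
# warm_up("your-anyapi-key")
```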
Model Fallback Scenarios
When Plan A doesn’t work out
Sometimes AI providers have hiccups – it’s just the nature of the beast. When your primary model fails, we automatically try your next configured option. This failover protection keeps your app running, but that initial failure does add some latency to that specific request.
What happens:
- Request goes to primary provider → fails (adds ~2-5 seconds)
- We instantly retry with backup provider → succeeds
- Your app gets the response (with some delay, but it works)
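The failover happens on our side, but if you want the same safety net in your own code (or just want to see the pattern end to end), here is a minimal client-side sketch. The model names are illustrative, not a recommendation:

```python
import requests

def completion_with_fallback(api_key, messages,
                             models=("gpt-4o", "gpt-4o-mini")):
    """Try each model in order until one succeeds: a client-side mirror of the
    server-side failover described above."""
    last_error = None
    for model in models:
        try:
            resp = requests.post(
                "https://api.anyapi.ai/api/v1/chat/completions",
                headers={"Authorization": f"Bearer {api_key}"},
                json={"model": model, "messages": messages},
                timeout=30,
            )
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException as err:
            last_error = err  # Provider hiccup: fall through to the next model
    raise RuntimeError(f"All fallback models failed: {last_error}")
```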
🎯 Smart Request Patterns
How you send requests matters
```python
import time
import asyncio
import aiohttp

API_KEY = "your-anyapi-key"  # Placeholder: swap in your real key

class HighPerformanceAPIClient:
    def __init__(self, api_key, region="auto"):
        self.api_key = api_key
        self.base_url = "https://api.anyapi.ai/api/v1"
        self.session = None
        self.region = region

    async def __aenter__(self):
        # 🚀 Reuse connections for better performance
        connector = aiohttp.TCPConnector(
            limit=100,            # Connection pool size
            limit_per_host=10,    # Per-host limit
            keepalive_timeout=30  # Keep connections alive
        )
        self.session = aiohttp.ClientSession(
            connector=connector,
            timeout=aiohttp.ClientTimeout(total=60)
        )
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        if self.session:
            await self.session.close()

    async def fast_completion(self, messages, model="gpt-4o", **kwargs):
        """Optimized for minimal latency"""
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json",
            "X-AnyAPI-Region-Preference": self.region,  # Stick to one region
            "X-AnyAPI-Cache-Preference": "speed"        # Prefer speed over cost
        }
        payload = {
            "model": model,
            "messages": messages,
            "stream": kwargs.get("stream", False),
            **kwargs
        }

        start_time = time.time()
        async with self.session.post(
            f"{self.base_url}/chat/completions",
            headers=headers,
            json=payload
        ) as response:
            result = await response.json()

        latency = (time.time() - start_time) * 1000
        # 📊 Track your performance
        print(f"⚡ Request completed in {latency:.0f}ms")
        return result

# 🏃 Usage for speed demons
async def main():
    async with HighPerformanceAPIClient(API_KEY, region="us-east") as client:
        # Multiple concurrent requests for maximum throughput
        tasks = []
        for i in range(5):
            task = client.fast_completion([
                {"role": "user", "content": f"Quick task #{i+1}"}
            ])
            tasks.append(task)

        # Execute all requests concurrently
        results = await asyncio.gather(*tasks)
        print(f"✅ Completed {len(results)} requests concurrently")

# Run it
asyncio.run(main())
```
💾 Smart Caching Strategies
Don’t repeat expensive work
```python
import time
import hashlib
import json
import aiohttp

class CachedAPIClient:
    def __init__(self, api_key):
        self.api_key = api_key
        self.response_cache = {}

    def _generate_cache_key(self, messages, model, **kwargs):
        """Create consistent cache keys"""
        cache_data = {
            "messages": messages,
            "model": model,
            "temperature": kwargs.get("temperature", 1.0),
            "max_tokens": kwargs.get("max_tokens")
        }
        cache_string = json.dumps(cache_data, sort_keys=True)
        return hashlib.md5(cache_string.encode()).hexdigest()

    async def cached_completion(self, messages, model="gpt-4o",
                                cache_ttl_minutes=5, **kwargs):
        """Lightning-fast responses for repeated requests"""
        cache_key = self._generate_cache_key(messages, model, **kwargs)

        # 🚀 Cache hit = instant response
        if cache_key in self.response_cache:
            cached_data = self.response_cache[cache_key]
            if time.time() - cached_data["timestamp"] < cache_ttl_minutes * 60:
                print("⚡ Cache hit! Instant response")
                return cached_data["response"]

        # 📡 Cache miss = API request + cache for next time
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json"
        }
        payload = {"model": model, "messages": messages, **kwargs}

        async with aiohttp.ClientSession() as session:
            async with session.post(
                "https://api.anyapi.ai/api/v1/chat/completions",
                headers=headers,
                json=payload
            ) as response:
                result = await response.json()

        # 💾 Cache for future requests
        self.response_cache[cache_key] = {
            "response": result,
            "timestamp": time.time()
        }
        print("📡 Fresh API response cached")
        return result

# Perfect for FAQ bots, repeated analyses, etc.
client = CachedAPIClient(API_KEY)
```
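A quick usage sketch, assuming the `client` above and an `API_KEY` placeholder: the second call returns straight from the in-memory cache.

```python
import asyncio

async def demo_cache():
    question = [{"role": "user", "content": "What are your support hours?"}]
    await client.cached_completion(question)  # 📡 First call: fresh API request, then cached
    await client.cached_completion(question)  # ⚡ Second call: served from the cache

asyncio.run(demo_cache())
```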
Track Your Real-World Latency
Because you can’t optimize what you don’t measure
```python
import time
from collections import defaultdict

class PerformanceTracker:
    def __init__(self):
        self.metrics = defaultdict(list)

    def track_request(self, operation_name):
        """Context manager for tracking request performance"""
        return RequestTimer(self.metrics, operation_name)

    def get_performance_report(self):
        """Generate performance insights"""
        report = {}
        for operation, times in self.metrics.items():
            if times:
                report[operation] = {
                    "requests": len(times),
                    "avg_latency_ms": sum(times) / len(times),
                    "p50_latency_ms": sorted(times)[len(times) // 2],
                    "p95_latency_ms": sorted(times)[int(len(times) * 0.95)],
                    "max_latency_ms": max(times)
                }
        return report

class RequestTimer:
    def __init__(self, metrics, operation_name):
        self.metrics = metrics
        self.operation_name = operation_name
        self.start_time = None

    def __enter__(self):
        self.start_time = time.time()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        if self.start_time:
            duration_ms = (time.time() - self.start_time) * 1000
            self.metrics[self.operation_name].append(duration_ms)

# 📊 Usage
tracker = PerformanceTracker()

async def monitored_request(messages):
    with tracker.track_request("chat_completion"):
        # Your API request here
        response = await client.fast_completion(messages)
    return response

# After some requests, see how you're doing
performance_report = tracker.get_performance_report()
print("🎯 Performance Report:")
for operation, stats in performance_report.items():
    print(f"  {operation}:")
    print(f"    📊 {stats['requests']} requests")
    print(f"    ⚡ {stats['avg_latency_ms']:.0f}ms average")
    print(f"    🎯 {stats['p95_latency_ms']:.0f}ms 95th percentile")
```
When Speed Really Matters
Real-Time Applications
Sub-100ms total latency targets
```python
# Configuration for real-time apps (chat, live support, etc.)
real_time_config = {
    "model": "gpt-4o-mini",  # Faster model for real-time use
    "max_tokens": 150,       # Shorter responses = faster delivery
    "temperature": 0.7,      # Consistent but not robotic
    "stream": True,          # Start showing results immediately

    # 🚀 Performance optimizations
    "provider_preferences": {
        "primary": "openai",
        "fallback": "anthropic"
    },
    "timeout": 15,           # Fail fast for real-time apps
    "priority": "speed"      # Choose speed over cost
}
```
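With streaming enabled, you can start rendering tokens the moment they arrive instead of waiting for the full completion. The sketch below assumes an OpenAI-style server-sent-events stream (`data: {...}` chunks ending in `data: [DONE]`); check the streaming docs for the exact wire format.

```python
import json
import aiohttp

async def stream_reply(session: aiohttp.ClientSession, api_key: str, messages):
    """Print tokens as they arrive, reusing the real_time_config above."""
    async with session.post(
        "https://api.anyapi.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {api_key}"},
        json={**real_time_config, "messages": messages},
    ) as response:
        async for raw_line in response.content:
            line = raw_line.decode().strip()
            if not line.startswith("data: "):
                continue
            data = line[len("data: "):]
            if data == "[DONE]":
                break
            chunk = json.loads(data)
            delta = chunk["choices"][0].get("delta", {}).get("content", "")
            print(delta, end="", flush=True)

# Usage (inside an async context):
# async with aiohttp.ClientSession() as session:
#     await stream_reply(session, "your-anyapi-key", [{"role": "user", "content": "Hi!"}])
```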
Batch Processing
When throughput beats individual request speed
```python
async def batch_process_optimized(requests_batch, concurrency=10):
    """Process many requests with optimal throughput"""
    semaphore = asyncio.Semaphore(concurrency)

    async def process_single_request(request_data):
        async with semaphore:  # Limit concurrent requests
            # Uses the HighPerformanceAPIClient's fast_completion from earlier
            return await client.fast_completion(**request_data)

    # 🚀 Process all requests concurrently (up to limit)
    tasks = [process_single_request(req) for req in requests_batch]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return results
```
🔍 Common Issues & Quick Fixes
Issue: First request is slow in new regions
Fix: Warm up your caches with a dummy request when deploying (see the warm-up sketch in the Cold Start Delays section above)
Issue: Inconsistent latency throughout the day
Fix: Check your credit balance and set up auto-topup
Issue: High latency for specific models
Fix: Test different providers for that model family (a quick comparison sketch follows below)
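For that last issue, the quickest way to pick a winner is to time the same prompt against a few candidates. A minimal comparison sketch (model names are illustrative; swap in the ones you actually use):

```python
import time
import requests

def compare_model_latency(api_key, candidates=("gpt-4o", "gpt-4o-mini")):
    """Time an identical short prompt against several models and print the results."""
    prompt = [{"role": "user", "content": "Say hello in five words."}]
    for model in candidates:
        start = time.time()
        resp = requests.post(
            "https://api.anyapi.ai/api/v1/chat/completions",
            headers={"Authorization": f"Bearer {api_key}"},
            json={"model": model, "messages": prompt, "max_tokens": 20},
            timeout=30,
        )
        elapsed_ms = (time.time() - start) * 1000
        status = "✅" if resp.ok else f"❌ {resp.status_code}"
        print(f"{model}: {elapsed_ms:.0f}ms {status}")
```

And when latency still looks off, a structured walkthrough helps isolate where the time is going: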
```python
def debug_slow_request():
    """Debug performance issues step by step"""
    start_time = time.time()

    # 1. Check account status
    print("1️⃣ Checking account status...")
    account_check_start = time.time()
    # Account status check would go here
    print(f"   ✅ Account check: {(time.time() - account_check_start)*1000:.0f}ms")

    # 2. Test simple request
    print("2️⃣ Testing basic request...")
    request_start = time.time()
    # Basic API request would go here
    print(f"   ✅ API request: {(time.time() - request_start)*1000:.0f}ms")

    # 3. Check network connectivity
    print("3️⃣ Network diagnostics...")
    # Network checks would go here

    total_time = (time.time() - start_time) * 1000
    print(f"🏁 Total debug time: {total_time:.0f}ms")
```
The Bottom Line
AnyAPI is built for speed. With proper configuration and best practices, you’ll consistently see:
- ~40ms added latency for most requests
- Sub-100ms total response times for simple operations
- Predictable performance that scales with your application
Quick wins for better performance:
- 🌍 Choose providers strategically for your regions
- 💾 Cache responses when possible
- 📊 Monitor performance and optimize continuously
Ready to make your AI app lightning-fast? These optimizations will get you there. ⚡
Speed is a feature – make it yours! 🚀