LiteLLM Integration

LiteLLM provides a unified interface to call 100+ language models using the OpenAI format. AnyAPI integrates seamlessly with LiteLLM, allowing you to access all AnyAPI models through LiteLLM’s standardized interface.

Overview

LiteLLM simplifies working with multiple LLM providers by:
  • Unified API format - Use OpenAI’s format for all models
  • Automatic retries - Built-in retry logic and error handling
  • Cost tracking - Monitor usage and costs across providers
  • Fallback support - Automatically failover between models
  • Streaming support - Real-time response streaming

Installation

Install LiteLLM via pip:
pip install litellm
To run the LiteLLM proxy server (covered below), install the proxy extras:
pip install 'litellm[proxy]'
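To confirm the installation:
pip show litellm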

Quick Start

Basic Usage

from litellm import completion
import os

# Set your AnyAPI key
os.environ["ANYAPI_API_KEY"] = "your-anyapi-key"

# Call any AnyAPI model using LiteLLM
response = completion(
    model="anyapi/gpt-4o",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    api_base="https://api.anyapi.ai/v1"
)

print(response.choices[0].message.content)
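The response follows the OpenAI response shape regardless of the underlying model, so token usage is always available the same way:
usage = response.usage
print(f"Prompt tokens: {usage.prompt_tokens}")
print(f"Completion tokens: {usage.completion_tokens}")
print(f"Total tokens: {usage.total_tokens}")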

Environment Configuration

Set up environment variables for automatic configuration:
# .env file
ANYAPI_API_KEY=your-anyapi-key
ANYAPI_API_BASE=https://api.anyapi.ai/v1

Then load the variables in your application (the dotenv import requires the python-dotenv package):
from litellm import completion
import os
from dotenv import load_dotenv

load_dotenv()

response = completion(
    model="anyapi/gpt-4o",
    messages=[{"role": "user", "content": "What is machine learning?"}]
)

Model Configuration

Available Models

Access all AnyAPI models through LiteLLM by prefixing with anyapi/:
# OpenAI models
response = completion(
    model="anyapi/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)

# Anthropic models  
response = completion(
    model="anyapi/claude-3-5-sonnet",
    messages=[{"role": "user", "content": "Hello"}]
)

# Google models
response = completion(
    model="anyapi/gemini-pro",
    messages=[{"role": "user", "content": "Hello"}]
)

# Open source models
response = completion(
    model="anyapi/llama-2-70b",
    messages=[{"role": "user", "content": "Hello"}]
)
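
Because every model shares the same call signature, a simple loop can compare providers (using the model IDs listed above):
from litellm import completion

models = ["anyapi/gpt-4o", "anyapi/claude-3-5-sonnet", "anyapi/gemini-pro"]

for model in models:
    response = completion(
        model=model,
        messages=[{"role": "user", "content": "Summarize AI in one sentence."}]
    )
    print(f"{model}: {response.choices[0].message.content}")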

Custom Configuration

Configure model-specific parameters:
from litellm import completion

# Configure specific model parameters
response = completion(
    model="anyapi/gpt-4o",
    messages=[{"role": "user", "content": "Write a story"}],
    temperature=0.8,
    max_tokens=1000,
    top_p=0.9,
    frequency_penalty=0.1,
    presence_penalty=0.1
)

Advanced Features

Streaming Responses

Stream responses in real-time:
from litellm import completion

def stream_response():
    response = completion(
        model="anyapi/gpt-4o",
        messages=[{"role": "user", "content": "Write a long story about AI"}],
        stream=True
    )
    
    for chunk in response:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)

stream_response()
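
To keep the full text after streaming finishes, accumulate the chunks as they arrive; a minimal sketch:
from litellm import completion

def stream_and_collect():
    response = completion(
        model="anyapi/gpt-4o",
        messages=[{"role": "user", "content": "Write a short poem"}],
        stream=True
    )

    parts = []
    for chunk in response:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
            parts.append(delta)

    return "".join(parts)  # Full response text

full_text = stream_and_collect()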

Async Support

Use LiteLLM with async/await:
import asyncio
from litellm import acompletion

async def async_completion():
    response = await acompletion(
        model="anyapi/gpt-4o",
        messages=[{"role": "user", "content": "What is quantum computing?"}]
    )
    return response.choices[0].message.content

# Run async function
result = asyncio.run(async_completion())
print(result)

Batch Processing

Process multiple requests efficiently:
import asyncio
from litellm import acompletion

async def batch_process(questions):
    tasks = []
    
    for question in questions:
        task = acompletion(
            model="anyapi/gpt-4o",
            messages=[{"role": "user", "content": question}]
        )
        tasks.append(task)
    
    responses = await asyncio.gather(*tasks)
    return [r.choices[0].message.content for r in responses]

# Example usage
questions = [
    "What is machine learning?",
    "Explain quantum computing",
    "What is blockchain technology?"
]

answers = asyncio.run(batch_process(questions))
for q, a in zip(questions, answers):
    print(f"Q: {q}")
    print(f"A: {a}\n")

Error Handling and Retries

Automatic Retries

LiteLLM includes built-in retry logic:
from litellm import completion

# Automatic retries on failure
response = completion(
    model="anyapi/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    num_retries=3,  # Retry up to 3 times
    timeout=30      # 30 second timeout
)

Custom Error Handling

Implement custom error handling:
from litellm import completion
from litellm.exceptions import APIError, RateLimitError, Timeout
import time

def robust_completion(model, messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = completion(
                model=model,
                messages=messages,
                timeout=30
            )
            return response
            
        except RateLimitError:
            print(f"Rate limit hit, waiting {2**attempt} seconds...")
            time.sleep(2**attempt)
            
        except Timeout:
            print(f"Request timeout on attempt {attempt + 1}")
            
        except APIError as e:
            print(f"API error: {e}")
            if attempt == max_retries - 1:
                raise
                
        except Exception as e:
            print(f"Unexpected error: {e}")
            if attempt == max_retries - 1:
                raise
    
    raise Exception("Max retries exceeded")

# Usage
response = robust_completion(
    "anyapi/gpt-4o",
    [{"role": "user", "content": "Hello"}]
)

Fallback Configuration

Model Fallbacks

Configure automatic fallbacks between models:
from litellm import completion

def completion_with_fallback(messages, models=None):
    if models is None:
        models = [
            "anyapi/gpt-4o",
            "anyapi/claude-3-5-sonnet", 
            "anyapi/gemini-pro"
        ]
    
    for model in models:
        try:
            response = completion(
                model=model,
                messages=messages,
                timeout=30
            )
            print(f"✅ Success with {model}")
            return response
            
        except Exception as e:
            print(f"❌ Failed with {model}: {e}")
            continue
    
    raise Exception("All models failed")

# Usage
response = completion_with_fallback([
    {"role": "user", "content": "Explain artificial intelligence"}
])
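
Recent LiteLLM versions also accept a fallbacks parameter directly on completion, which retries the listed models in order; a minimal sketch (verify support in your installed version):
from litellm import completion

response = completion(
    model="anyapi/gpt-4o",
    messages=[{"role": "user", "content": "Explain artificial intelligence"}],
    fallbacks=["anyapi/claude-3-5-sonnet", "anyapi/gemini-pro"]  # Tried in order on failure
)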

Load Balancing

Distribute requests across multiple models:
import random
from litellm import completion

class LoadBalancer:
    def __init__(self, models):
        self.models = models
        self.model_weights = {model: 1.0 for model in models}
    
    def get_model(self):
        # Weighted random selection
        models = list(self.model_weights.keys())
        weights = list(self.model_weights.values())
        return random.choices(models, weights=weights)[0]
    
    def update_weight(self, model, success):
        # Increase weight on success, decrease on failure
        if success:
            self.model_weights[model] = min(2.0, self.model_weights[model] * 1.1)
        else:
            self.model_weights[model] = max(0.1, self.model_weights[model] * 0.9)
    
    def completion(self, messages, **kwargs):
        model = self.get_model()
        
        try:
            response = completion(
                model=model,
                messages=messages,
                **kwargs
            )
            self.update_weight(model, True)
            return response
            
        except Exception as e:
            self.update_weight(model, False)
            raise  # Re-raise preserving the original traceback

# Usage
balancer = LoadBalancer([
    "anyapi/gpt-4o",
    "anyapi/claude-3-5-sonnet",
    "anyapi/gemini-pro"
])

response = balancer.completion([
    {"role": "user", "content": "What is the future of AI?"}
])
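
For production use, LiteLLM's built-in Router class implements load balancing, cooldowns, and retries; a minimal sketch where two deployments share one alias:
from litellm import Router

router = Router(model_list=[
    {
        "model_name": "main",  # Shared alias; the Router balances across entries
        "litellm_params": {"model": "anyapi/gpt-4o", "api_base": "https://api.anyapi.ai/v1"},
    },
    {
        "model_name": "main",
        "litellm_params": {"model": "anyapi/claude-3-5-sonnet", "api_base": "https://api.anyapi.ai/v1"},
    },
])

response = router.completion(
    model="main",
    messages=[{"role": "user", "content": "What is the future of AI?"}]
)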

Cost Tracking

Usage Monitoring

Track costs and usage across models:
from litellm import completion, cost_per_token
import json
import time

class CostTracker:
    def __init__(self):
        self.usage_log = []
    
    def tracked_completion(self, model, messages, **kwargs):
        response = completion(
            model=model,
            messages=messages,
            **kwargs
        )
        
        # Calculate cost: cost_per_token returns (prompt_cost, completion_cost) in USD
        usage = response.usage
        prompt_cost, completion_cost = cost_per_token(
            model=model,
            prompt_tokens=usage.prompt_tokens,
            completion_tokens=usage.completion_tokens
        )
        cost = prompt_cost + completion_cost
        
        # Log usage
        self.usage_log.append({
            "model": model,
            "prompt_tokens": usage.prompt_tokens,
            "completion_tokens": usage.completion_tokens,
            "total_tokens": usage.total_tokens,
            "cost": cost,
            "timestamp": time.time()
        })
        
        return response
    
    def get_total_cost(self):
        return sum(entry["cost"] for entry in self.usage_log)
    
    def get_usage_summary(self):
        summary = {}
        for entry in self.usage_log:
            model = entry["model"]
            if model not in summary:
                summary[model] = {
                    "requests": 0,
                    "tokens": 0,
                    "cost": 0
                }
            
            summary[model]["requests"] += 1
            summary[model]["tokens"] += entry["total_tokens"]
            summary[model]["cost"] += entry["cost"]
        
        return summary

# Usage
tracker = CostTracker()

response = tracker.tracked_completion(
    "anyapi/gpt-4o",
    [{"role": "user", "content": "Explain machine learning"}]
)

print(f"Total cost: ${tracker.get_total_cost():.4f}")
print("Usage summary:", json.dumps(tracker.get_usage_summary(), indent=2))

LiteLLM Proxy Server

Setup Proxy Server

Run LiteLLM as a proxy server for team usage:
# config.yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: anyapi/gpt-4o
      api_base: https://api.anyapi.ai/v1
      api_key: os.environ/ANYAPI_API_KEY
      
  - model_name: claude-3-5-sonnet
    litellm_params:
      model: anyapi/claude-3-5-sonnet
      api_base: https://api.anyapi.ai/v1
      api_key: os.environ/ANYAPI_API_KEY

litellm_settings:
  success_callback: ["langfuse"]
  failure_callback: ["langfuse"]
  set_verbose: True
Start the proxy server:
export ANYAPI_API_KEY="your-api-key"
litellm --config config.yaml --port 8000
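
Once the server is up, smoke-test it with curl:
curl http://localhost:8000/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "ping"}]}'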

Using the Proxy

Connect to the proxy server:
from openai import OpenAI

# Connect to LiteLLM proxy
client = OpenAI(
    api_key="anything",  # Proxy handles auth
    base_url="http://localhost:8000"
)

response = client.chat.completions.create(
    model="gpt-4o",  # Model name from config
    messages=[{"role": "user", "content": "Hello proxy!"}]
)

print(response.choices[0].message.content)

Function Calling

Using Functions with AnyAPI Models

from litellm import completion

# Define functions
functions = [
    {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name"
                }
            },
            "required": ["location"]
        }
    }
]

# Call with function
response = completion(
    model="anyapi/gpt-4o",
    messages=[
        {"role": "user", "content": "What's the weather like in New York?"}
    ],
    functions=functions,
    function_call="auto"
)

# Check if function was called
if response.choices[0].message.function_call:
    function_call = response.choices[0].message.function_call
    print(f"Function called: {function_call.name}")
    print(f"Arguments: {function_call.arguments}")

Best Practices

Configuration Management

Use configuration files for complex setups:
# config.py
ANYAPI_CONFIG = {
    "api_key": "your-api-key",
    "api_base": "https://api.anyapi.ai/v1",
    "default_model": "anyapi/gpt-4o",
    "timeout": 30,
    "max_retries": 3,
    "temperature": 0.7
}

# main.py
from litellm import completion
from config import ANYAPI_CONFIG

def smart_completion(messages, **kwargs):
    # Merge config with per-call overrides (kwargs win)
    params = {**ANYAPI_CONFIG, **kwargs}
    # An explicit model kwarg overrides the configured default
    model = params.pop("model", params.pop("default_model"))

    return completion(
        model=model,
        messages=messages,
        **params
    )

Performance Optimization

Optimize for speed and efficiency:
import asyncio
from litellm import acompletion

class OptimizedClient:
    def __init__(self, max_concurrent=10):
        # Cap the number of in-flight requests with a semaphore
        self.semaphore = asyncio.Semaphore(max_concurrent)
    
    async def optimized_completion(self, model, messages, **kwargs):
        async with self.semaphore:
            return await acompletion(
                model=model,
                messages=messages,
                **kwargs
            )
    
    async def batch_with_limit(self, requests):
        tasks = [
            self.optimized_completion(**req) 
            for req in requests
        ]
        return await asyncio.gather(*tasks)

# Usage
client = OptimizedClient(max_concurrent=5)

requests = [
    {
        "model": "anyapi/gpt-4o",
        "messages": [{"role": "user", "content": f"Question {i}"}]
    }
    for i in range(20)
]

responses = asyncio.run(client.batch_with_limit(requests))

Troubleshooting

Common Issues

Authentication Errors

Error: Invalid API key
Solution: Verify your AnyAPI key is set correctly:
echo $ANYAPI_API_KEY
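If the variable is empty, set it before running your script:
export ANYAPI_API_KEY="your-anyapi-key"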

Model Not Found

Error: Model 'gpt-4o' not found
Solution: Use the full model path:
# Correct
model="anyapi/gpt-4o"

# Incorrect  
model="gpt-4o"

Connection Issues

Error: Connection timeout
Solution: Check your network connection and increase timeout:
response = completion(
    model="anyapi/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    timeout=60  # Increase timeout
)

Debug Mode

Enable debug logging:
import litellm
import logging
from litellm import completion

# Enable debug mode
litellm.set_verbose = True
logging.basicConfig(level=logging.DEBUG)

response = completion(
    model="anyapi/gpt-4o",
    messages=[{"role": "user", "content": "Debug test"}]
)

Health Check

Test your configuration:
from litellm import completion

def test_anyapi_connection():
    try:
        response = completion(
            model="anyapi/gpt-4o-mini",
            messages=[{"role": "user", "content": "ping"}],
            max_tokens=1
        )
        print("✅ Connection successful")
        return True
    except Exception as e:
        print(f"❌ Connection failed: {e}")
        return False

test_anyapi_connection()

Next Steps

For more information about LiteLLM, visit the official documentation at https://docs.litellm.ai/.