Think of parameters as the control panel for your AI requests. Want creative writing? Crank up the temperature. Need precise code? Turn it way down. Here’s every parameter you can tweak to get exactly the output you’re looking for.

The Must-Haves

messages (For Chat Models)

Type: array
Required: Yes (unless you’re using text completion)
This is your conversation history. The AI reads all previous messages to understand context.
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there! How can I help you?"},
    {"role": "user", "content": "What's the weather like?"}
  ]
}

Message Roles Explained

  • system: Sets the AI’s personality and behavior (like “You’re a pirate” or “Be concise”)
  • user: What the human said
  • assistant: What the AI responded
  • tool: Results from function calls (for advanced use cases)
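
If you want to see these pieces in working code, here's a minimal Python sketch using the requests library. It assumes an OpenAI-compatible /chat/completions endpoint; the base URL and the API_KEY environment variable are placeholders, so substitute your real endpoint and key. Later examples reuse the send_chat() helper defined here.
import os
import requests

API_URL = "https://api.example.com/v1/chat/completions"  # placeholder endpoint

def send_chat(body):
    # POST the request body and return the parsed JSON response
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {os.environ['API_KEY']}"},
        json=body,
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

reply = send_chat({
    "model": "openai/gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
})
print(reply["choices"][0]["message"]["content"])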

prompt (For Text Completion)

Type: string
Required: Yes (for old-school completion models)
Just raw text for the model to continue.
{
  "prompt": "The future of artificial intelligence is"
}

model

Type: string
Required: No (we'll fall back to your account's default model)
Which AI brain you want to use.
{
  "model": "openai/gpt-4o"
}

The Core Controls

max_tokens

Type: integer
Range: [1, context_length)
Default: Depends on the model
The maximum number of tokens (a token is roughly three-quarters of a word) the AI can generate. Think of it as setting a length limit on the response.
{
  "max_tokens": 1000
}
Pro tip: Size this to your use case. Set it too low and responses get cut off mid-sentence; set it so high that prompt plus output exceeds the model's context length and the request may fail or come back truncated.

temperature

Type: number
Range: [0, 2]
Default: 1.0
The creativity dial. This is probably the most important parameter you’ll use.
{
  "temperature": 0.7
}
  • 0.0: Robot mode - very predictable and focused
  • 1.0: Balanced - creative but still coherent
  • 2.0: Chaos mode - wild and unpredictable
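
One low-effort way to get a feel for the dial is to run the same prompt at a few settings and compare the results. Here's a rough sketch that reuses the hypothetical send_chat() helper from the messages section:
for temp in (0.0, 0.7, 1.5):
    reply = send_chat({
        "model": "openai/gpt-4o",
        "messages": [{"role": "user", "content": "Describe the ocean in one sentence."}],
        "temperature": temp,
        "max_tokens": 60,
    })
    # print each run so the settings are easy to compare side by side
    print(f"temperature={temp}: {reply['choices'][0]['message']['content']}")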

stream

Type: boolean
Default: false
Get results as they’re generated instead of waiting for the full response.
{
  "stream": true
}
Perfect for chat interfaces where you want that “typing” effect.
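
Consuming a stream looks roughly like the sketch below, assuming the API sends OpenAI-style server-sent events (lines prefixed with "data: " and terminated by "data: [DONE]"). The endpoint and key handling are placeholders, as before.
import json
import os
import requests

with requests.post(
    "https://api.example.com/v1/chat/completions",  # placeholder endpoint
    headers={"Authorization": f"Bearer {os.environ['API_KEY']}"},
    json={
        "model": "openai/gpt-4o",
        "messages": [{"role": "user", "content": "Tell me a short story"}],
        "stream": True,
    },
    stream=True,
    timeout=60,
) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        # skip keep-alives and anything that isn't a data line
        if not line or not line.startswith(b"data: "):
            continue
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":
            break
        chunk = json.loads(payload)
        # each chunk carries a small delta of the assistant's message
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            print(delta, end="", flush=True)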

Advanced Creativity Controls

top_p

Type: number
Range: (0, 1]
Default: 1.0
Alternative to temperature. The model samples only from the most likely tokens whose combined probability adds up to top_p (nucleus sampling), so 0.9 means the long tail of unlikely tokens is cut off.
{
  "top_p": 0.9
}
Note: Pick either temperature OR top_p, not both. They don’t play well together.

top_k

Type: integer
Range: [1, ∞)
Default: Model-specific
Limits how many word choices the model considers at each step.
{
  "top_k": 40
}
Heads up: OpenAI models don’t support this one.

frequency_penalty

Type: number
Range: [-2, 2]
Default: 0
Reduces word repetition. Positive values penalize tokens in proportion to how often they've already appeared, so the model avoids reusing the same words; negative values encourage repetition.
{
  "frequency_penalty": 0.5
}

presence_penalty

Type: number
Range: [-2, 2]
Default: 0
Encourages talking about new topics. Positive values penalize any token that has already appeared at least once (regardless of how often), pushing the model toward new subjects.
{
  "presence_penalty": 0.6
}

repetition_penalty

Type: number
Range: (0, 2]
Default: 1.0
Another way to fight repetition. Values > 1.0 discourage repeating, < 1.0 encourage it.
{
  "repetition_penalty": 1.1
}
Note: This is basically frequency_penalty’s cousin, used by different model families.

Output Control

stop

Type: string | array
Default: null
Tell the model when to shut up. Generation stops when it hits these sequences.
{
  "stop": ["\n", "END", "###"]
}
Super useful for controlling output format.

seed

Type: integer
Default: Random
Makes sampling repeatable. Same seed + same parameters = similar outputs.
{
  "seed": 42
}
Reality check: “Similar” not “identical” - there are no guarantees across model versions.
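
A quick way to sanity-check reproducibility is to send the exact same request twice and compare, again using the hypothetical send_chat() helper from earlier:
body = {
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Name three colors."}],
    "temperature": 1.0,
    "seed": 42,
}
first = send_chat(body)["choices"][0]["message"]["content"]
second = send_chat(body)["choices"][0]["message"]["content"]
# expect these to usually match, but don't build logic that depends on it
print("identical" if first == second else "similar but not identical")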

Structured Output

response_format

Type: object
Default: null
Force the model to output in specific formats.
{
  "response_format": {
    "type": "json_object"
  }
}
Available formats:
  • {"type": "json_object"}: Forces valid JSON
  • {"type": "text"}: Regular text (default)
Pro tip: Include format instructions in your prompt too. The model works better with examples.
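
Putting that tip into practice, the sketch below combines json_object mode with an example of the shape you want right in the prompt, then parses the reply (send_chat() is the hypothetical helper from earlier):
import json

reply = send_chat({
    "model": "openai/gpt-4o",
    "messages": [{
        "role": "user",
        "content": 'Extract the contact as JSON like {"name": "...", "email": "..."}: '
                   "Reach out to Jane Doe at jane@example.com.",
    }],
    "response_format": {"type": "json_object"},
    "temperature": 0,
})
# json_object mode guarantees parseable JSON, not any particular schema
contact = json.loads(reply["choices"][0]["message"]["content"])
print(contact["name"], contact["email"])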

Function/Tool Calling

tools

Type: array
Default: null
Give the model superpowers by defining functions it can call.
{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "City name"
            }
          },
          "required": ["location"]
        }
      }
    }
  ]
}

tool_choice

Type: string | object
Default: "auto"
Control how the model uses tools.
{
  "tool_choice": "auto"
}
Options:
  • "auto": Model decides when to use tools
  • "none": No tools allowed
  • {"type": "function", "function": {"name": "function_name"}}: Force a specific tool
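
Here's a rough sketch of the full round trip, assuming the response carries OpenAI-style tool_calls: the model asks for get_weather, you run your real function, and you send the result back with role "tool". send_chat() is the hypothetical helper from earlier, and get_weather() stands in for your own implementation.
import json

def get_weather(location):
    # stand-in for your real weather lookup
    return {"location": location, "forecast": "sunny", "temp_c": 21}

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string", "description": "City name"}},
            "required": ["location"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Paris?"}]
reply = send_chat({"model": "openai/gpt-4o", "messages": messages, "tools": tools})
message = reply["choices"][0]["message"]
messages.append(message)  # keep the assistant's tool request in the history

for call in message.get("tool_calls", []):
    args = json.loads(call["function"]["arguments"])
    result = get_weather(**args)
    messages.append({
        "role": "tool",                 # tool results go back under this role
        "tool_call_id": call["id"],
        "content": json.dumps(result),
    })

final = send_chat({"model": "openai/gpt-4o", "messages": messages, "tools": tools})
print(final["choices"][0]["message"]["content"])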

AnyAPI Superpowers

transforms

Type: array
Default: []
Apply smart transformations to your messages before sending them to models.
{
  "transforms": ["middle-out"]
}
Available transforms:
  • "middle-out": Rearranges messages for better context utilization (especially useful for long conversations)

models

Type: array
Default: null
Specify fallback models in order of preference.
{
  "models": [
    "openai/gpt-4o",              // Try this first
    "anthropic/claude-3-sonnet",  // Fall back to this
    "google/gemini-pro"           // Last resort
  ]
}
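
When fallbacks kick in, it helps to know which model actually answered. OpenAI-compatible responses normally echo that in a top-level model field, so a sketch like this (using the hypothetical send_chat() helper from earlier) can log it:
reply = send_chat({
    "models": [
        "openai/gpt-4o",
        "anthropic/claude-3-sonnet",
        "google/gemini-pro",
    ],
    "messages": [{"role": "user", "content": "Hello!"}],
})
# the response typically reports the model that actually served the request
print("Served by:", reply.get("model"))
print(reply["choices"][0]["message"]["content"])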

provider

Type: object
Default: null
Fine-tune provider selection and behavior.
{
  "provider": {
    "order": ["openai", "anthropic"],  // Preferred provider order
    "allow_fallbacks": true,           // Enable automatic fallbacks
    "data_collection": "deny"          // Opt out of provider data collection
  }
}

Model-Specific Quirks

Different AI families have different personalities:

OpenAI Models (GPT-4, GPT-3.5)

  • Love frequency_penalty and presence_penalty
  • Don’t understand top_k or repetition_penalty
  • Great at response_format for structured output

Anthropic Models (Claude)

  • Support top_k parameter
  • Prefer repetition_penalty over frequency penalties
  • Handle system messages a bit differently

Google Models (Gemini)

  • Support top_k parameter
  • Have different token limits
  • Can handle images and other media

Open Source Models (Llama, Mistral, etc.)

  • Usually support repetition_penalty
  • May have unique sampling parameters
  • Context lengths vary wildly

Real-World Parameter Recipes

Creative Writing

{
  "model": "openai/gpt-4o",
  "messages": [{"role": "user", "content": "Write a short story about a time-traveling chef"}],
  "temperature": 1.2,        // High creativity
  "max_tokens": 2000,        // Room for a full story
  "presence_penalty": 0.3    // Encourage topic variety
}

Code Generation

{
  "model": "openai/gpt-4o",
  "messages": [{"role": "user", "content": "Write a Python function to sort a list"}],
  "temperature": 0.1,        // Low creativity, high precision
  "max_tokens": 1000,
  "stop": ["```", "\n\n\n"]  // Stop at code block end or multiple newlines
}

Data Extraction

{
  "model": "openai/gpt-4o",
  "messages": [{"role": "user", "content": "Extract contact info from this text as JSON"}],
  "temperature": 0,          // Maximum precision
  "response_format": {"type": "json_object"},
  "max_tokens": 500
}

Conversational AI

{
  "model": "anthropic/claude-3-sonnet",
  "messages": [
    {"role": "system", "content": "You are a helpful but slightly sarcastic assistant"},
    {"role": "user", "content": "Help me debug this code"}
  ],
  "temperature": 0.7,        // Balanced personality
  "max_tokens": 1000,
  "stream": true             // Real-time responses
}

Parameter Validation (What Happens When You Mess Up)

Unknown Parameters

Models ignore parameters they don’t understand. This means you can use the same parameter set across different models without breaking anything.

Out-of-Range Values

We’ll fix obviously broken values:
  • temperature: 3.0 becomes temperature: 2.0
  • top_p: 1.5 becomes top_p: 1.0

Wrong Types

These will cause errors, so double-check:
  • max_tokens: "100" ❌ (should be a number)
  • stream: "true" ❌ (should be boolean true)

Pro Tips for Parameter Mastery

  1. Start simple: Begin with just temperature and max_tokens, add complexity later
  2. Temperature is king: This one parameter controls most of what you care about
  3. Test with your content: Parameters behave differently with different prompts
  4. Monitor token usage: Higher max_tokens = higher costs
  5. Use streaming wisely: Great for UX, but adds complexity to your code
  6. Don’t overthink penalties: Start with 0.0-0.5 range for frequency/presence penalties
  7. Seed for testing: Use consistent seeds when testing parameter changes
  8. Model-specific tuning: What works for GPT-4 might not work for Claude

Remember: Parameters are tools, not magic. The best parameter settings depend on your specific use case, content, and model. Start with sensible defaults and iterate based on your results.