Adjust your model’s behavior with these parameters, available in the Model Parameters panel in AnyChat or passed directly in your API request body.

Request Basics

messages

Type: array · Required: Yes (for chat models) Your conversation history. The model reads all previous messages to understand context.
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there! How can I help you?"},
    {"role": "user", "content": "What's the weather like?"}
  ]
}
Message roles:
| Role | Purpose |
| --- | --- |
| system | Sets the model’s personality and behavior |
| user | The human’s input |
| assistant | The model’s previous response |
| tool | Results from function/tool calls |
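
Because the model only sees what is in messages, a client has to resend the full history on every request. The following is a minimal Python sketch of that bookkeeping. The helper name add_turn is hypothetical and not part of any SDK, and tool messages usually carry extra fields linking them to a call, which are omitted here.

```python
VALID_ROLES = {"system", "user", "assistant", "tool"}

def add_turn(messages, role, content):
    """Return a new history with one message appended (the input list is not mutated)."""
    if role not in VALID_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    return messages + [{"role": role, "content": content}]

history = [{"role": "system", "content": "You are a helpful assistant."}]
history = add_turn(history, "user", "Hello!")
history = add_turn(history, "assistant", "Hi there! How can I help you?")
history = add_turn(history, "user", "What's the weather like?")
```

The resulting history list is what you would pass as the messages field of the next request.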

prompt

Type: string · Required: Yes (for text completion models) Raw text for the model to continue. Used with legacy completion endpoints.
{
  "prompt": "The future of artificial intelligence is"
}

model

Type: string · Required: No (defaults to your account setting) Which model to use for the request.
{
  "model": "openai/gpt-4o"
}

Output Control

temperature

Default: 1 · Range: 0–2 Controls the randomness of responses. Lower values produce more focused, more repeatable output; higher values make responses more varied and creative.
  • Use 0 for factual tasks, data extraction, structured output
  • Use 0.7–1.0 for general-purpose tasks
  • Use 1.2+ for creative writing, brainstorming
{ "temperature": 0.7 }
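
To see why lower values sharpen the output, it can help to look at the usual textbook formulation, where token scores are divided by the temperature before being turned into probabilities. This is an illustrative sketch, not a description of any provider's internals; the function name and the toy scores are made up for the example.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, scaled by temperature."""
    if temperature == 0:
        # Treat 0 as greedy decoding: all probability on the highest-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]
sharp = softmax_with_temperature(logits, 0.5)  # concentrates on the top token
flat = softmax_with_temperature(logits, 2.0)   # spreads probability more evenly
```

With these toy scores, the top token receives a larger share of the probability at 0.5 than at 2.0.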

top_p

Default: 1 · Range: 0–1 Nucleus sampling: restricts sampling to the smallest set of most-likely tokens whose cumulative probability reaches top_p. With 0.9, only the tokens that make up the top 90% of the probability mass are considered.
Avoid changing both temperature and top_p at the same time. Adjust one or the other.
{ "top_p": 0.9 }
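
The cutoff can be sketched in a few lines of Python. This is a simplified illustration over a toy distribution; the function name is hypothetical, and real implementations operate on the full vocabulary.

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of most-likely tokens whose total probability reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    # Renormalize so the surviving probabilities sum to 1.
    return {i: probs[i] / total for i in kept}

# The last token (5% probability) falls outside the 90% nucleus and is dropped.
nucleus = top_p_filter([0.5, 0.3, 0.15, 0.05], 0.9)
```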

top_k

Default: Model-specific · Range: [1, ∞) Limits how many top tokens the model considers at each step. Supported by Anthropic, Google, and most open-source models.
OpenAI models do not support top_k.
{ "top_k": 40 }
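
Unlike top_p, the cutoff here is a fixed count rather than a probability mass. A minimal sketch over a toy distribution, with a hypothetical function name:

```python
def top_k_filter(probs, k):
    """Keep only the k most likely tokens and renormalize their probabilities."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in ranked)
    return {i: probs[i] / total for i in ranked}

# Only the two most likely tokens survive, regardless of how much mass they hold.
shortlist = top_k_filter([0.5, 0.3, 0.15, 0.05], 2)
```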

n

Default: 1 Number of response variants to generate per request. Useful for comparing alternatives.
Cost scales with n: setting n: 3 generates three completions and roughly triples the output token cost.
{ "n": 3 }

stream

Default: true Enables streaming — tokens are returned as they are generated rather than waiting for the full response.
  • true — lower perceived latency, better UX for chat interfaces
  • false — full response returned as a single object
{ "stream": true }

stream_options

Default: null Additional streaming configuration. Primarily used to include token usage statistics in the final chunk.
{ "stream_options": { "include_usage": true } }
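
A streamed response has to be reassembled on the client. The sketch below assumes the OpenAI-compatible server-sent events layout: lines prefixed with data:, text in choices[].delta.content, a usage object on a final chunk when include_usage is set, and a closing [DONE] marker. That layout is an assumption here; check the actual wire format before relying on it.

```python
import json

def collect_stream(lines):
    """Join streamed text deltas and return (text, usage)."""
    parts, usage = [], None
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank separator lines and comments
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts), usage

# Hand-written sample lines standing in for a real response body.
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    'data: {"choices": [], "usage": {"total_tokens": 12}}',
    "data: [DONE]",
]
text, usage = collect_stream(sample)
```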

Length Control

max_tokens

Default: null Maximum number of tokens in the response. When null, uses the model’s default limit. Setting this explicitly helps control costs and prevent runaway outputs.
{ "max_tokens": 2048 }

max_completion_tokens

Default: null OpenAI o-series equivalent of max_tokens. Covers both reasoning tokens and visible output tokens. Use this instead of max_tokens for o1, o3, and similar reasoning models.
{ "max_completion_tokens": 4096 }

stop

Default: null Stop sequences. Accepts a single string or an array of strings; generation halts as soon as the model produces any of them.
{ "stop": ["###", "\n\n"] }
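
The effect on the returned text can be emulated locally. This sketch assumes the matched stop sequence itself is excluded from the output, which is how OpenAI's API behaves; other providers may differ. The function name is hypothetical.

```python
def apply_stop(text, stop):
    """Cut text at the earliest occurrence of any stop sequence."""
    sequences = [stop] if isinstance(stop, str) else list(stop or [])
    cut = len(text)
    for seq in sequences:
        index = text.find(seq) if seq else -1  # ignore empty sequences
        if index != -1:
            cut = min(cut, index)
    return text[:cut]

# The blank line comes before "###", so the text is cut there.
result = apply_stop("Answer: 42\n\nNext question ###", ["###", "\n\n"])
```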

Repetition Penalties

frequency_penalty

Default: 0 · Range: -2 to 2 Penalizes tokens proportionally to how often they’ve already appeared. Positive values reduce repetition of the same words and phrases. Useful for long-form content generation.
{ "frequency_penalty": 0.5 }

presence_penalty

Default: 0 · Range: -2 to 2 Penalizes tokens that have appeared at all, regardless of frequency. Encourages the model to introduce new topics. Useful for open-ended creative tasks.
{ "presence_penalty": 0.4 }
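
The difference between the two penalties is easier to see side by side. The sketch below follows the additive formulation OpenAI documents, where a token's score is reduced by count × frequency_penalty plus a flat presence_penalty once it has appeared at all. Other providers may implement it differently, and the token scores are toy values.

```python
from collections import Counter

def apply_penalties(logits, generated, frequency_penalty=0.0, presence_penalty=0.0):
    """Lower the scores of tokens that already appear in the generated text."""
    counts = Counter(generated)
    adjusted = dict(logits)
    for token, count in counts.items():
        if token in adjusted:
            # Frequency grows with each repeat; presence is a one-time flat cost.
            adjusted[token] -= count * frequency_penalty + presence_penalty
    return adjusted

logits = {"cat": 2.0, "dog": 2.0, "bird": 2.0}
adjusted = apply_penalties(
    logits, ["cat", "cat", "dog"], frequency_penalty=0.5, presence_penalty=0.4
)
# "cat" appeared twice, so it ends up lowest; "bird" never appeared and is unchanged.
```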

repetition_penalty

Default: 1.0 · Range: (0, 2] Alternative repetition control used by many open-source models (Llama, Mistral, etc.). Values above 1.0 discourage repetition, below 1.0 encourage it.
This serves a similar purpose to frequency_penalty but is used by different model families. OpenAI models do not support this parameter.
{ "repetition_penalty": 1.1 }
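
Where frequency_penalty subtracts from a score, repetition_penalty is typically multiplicative. The sketch below shows one common formulation used in open-source inference libraries; whether a given model or provider uses exactly this rule is an assumption.

```python
def apply_repetition_penalty(logits, generated, penalty):
    """Scale down the scores of tokens that have already been generated."""
    adjusted = dict(logits)
    for token in set(generated):
        if token in adjusted:
            score = adjusted[token]
            # Dividing a positive score or multiplying a negative one
            # both make the token less likely when penalty > 1.
            adjusted[token] = score / penalty if score > 0 else score * penalty
    return adjusted

adjusted = apply_repetition_penalty({"a": 2.0, "b": -1.0, "c": 1.0}, ["a", "b"], 2.0)
```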

Reproducibility

seed

Default: null When set, makes sampling reproducible on a best-effort basis: identical inputs with the same seed should produce the same output, or something very close to it. Ideal for testing, A/B comparisons, and debugging prompts.
{ "seed": 42 }
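
The idea behind a seed can be shown with a local toy sampler: the random choices are fixed by the seed, so the same seed yields the same sequence. This is only an analogy using Python's random module. A hosted model can still vary slightly between calls, which is why the guarantee above is best-effort.

```python
import random

def sample_tokens(probs, seed, steps=5):
    """Draw a short token sequence from a fixed distribution using a seeded generator."""
    rng = random.Random(seed)
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return [rng.choices(tokens, weights=weights)[0] for _ in range(steps)]

distribution = {"red": 0.5, "green": 0.3, "blue": 0.2}
first = sample_tokens(distribution, seed=42)
second = sample_tokens(distribution, seed=42)  # same seed, same sequence
```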

Tools & Function Calling

tools

Default: null Array of tool definitions the model can call. Each tool specifies a name, description, and parameters schema.
{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": { "type": "string" }
          },
          "required": ["location"]
        }
      }
    }
  ]
}
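
When the model decides to call a tool, your code runs the function and sends the result back as a tool message. The sketch below assumes the OpenAI-compatible shape for tool calls (an id, a function name, and arguments encoded as a JSON string); the get_weather body and the TOOL_REGISTRY name are hypothetical stand-ins.

```python
import json

def get_weather(location):
    # Hypothetical stand-in; a real implementation would query a weather service.
    return {"location": location, "forecast": "sunny"}

TOOL_REGISTRY = {"get_weather": get_weather}

def run_tool_call(tool_call):
    """Execute one tool call and build the tool message to send back."""
    function = tool_call["function"]
    handler = TOOL_REGISTRY[function["name"]]
    arguments = json.loads(function["arguments"])  # arguments arrive as a JSON string
    result = handler(**arguments)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }

call = {
    "id": "call_1",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"location": "Paris"}'},
}
reply = run_tool_call(call)
```

The reply would be appended to messages before the next request so the model can use the result.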

tool_choice

Default: "auto" Controls whether and how the model calls tools.
| Value | Behavior |
| --- | --- |
| "auto" | Model decides whether to call a tool |
| "required" | Model must call at least one tool |
| "none" | Model never calls tools |
| {"type": "function", "function": {"name": "..."}} | Forces a specific tool |

parallel_tool_calls

Default: true Allows the model to call multiple tools simultaneously in a single response. Disable if your tools have sequential dependencies or shared state.
{ "parallel_tool_calls": false }

function_call / functions

Deprecated. Legacy OpenAI function-calling format. Use tools and tool_choice instead.

Token Probabilities

logprobs

Default: false Returns the log-probabilities of output tokens. Useful for confidence analysis, classification tasks, and model calibration.
{ "logprobs": true }

top_logprobs

Default: null · Range: 1–20 Number of alternative tokens (with their probabilities) to return at each position. Requires logprobs: true.
{ "logprobs": true, "top_logprobs": 5 }
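
Log-probabilities are natural logarithms, so exponentiating them gives ordinary probabilities that are easier to read as confidence scores. The input below is a simplified, hand-built mapping of token to logprob, not the exact response structure.

```python
import math

def to_probabilities(token_logprobs):
    """Convert a {token: logprob} mapping into {token: probability}."""
    return {token: math.exp(lp) for token, lp in token_logprobs.items()}

# Toy values constructed so the probabilities are easy to check.
alternatives = {"yes": math.log(0.7), "no": math.log(0.25), "maybe": math.log(0.05)}
probs = to_probabilities(alternatives)
most_likely = max(probs, key=probs.get)
```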

logit_bias

Default: null Map of {token_id: bias} to increase or decrease the likelihood of specific tokens. Range: -100 (ban) to 100 (force). Rarely needed, but useful for restricting vocabulary or enforcing specific output formats.
{ "logit_bias": { "50256": -100 } }

Multimodality

modalities

Default: null Output types to request. Use ["text", "audio"] for models that support audio output (e.g., gpt-4o-audio).
{ "modalities": ["text", "audio"] }

audio

Default: null Audio output configuration. Works in conjunction with modalities: ["text", "audio"].
{
  "audio": {
    "voice": "alloy",
    "format": "mp3"
  }
}

prediction

Default: null Predicted outputs — provide the expected response in advance. The model confirms or corrects it, which can reduce latency and cost. Most effective for editing tasks where much of the text remains unchanged.
{
  "prediction": {
    "type": "content",
    "content": "The corrected version of the text..."
  }
}

Reasoning

reasoning_effort

Default: "medium" Controls the depth of reasoning for models that support chain-of-thought (e.g., o1, o3, Claude with extended thinking). Higher effort means more thinking tokens, which usually improves quality on hard tasks at higher cost and latency.
| Value | Use when |
| --- | --- |
| "low" | Speed and cost are the priority |
| "medium" | Balanced performance (recommended default) |
| "high" | Maximum accuracy for complex tasks |
{ "reasoning_effort": "high" }

thinking

Default: null Native Anthropic parameter for Claude models with extended thinking. Enables a reasoning phase with a configurable token budget.
{
  "thinking": {
    "type": "enabled",
    "budget_tokens": 10000
  }
}

Prompt Caching

prompt_cache_key

Default: null Explicit key for managing prompt cache. Requests sharing the same key and prefix reuse the cached computation, reducing cost on repeated calls with identical system prompts or context.
{ "prompt_cache_key": "my-system-prompt-v1" }

prompt_cache_retention

Default: null TTL (in seconds) for how long the cached prompt prefix is retained. Useful for controlling cache lifecycle on high-frequency workloads.
{ "prompt_cache_retention": 3600 }

Response Format

response_format

Default: null Forces structured output. Use json_object for any valid JSON, or json_schema to enforce a specific schema.
{ "response_format": { "type": "json_object" } }
{
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "my_schema",
      "schema": {
        "type": "object",
        "properties": {
          "answer": { "type": "string" }
        },
        "required": ["answer"]
      }
    }
  }
}
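
Even with a schema enforced, it is reasonable to validate the response on the client before using it. A minimal sketch for the schema above, using only the standard library; the function name is hypothetical, and a full JSON Schema validator would be a better fit for larger schemas.

```python
import json

def parse_answer(raw):
    """Parse a model response and check it against the example schema."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("answer"), str):
        raise ValueError("missing or non-string 'answer' field")
    return data["answer"]

answer = parse_answer('{"answer": "42"}')
```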

Storage & Metadata

store

Default: null Whether to save the request for later use (fine-tuning, evals) on the provider’s side. Supported by OpenAI.
{ "store": true }

metadata

Default: null Arbitrary tags attached to the request. Useful for filtering in logs and analytics dashboards.
{
  "metadata": {
    "tags": ["chat_completions:reasoning"]
  }
}

Infrastructure & Reliability

service_tier

Default: null Request processing priority at the provider level. OpenAI supports "flex" for cheaper async processing with relaxed latency requirements.
{ "service_tier": "flex" }

max_retries

Default: null Number of automatic retries on failure. Useful when working with less stable providers or during high-traffic periods.
{ "max_retries": 3 }

extra_headers

Default: null Additional HTTP headers passed to the provider. Used for provider-specific features not covered by standard parameters.
{
  "extra_headers": {
    "X-Custom-Header": "value"
  }
}

safety_identifier

Default: null Identifier for provider-side safety systems. Relevant for specific enterprise integrations that require request tagging for moderation pipelines.

web_search_options

Default: null Configuration for built-in web search. Available on 800+ models across all major providers.
{ "web_search_options": { "search_context_size": "medium" } }

AnyAPI-Specific Parameters

transforms

Default: [] Apply smart transformations to your messages before sending them to models.
{ "transforms": ["middle-out"] }
Available transforms:
  • "middle-out" — rearranges messages for better context utilization, especially useful for long conversations

models

Default: null Specify fallback models in order of preference. If the first model fails or is unavailable, the next one is tried.
{
  "models": [
    "openai/gpt-4o",
    "anthropic/claude-sonnet-4-6",
    "google/gemini-2.5-pro"
  ]
}
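
The fallback order is applied for you when you pass models, so no client code is required. Purely to illustrate the ordering logic, here is a client-side sketch of the same idea; send is a hypothetical callable that performs the request and raises on failure.

```python
def complete_with_fallback(models, send):
    """Try each model in order and return (model, response) from the first that succeeds."""
    errors = []
    for model in models:
        try:
            return model, send(model)
        except Exception as exc:  # broad on purpose: any failure moves to the next model
            errors.append((model, exc))
    raise RuntimeError(f"all models failed: {errors}")

def flaky_send(model):
    # Simulated transport: the first model is unavailable.
    if model == "openai/gpt-4o":
        raise TimeoutError("unavailable")
    return f"response from {model}"

used, text = complete_with_fallback(
    ["openai/gpt-4o", "anthropic/claude-sonnet-4-6", "google/gemini-2.5-pro"],
    flaky_send,
)
```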

provider

Default: null Controls provider selection and behavior.
{
  "provider": {
    "order": ["openai", "anthropic"],
    "allow_fallbacks": true,
    "data_collection": "deny"
  }
}
| Field | Description |
| --- | --- |
| order | Preferred provider order |
| allow_fallbacks | Enable automatic fallbacks |
| data_collection | Opt out of provider data collection ("deny") |

Model-Specific Behavior

Different model families support different parameter sets:
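| Parameter | OpenAI | Anthropic | Google | Open Source |
| --- | --- | --- | --- | --- |
| temperature | Yes | Yes | Yes | Yes |
| top_p | Yes | Yes | Yes | Yes |
| top_k | No | Yes | Yes | Most |
| frequency_penalty | Yes | No | Yes | Some |
| presence_penalty | Yes | No | Yes | Some |
| repetition_penalty | No | Yes | No | Most |
| response_format | Yes | Yes | Yes | Varies |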
Models silently ignore parameters they don’t support — you can safely use the same parameter set across different models.

Parameter Recipes

Creative Writing

{
  "model": "openai/gpt-4o",
  "messages": [{"role": "user", "content": "Write a short story about a time-traveling chef"}],
  "temperature": 1.2,
  "max_tokens": 2000,
  "presence_penalty": 0.3
}

Code Generation

{
  "model": "anthropic/claude-sonnet-4-6",
  "messages": [{"role": "user", "content": "Write a Python function to sort a list"}],
  "temperature": 0.1,
  "max_tokens": 1000
}

Data Extraction

{
  "model": "openai/gpt-4o",
  "messages": [{"role": "user", "content": "Extract contact info from this text as JSON"}],
  "temperature": 0,
  "response_format": {"type": "json_object"},
  "max_tokens": 500
}

Conversational AI

{
  "model": "anthropic/claude-sonnet-4-6",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Help me debug this code"}
  ],
  "temperature": 0.7,
  "max_tokens": 1000,
  "stream": true
}

Parameter Validation

Unknown parameters

Models ignore parameters they don’t understand. You can use the same parameter set across different models without errors.

Out-of-range values

AnyAPI clamps obviously invalid values:
  • temperature: 3.0 → temperature: 2.0
  • top_p: 1.5 → top_p: 1.0
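
If you would rather catch out-of-range values before sending, the same clamping can be done client-side. A minimal sketch using the ranges documented above for temperature and top_p; the RANGES table and function name are hypothetical.

```python
RANGES = {"temperature": (0.0, 2.0), "top_p": (0.0, 1.0)}

def clamp_params(params):
    """Return a copy of params with known numeric fields forced into range."""
    clamped = dict(params)
    for name, (low, high) in RANGES.items():
        if name in clamped:
            clamped[name] = min(max(clamped[name], low), high)
    return clamped

safe = clamp_params({"temperature": 3.0, "top_p": 1.5, "max_tokens": 500})
```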

Wrong types

Type mismatches cause errors:
  • max_tokens: "100" — should be a number
  • stream: "true" — should be boolean true
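
A small pre-flight check can catch these mistakes before the request is sent. The sketch below covers only three parameters; the EXPECTED_TYPES table and function name are hypothetical.

```python
EXPECTED_TYPES = {"max_tokens": int, "stream": bool, "temperature": (int, float)}

def find_type_errors(params):
    """Return the names of parameters whose values have the wrong type."""
    problems = []
    for name, expected in EXPECTED_TYPES.items():
        if name not in params:
            continue
        value = params[name]
        # bool is a subclass of int in Python, so reject it for numeric fields explicitly.
        is_bool_misuse = isinstance(value, bool) and expected is not bool
        if is_bool_misuse or not isinstance(value, expected):
            problems.append(name)
    return problems

bad = find_type_errors({"max_tokens": "100", "stream": "true"})
good = find_type_errors({"max_tokens": 100, "stream": True, "temperature": 0.7})
```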