Fine-tune your model’s behavior with these parameters, available in the Model Parameters panel in AnyChat or passed directly in your API request body.
Request Basics
messages
Type: array · Required: Yes (for chat models)
Your conversation history. The model reads all previous messages to understand context.
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there! How can I help you?"},
    {"role": "user", "content": "What's the weather like?"}
  ]
}
Message roles:
| Role | Purpose |
|---|---|
| system | Sets the model’s personality and behavior |
| user | The human’s input |
| assistant | The model’s previous response |
| tool | Results from function/tool calls |
prompt
Type: string · Required: Yes (for text completion models)
Raw text for the model to continue. Used with legacy completion endpoints.
{
  "prompt": "The future of artificial intelligence is"
}
model
Type: string · Required: No (defaults to your account setting)
Which model to use for the request.
{
  "model": "openai/gpt-4o"
}
Output Control
temperature
Default: 1 · Range: 0–2
Controls the randomness of responses. Lower values produce focused, deterministic output; higher values make responses more varied and creative.
- Use 0 for factual tasks, data extraction, structured output
- Use 0.7–1.0 for general-purpose tasks
- Use 1.2+ for creative writing, brainstorming
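To make the effect of temperature concrete, the sketch below applies temperature scaling to a handful of logits before a softmax. This is the standard textbook formulation and not necessarily what any given provider runs internally; the function name `softmax_with_temperature` and the example logits are illustrative assumptions. Temperature 0 is treated here as greedy selection, since dividing by zero is undefined.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities after temperature scaling."""
    if temperature == 0:
        # Assumption: treat 0 as greedy decoding (all mass on the top logit).
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
for t in (0, 0.7, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top-scoring token, while higher temperatures flatten the distribution so less likely tokens are sampled more often.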
top_p
Default: 1 · Range: 0–1
Nucleus sampling: restricts token selection to the smallest set of most-likely tokens whose cumulative probability reaches top_p. For example, 0.9 means only the tokens covering the top 90% of the probability mass are considered.
Avoid changing both temperature and top_p at the same time. Adjust one or the other.
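As a rough illustration of nucleus sampling, the sketch below filters a toy probability distribution down to the tokens that make up the top_p mass and renormalizes the result. The helper name `nucleus_filter` and the token probabilities are made up for the example; real implementations operate on the full vocabulary inside the model's sampler.

```python
def nucleus_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative probability
    reaches top_p, then renormalize the survivors."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}

# Hypothetical next-token distribution.
probs = {"the": 0.5, "a": 0.3, "an": 0.15, "zebra": 0.05}
print(nucleus_filter(probs, 0.9))
```

With top_p set to 0.9, the unlikely "zebra" token is dropped and the remaining three tokens share the full probability mass.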
top_k
Default: Model-specific · Range: [1, ∞)
Limits how many top tokens the model considers at each step. Supported by Anthropic, Google, and most open-source models.
OpenAI models do not support top_k.
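For comparison with probability-mass cutoffs, top_k keeps a fixed number of candidates regardless of how confident the model is. The sketch below is an illustrative toy, with the helper name `top_k_filter` and the probabilities assumed for the example.

```python
def top_k_filter(probs, k):
    """Keep only the k most likely tokens and renormalize."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in ranked)
    return {token: p / total for token, p in ranked}

# Hypothetical next-token distribution.
probs = {"the": 0.5, "a": 0.3, "an": 0.15, "zebra": 0.05}
print(top_k_filter(probs, 2))
```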
n
Default: 1
Number of response variants to generate per request. Useful for comparing alternatives.
Cost scales linearly with n. Setting n: 3 triples the token cost.
stream
Default: true
Enables streaming — tokens are returned as they are generated rather than waiting for the full response.
- true: lower perceived latency, better UX for chat interfaces
- false: full response returned as a single object
stream_options
Default: null
Additional streaming configuration. Primarily used to include token usage statistics in the final chunk.
{ "stream_options": { "include_usage": true } }
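On the client side, a streamed response has to be reassembled from its chunks. The sketch below assumes the chunks have already been parsed into dicts and follow the OpenAI-style shape (`choices[].delta.content`, plus a final chunk carrying `usage` when `include_usage` is set); whether AnyAPI emits exactly this shape is an assumption, and `collect_stream` is a hypothetical helper.

```python
def collect_stream(chunks):
    """Join streamed text deltas and pick up usage stats if present."""
    parts, usage = [], None
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                parts.append(text)
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(parts), usage

# Hypothetical chunks in the assumed OpenAI-style format.
chunks = [
    {"choices": [{"delta": {"role": "assistant", "content": ""}}]},
    {"choices": [{"delta": {"content": "Hel"}}]},
    {"choices": [{"delta": {"content": "lo!"}}]},
    {"choices": [], "usage": {"prompt_tokens": 9, "completion_tokens": 2, "total_tokens": 11}},
]
text, usage = collect_stream(chunks)
print(text, usage)
```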
Length Control
max_tokens
Default: null
Maximum number of tokens in the response. When null, uses the model’s default limit. Setting this explicitly helps control costs and prevent runaway outputs.
max_completion_tokens
Default: null
OpenAI o-series equivalent of max_tokens. Covers both reasoning tokens and visible output tokens. Use this instead of max_tokens for o1, o3, and similar reasoning models.
{ "max_completion_tokens": 4096 }
stop
Default: null
Stop sequences: a single string or an array of strings. The model halts generation as soon as it produces any of them.
{ "stop": ["###", "\n\n"] }
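The effect of stop sequences can be approximated client-side as "cut the text at the earliest match". The sketch below assumes the stop sequence itself is excluded from the returned text, which is how OpenAI documents it but may differ for other providers; `truncate_at_stop` is a hypothetical helper.

```python
def truncate_at_stop(text, stop):
    """Cut text at the earliest occurrence of any stop sequence."""
    sequences = [stop] if isinstance(stop, str) else list(stop or [])
    cut = len(text)
    for seq in sequences:
        if not seq:
            continue  # ignore empty strings, which would match everywhere
        idx = text.find(seq)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

print(repr(truncate_at_stop("Answer: 42\n\nNext question", ["###", "\n\n"])))
```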
Repetition Penalties
frequency_penalty
Default: 0 · Range: -2 to 2
Penalizes tokens proportionally to how often they’ve already appeared. Positive values reduce repetition of the same words and phrases. Useful for long-form content generation.
{ "frequency_penalty": 0.5 }
presence_penalty
Default: 0 · Range: -2 to 2
Penalizes tokens that have appeared at all, regardless of frequency. Encourages the model to introduce new topics. Useful for open-ended creative tasks.
{ "presence_penalty": 0.4 }
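The difference between the two penalties is easiest to see as arithmetic on logits. The sketch below follows the formula OpenAI publishes for its models (subtract `count * frequency_penalty`, plus a flat `presence_penalty` once a token has appeared at all); other providers may implement these differently, and the function name and toy logits are assumptions.

```python
from collections import Counter

def apply_penalties(logits, generated, frequency_penalty=0.0, presence_penalty=0.0):
    """Lower the logits of tokens that already appear in the generated text."""
    counts = Counter(generated)
    adjusted = {}
    for token, logit in logits.items():
        count = counts[token]
        adjusted[token] = (
            logit
            - count * frequency_penalty                        # grows with each repeat
            - (1.0 if count > 0 else 0.0) * presence_penalty   # flat, applied once
        )
    return adjusted

# Hypothetical logits and generation history.
logits = {"cat": 2.0, "dog": 2.0, "bird": 2.0}
generated = ["cat", "cat", "cat", "dog"]
print(apply_penalties(logits, generated, frequency_penalty=0.5, presence_penalty=0.4))
```

In this example "cat" (seen three times) is penalized far more than "dog" (seen once), while "bird" is untouched because it has not appeared yet.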
repetition_penalty
Default: 1.0 · Range: (0, 2]
Alternative repetition control used by many open-source models (Llama, Mistral, etc.). Values above 1.0 discourage repetition, below 1.0 encourage it.
This serves a similar purpose to frequency_penalty but is used by different model families. OpenAI models do not support this parameter.
{ "repetition_penalty": 1.1 }
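Unlike the additive penalties, repetition_penalty is usually multiplicative. The sketch below mirrors the scheme used by the Hugging Face Transformers logits processor (divide positive logits by the penalty, multiply negative ones); that individual open-source model hosts use exactly this rule is an assumption, and the helper name is hypothetical.

```python
def apply_repetition_penalty(logits, generated, penalty=1.0):
    """Scale down the logits of tokens that have already been generated."""
    adjusted = dict(logits)
    for token in set(generated):
        if token in adjusted:
            score = adjusted[token]
            # Dividing a positive score or multiplying a negative one
            # both make the token less likely when penalty > 1.
            adjusted[token] = score * penalty if score < 0 else score / penalty
    return adjusted

# Hypothetical logits; "a" and "b" have already been generated.
print(apply_repetition_penalty({"a": 2.2, "b": -1.0, "c": 1.0}, ["a", "b"], penalty=1.1))
```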
Reproducibility
seed
Default: null
When set, makes sampling reproducible on a best-effort basis: identical inputs with the same seed produce the same output, or one very close to it. Ideal for testing, A/B comparisons, and debugging prompts.
tools
Default: null
Array of tool definitions the model can call. Each tool specifies a name, description, and parameters schema.
{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": { "type": "string" }
          },
          "required": ["location"]
        }
      }
    }
  ]
}
tool_choice
Default: "auto"
Controls whether and how the model calls tools.
| Value | Behavior |
|---|---|
| "auto" | Model decides whether to call a tool |
| "required" | Model must call at least one tool |
| "none" | Model never calls tools |
| {"type": "function", "function": {"name": "..."}} | Forces a specific tool |
parallel_tool_calls
Default: true
Allows the model to call multiple tools simultaneously in a single response. Disable if your tools have sequential dependencies or shared state.
{ "parallel_tool_calls": false }
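When parallel tool calls are enabled, a single response can contain several calls that your code must execute and answer. The sketch below assumes the OpenAI-style shape for tool calls (`id`, `function.name`, and `function.arguments` as a JSON string) and for `tool` role replies (`tool_call_id`, `content`); the `TOOLS` registry, `run_tool_calls`, and the placeholder `get_weather` body are hypothetical.

```python
import json

def get_weather(location):
    # Placeholder body; a real tool would query a weather service.
    return {"location": location, "forecast": "unknown"}

# Hypothetical registry mapping tool names to local functions.
TOOLS = {"get_weather": get_weather}

def run_tool_calls(tool_calls):
    """Execute each requested tool in order and build `tool` role messages."""
    results = []
    for call in tool_calls:
        fn = TOOLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return results

# Two calls returned in one response (assumed format).
calls = [
    {"id": "call_1", "type": "function",
     "function": {"name": "get_weather", "arguments": "{\"location\": \"Paris\"}"}},
    {"id": "call_2", "type": "function",
     "function": {"name": "get_weather", "arguments": "{\"location\": \"Tokyo\"}"}},
]
messages = run_tool_calls(calls)
print(messages)
```

This version runs the calls one after another, which is the safe choice when tools share state; independent tools could be dispatched concurrently instead.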
function_call / functions
Deprecated
Legacy OpenAI function calling format. Use tools and tool_choice instead.
Token Probabilities
logprobs
Default: false
Returns the log-probabilities of output tokens. Useful for confidence analysis, classification tasks, and model calibration.
top_logprobs
Default: null · Range: 1–20
Number of alternative tokens (with their probabilities) to return at each position. Requires logprobs: true.
{ "logprobs": true, "top_logprobs": 5 }
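Log-probabilities are easier to interpret once converted back to ordinary probabilities. The sketch below assumes natural-log values, as OpenAI returns them; the `alternatives` list is made-up sample data rather than real model output.

```python
import math

def logprob_to_probability(logprob):
    """Convert a natural-log probability back to a probability in [0, 1]."""
    return math.exp(logprob)

# Hypothetical alternatives for a single output position.
alternatives = [
    {"token": "Yes", "logprob": -0.11},
    {"token": "No", "logprob": -2.41},
    {"token": "Maybe", "logprob": -4.61},
]
for alt in alternatives:
    print(alt["token"], f"{logprob_to_probability(alt['logprob']):.1%}")
```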
logit_bias
Default: null
Map of {token_id: bias} to increase or decrease the likelihood of specific tokens. Range: -100 (ban) to 100 (force). Rarely needed, but useful for restricting vocabulary or enforcing specific output formats.
{ "logit_bias": { "50256": -100 } }
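Conceptually, logit_bias adds a fixed offset to selected token logits before sampling. The sketch below shows that on a toy two-token vocabulary; the token IDs and logits are placeholders, and the exact point where a provider applies the bias is an assumption.

```python
def apply_logit_bias(logits, logit_bias):
    """Add each bias to the logit of the matching token ID."""
    adjusted = dict(logits)
    for token_id, bias in logit_bias.items():
        key = int(token_id)  # JSON object keys arrive as strings
        if key in adjusted:
            adjusted[key] += bias
    return adjusted

# Hypothetical logits keyed by token ID.
logits = {50256: 3.2, 1169: 1.5}
biased = apply_logit_bias(logits, {"50256": -100})
print(max(biased, key=biased.get))
```

A bias of -100 pushes the token so far down that it is effectively never selected, even though it had the highest raw logit.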
Multimodality
modalities
Default: null
Output types to request. Use ["text", "audio"] for models that support audio output (e.g., gpt-4o-audio).
{ "modalities": ["text", "audio"] }
audio
Default: null
Audio output configuration. Works in conjunction with modalities: ["text", "audio"].
{
  "audio": {
    "voice": "alloy",
    "format": "mp3"
  }
}
prediction
Default: null
Predicted outputs — provide the expected response in advance. The model confirms or corrects it, which can reduce latency and cost. Most effective for editing tasks where much of the text remains unchanged.
{
  "prediction": {
    "type": "content",
    "content": "The corrected version of the text..."
  }
}
Reasoning
reasoning_effort
Default: "medium"
Controls the depth of reasoning for models that support chain-of-thought (e.g., o1, o3, Claude with extended thinking). Higher effort means more thinking tokens, which generally improves quality at a higher cost.
| Value | Use when |
|---|---|
| "low" | Speed and cost are the priority |
| "medium" | Balanced performance (recommended default) |
| "high" | Maximum accuracy for complex tasks |
{ "reasoning_effort": "high" }
thinking
Default: null
Native Anthropic parameter for Claude models with extended thinking. Enables a reasoning phase with a configurable token budget.
{
  "thinking": {
    "type": "enabled",
    "budget_tokens": 10000
  }
}
Prompt Caching
prompt_cache_key
Default: null
Explicit key for managing the prompt cache. Requests sharing the same key and prefix reuse the cached computation, reducing cost on repeated calls with identical system prompts or context.
{ "prompt_cache_key": "my-system-prompt-v1" }
prompt_cache_retention
Default: null
TTL (in seconds) for how long the cached prompt prefix is retained. Useful for controlling cache lifecycle on high-frequency workloads.
{ "prompt_cache_retention": 3600 }
response_format
Default: null
Forces structured output. Use json_object for any valid JSON, or json_schema to enforce a specific schema.
{ "response_format": { "type": "json_object" } }
{
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "my_schema",
      "schema": {
        "type": "object",
        "properties": {
          "answer": { "type": "string" }
        },
        "required": ["answer"]
      }
    }
  }
}
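Even with structured output enabled, it is prudent to validate the response before using it. The sketch below parses a response string and checks it against the `answer` schema from the example above using only the standard library; `parse_answer` is a hypothetical helper, and a full JSON Schema validator would be the more general option.

```python
import json

def parse_answer(raw):
    """Parse a JSON response and return its required string `answer` field."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("answer"), str):
        raise ValueError("missing or non-string 'answer' field")
    return data["answer"]

print(parse_answer('{"answer": "Paris"}'))
```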
store
Default: null
Whether to save the request for later use (fine-tuning, evals) on the provider’s side. Supported by OpenAI.
metadata
Default: null
Arbitrary tags attached to the request. Useful for filtering in logs and analytics dashboards.
{
  "metadata": {
    "tags": ["chat_completions:reasoning"]
  }
}
Infrastructure & Reliability
service_tier
Default: null
Request processing priority at the provider level. OpenAI supports "flex" for cheaper async processing with relaxed latency requirements.
{ "service_tier": "flex" }
max_retries
Default: null
Number of automatic retries on failure. Useful when working with less stable providers or during high-traffic periods.
extra_headers
Default: null
Additional HTTP headers passed to the provider. Used for provider-specific features not covered by standard parameters.
{
  "extra_headers": {
    "X-Custom-Header": "value"
  }
}
safety_identifier
Default: null
Identifier for provider-side safety systems. Relevant for specific enterprise integrations that require request tagging for moderation pipelines.
web_search_options
Default: null
Configuration for built-in web search. Available on 800+ models across all major providers.
{ "web_search_options": { "search_context_size": "medium" } }
AnyAPI-Specific Parameters
transforms
Default: []
Apply smart transformations to your messages before sending them to models.
{ "transforms": ["middle-out"] }
Available transforms:
"middle-out" — rearranges messages for better context utilization, especially useful for long conversations
models
Default: null
Specify fallback models in order of preference. If the first model fails or is unavailable, the next one is tried.
{
  "models": [
    "openai/gpt-4o",
    "anthropic/claude-sonnet-4-6",
    "google/gemini-2.5-pro"
  ]
}
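The fallback behavior described above happens on the server, but the logic can be pictured as a simple loop. The sketch below is a conceptual model only: `complete_with_fallbacks` and `flaky_send` are hypothetical stand-ins, and no real request is made.

```python
def complete_with_fallbacks(models, send):
    """Try each model in order and return the first successful result."""
    errors = []
    for model in models:
        try:
            return model, send(model)
        except Exception as exc:  # any failure moves on to the next model
            errors.append((model, exc))
    raise RuntimeError(f"all models failed: {errors}")

def flaky_send(model):
    # Hypothetical stand-in for a real request; the first model always fails.
    if model == "openai/gpt-4o":
        raise TimeoutError("model unavailable")
    return f"response from {model}"

used, reply = complete_with_fallbacks(
    ["openai/gpt-4o", "anthropic/claude-sonnet-4-6", "google/gemini-2.5-pro"],
    flaky_send,
)
print(used, reply)
```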
provider
Default: null
Fine-tune provider selection and behavior.
{
  "provider": {
    "order": ["openai", "anthropic"],
    "allow_fallbacks": true,
    "data_collection": "deny"
  }
}
| Field | Description |
|---|---|
| order | Preferred provider order |
| allow_fallbacks | Enable automatic fallbacks |
| data_collection | Opt out of provider data collection ("deny") |
Model-Specific Behavior
Different model families support different parameter sets:
| Parameter | OpenAI | Anthropic | Google | Open Source |
|---|---|---|---|---|
| temperature | Yes | Yes | Yes | Yes |
| top_p | Yes | Yes | Yes | Yes |
| top_k | No | Yes | Yes | Most |
| frequency_penalty | Yes | No | Yes | Some |
| presence_penalty | Yes | No | Yes | Some |
| repetition_penalty | No | Yes | No | Most |
| response_format | Yes | Yes | Yes | Varies |
Models silently ignore parameters they don’t support — you can safely use the same parameter set across different models.
Parameter Recipes
Creative Writing
{
  "model": "openai/gpt-4o",
  "messages": [{"role": "user", "content": "Write a short story about a time-traveling chef"}],
  "temperature": 1.2,
  "max_tokens": 2000,
  "presence_penalty": 0.3
}
Code Generation
{
  "model": "anthropic/claude-sonnet-4-6",
  "messages": [{"role": "user", "content": "Write a Python function to sort a list"}],
  "temperature": 0.1,
  "max_tokens": 1000
}
Data Extraction
{
  "model": "openai/gpt-4o",
  "messages": [{"role": "user", "content": "Extract contact info from this text as JSON"}],
  "temperature": 0,
  "response_format": {"type": "json_object"},
  "max_tokens": 500
}
Conversational AI
{
  "model": "anthropic/claude-sonnet-4-6",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Help me debug this code"}
  ],
  "temperature": 0.7,
  "max_tokens": 1000,
  "stream": true
}
Parameter Validation
Unknown parameters
Models ignore parameters they don’t understand. You can use the same parameter set across different models without errors.
Out-of-range values
AnyAPI clamps obviously invalid values:
- temperature: 3.0 → temperature: 2.0
- top_p: 1.5 → top_p: 1.0
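The clamping described above can be mirrored client-side to see what will actually be sent. The sketch below uses the ranges documented on this page for temperature and top_p; `RANGES` and `clamp_params` are hypothetical names, and the server's exact clamping rules for other parameters are not covered here.

```python
# Documented ranges for the two parameters shown above.
RANGES = {"temperature": (0.0, 2.0), "top_p": (0.0, 1.0)}

def clamp_params(params):
    """Return a copy of params with known numeric values forced into range."""
    clamped = dict(params)
    for name, (low, high) in RANGES.items():
        if name in clamped:
            clamped[name] = min(max(clamped[name], low), high)
    return clamped

print(clamp_params({"temperature": 3.0, "top_p": 1.5}))
```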
Wrong types
Type mismatches cause errors:
- max_tokens: "100" (should be a number)
- stream: "true" (should be boolean true)
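Type mismatches like these are cheap to catch before the request is sent. The sketch below checks only the two parameters shown above; `EXPECTED_TYPES` and `find_type_errors` are hypothetical, and the error wording is not what the API itself returns.

```python
EXPECTED_TYPES = {"max_tokens": int, "stream": bool}

def find_type_errors(params):
    """List parameters whose values have the wrong Python type."""
    errors = []
    for name, expected in EXPECTED_TYPES.items():
        value = params.get(name)
        if value is None:
            continue  # absent or null: nothing to check
        if expected is int:
            # bool is a subclass of int in Python, so exclude it explicitly
            ok = isinstance(value, int) and not isinstance(value, bool)
        else:
            ok = isinstance(value, expected)
        if not ok:
            errors.append(f"{name}: expected {expected.__name__}, got {type(value).__name__}")
    return errors

print(find_type_errors({"max_tokens": "100", "stream": "true"}))
```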