POST /audio/transcriptions
Example request:

```shell
curl --request POST \
  --url https://api.anyapi.ai/v1/audio/transcriptions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: multipart/form-data' \
  --form file=@audio.mp3 \
  --form model=gpt-4o-transcribe
```
Example response:

```json
{
  "text": "<string>",
  "language": "<string>",
  "duration": 123,
  "words": [
    {
      "word": "<string>",
      "start": 123,
      "end": 123
    }
  ],
  "segments": [
    {
      "id": 123,
      "seek": 123,
      "start": 123,
      "end": 123,
      "text": "<string>",
      "tokens": [
        123
      ],
      "temperature": 123,
      "avg_logprob": 123,
      "compression_ratio": 123,
      "no_speech_prob": 123
    }
  ]
}
```
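For clients that do not shell out to curl, the same multipart request can be assembled with Python's standard library. This is a minimal sketch, not an official SDK: the helper name `build_transcription_request` is hypothetical, it only builds the request object (it does not send it), and it assumes every extra form field is a scalar value.

```python
import io
import uuid
import urllib.request

# Endpoint taken from the curl example.
API_URL = "https://api.anyapi.ai/v1/audio/transcriptions"


def build_transcription_request(api_key, audio_bytes, filename, model, **fields):
    """Build (but do not send) a multipart/form-data POST like the curl example.

    Hypothetical helper: only scalar form fields (str, int, float, bool) are
    supported; array parameters are not handled here.
    """
    boundary = uuid.uuid4().hex
    body = io.BytesIO()

    def write_part(headers, payload):
        # Each part: boundary line, part headers, blank line, payload.
        body.write(f"--{boundary}\r\n".encode())
        for header in headers:
            body.write(f"{header}\r\n".encode())
        body.write(b"\r\n")
        body.write(payload)
        body.write(b"\r\n")

    # The audio bytes go in the "file" part, like `--form file=@...`.
    write_part(
        [
            f'Content-Disposition: form-data; name="file"; filename="{filename}"',
            "Content-Type: application/octet-stream",
        ],
        audio_bytes,
    )
    # Text parts: the required model plus any optional parameters.
    for name, value in {"model": model, **fields}.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # e.g. stream=True
        write_part([f'Content-Disposition: form-data; name="{name}"'], str(value).encode())
    # Closing boundary ends the multipart body.
    body.write(f"--{boundary}--\r\n".encode())

    return urllib.request.Request(
        API_URL,
        data=body.getvalue(),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To actually send it, pass the returned object to `urllib.request.urlopen` and read the body. Generating the boundary per request avoids collisions with the audio bytes in practice.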

Authorizations

Authorization (string, header, required)

Bearer token authentication. Get your API key from the dashboard.

Body

Content type: multipart/form-data
file (file, required)

The audio file object (not the file name) to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.
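A cheap client-side pre-check can reject unsupported files before uploading them. This sketch assumes the file extension reflects the actual format; it inspects only the name, and how the server itself detects the format is not documented here. The function name is hypothetical.

```python
from pathlib import Path

# Formats listed for the `file` parameter.
SUPPORTED_FORMATS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}


def has_supported_extension(path):
    """Return True if the file extension is one of the listed audio formats.

    Only the file name is inspected; the audio encoding is not verified.
    """
    suffix = Path(path).suffix.lower().lstrip(".")
    return suffix in SUPPORTED_FORMATS
```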

model (string, required)

ID of the model to use.

Example: "gpt-4o-transcribe"

language (string)

The language of the input audio. Supplying it in ISO-639-1 format (for example, en) improves accuracy and latency.

prompt (string)

Optional text to guide the model's style or to continue a previous audio segment.

response_format (enum<string>, default: json)

The format of the transcript output.

Available options: json, text, srt, verbose_json, vtt

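The JSON schema under Response applies to json and verbose_json; this page does not show the body returned for text, srt, or vtt. The sketch below assumes those three formats come back as a plain-text body and decodes the response accordingly. The function name is hypothetical.

```python
import json

JSON_FORMATS = {"json", "verbose_json"}
# Assumption: these formats return the transcript as plain text, not JSON.
TEXT_FORMATS = {"text", "srt", "vtt"}


def parse_transcription(body, response_format="json"):
    """Decode a response body according to the requested response_format."""
    if response_format in JSON_FORMATS:
        # json / verbose_json: an object with at least a "text" field.
        return json.loads(body)
    if response_format in TEXT_FORMATS:
        return body
    raise ValueError(f"unsupported response_format: {response_format!r}")
```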
temperature (number, default: 0)

The sampling temperature, between 0 and 1.

Required range: 0 <= x <= 1

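Checking the documented range locally avoids a round trip for an invalid request; the exact error the server returns for an out-of-range value is not documented here. A minimal sketch with a hypothetical function name:

```python
def validate_temperature(temperature=0.0):
    """Check the documented range 0 <= temperature <= 1 before sending a request."""
    value = float(temperature)
    # The chained comparison is False for NaN, so NaN is rejected as well.
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"temperature must be between 0 and 1, got {value}")
    return value
```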
timestamp_granularities (enum<string>[])

The timestamp granularities to populate for this transcription, for example word and/or segment.
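This page does not say how array-valued parameters are encoded in multipart/form-data. The sketch assumes the OpenAI-compatible convention of repeating the field name with a `[]` suffix, and it assumes word and segment (the values referenced in the Response section) are the only accepted values. Both the function name and the default field name are hypothetical.

```python
# Assumed value set, taken from the words/segments response fields.
GRANULARITIES = {"word", "segment"}


def granularity_fields(granularities, field_name="timestamp_granularities[]"):
    """Expand a list of granularities into repeated (name, value) form fields.

    Assumption: array parameters are sent by repeating the field name with a
    "[]" suffix; adjust field_name if the service expects something else.
    """
    fields = []
    for value in granularities:
        if value not in GRANULARITIES:
            raise ValueError(f"unknown granularity: {value!r}")
        fields.append((field_name, value))
    return fields
```

Under the same assumption, the curl equivalent adds one `--form 'timestamp_granularities[]=word'` line per value.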

stream (boolean, default: false)

If true, partial transcription results are sent as server-sent events.
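With streaming enabled, the client has to split the response into server-sent events. The sketch below implements the basic SSE framing (data lines accumulate, a blank line ends an event, lines starting with a colon are comments, an unterminated event at end of stream is dropped). The payload inside each event is not documented on this page, so the function returns raw data strings and leaves decoding (for example with `json.loads`) to the caller. The function name is hypothetical.

```python
def iter_sse_data(lines):
    """Yield the data payload of each server-sent event from an iterable of text lines."""
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the pending event, if any.
            if data_lines:
                yield "\n".join(data_lines)
                data_lines = []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if field == "data":
            data_lines.append(value)
        # Other fields (event, id, retry) are ignored in this sketch.
    # Per the SSE rules, an event not terminated by a blank line is discarded.
```

When reading from `urllib.request.urlopen`, the response yields bytes, so decode each line first, for example `iter_sse_data(line.decode("utf-8") for line in resp)`.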

include (enum<string>[])

Additional data to include in the response.

Response

Successful response

text (string, required)

The transcribed text.

language (string)

The language of the input audio.

duration (number)

The duration of the input audio, in seconds.

words (object[])

Extracted words and their corresponding timestamps; present when timestamp_granularities includes 'word'.
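Word-level timestamps make it easy to locate where a term is spoken in the audio. This sketch assumes each entry in words has the word, start, and end fields shown in the example response, with start and end in seconds (the page states seconds only for duration). The function name is hypothetical.

```python
def find_word_times(words, target):
    """Return (start, end) pairs for every occurrence of `target` in `words`.

    `words` is the "words" array from the response. Matching ignores case
    and surrounding punctuation.
    """
    needle = target.strip().lower()
    hits = []
    for w in words:
        token = w["word"].strip().strip(".,!?;:\"'").lower()
        if token == needle:
            hits.append((w["start"], w["end"]))
    return hits
```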

segments (object[])

Segments of the transcribed text and their corresponding details; present when timestamp_granularities includes 'segment'.
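If a verbose_json response with segments is already in hand, SubRip subtitles can be produced locally instead of making a second request with response_format=srt. This is a sketch assuming segment start and end are in seconds; the server's own srt output may differ in cue splitting or whitespace. The function names are hypothetical.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render the "segments" array as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(cues)
```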
