Transcribe audio to text

POST /v1/audio/transcriptions

curl --request POST \
  --url https://api.anyapi.ai/v1/audio/transcriptions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: multipart/form-data' \
  --form file=@example-file \
  --form model=MODEL_EXAMPLE

{
  "text": "<string>",
  "language": "<string>",
  "duration": 123,
  "words": [
    {
      "word": "<string>",
      "start": 123,
      "end": 123
    }
  ],
  "segments": [
    {
      "id": 123,
      "seek": 123,
      "start": 123,
      "end": 123,
      "text": "<string>",
      "tokens": [
        123
      ],
      "temperature": 123,
      "avg_logprob": 123,
      "compression_ratio": 123,
      "no_speech_prob": 123
    }
  ]
}

Authorizations

Authorization

string

header

required

Bearer token authentication. Get your API key from the dashboard.
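As a rough illustration, the curl example above can be mirrored with Python's standard library alone. This is a minimal sketch, not an official client: the helper name `build_transcription_request` and the `application/octet-stream` part type are assumptions made for the example, while the URL, the bearer header, and the `file` and `model` form fields come from the curl example. The function only builds the request and does not send it.

```python
import uuid
import urllib.request

# URL taken from the curl example on this page.
API_URL = "https://api.anyapi.ai/v1/audio/transcriptions"


def build_transcription_request(audio: bytes, filename: str, model: str, token: str) -> urllib.request.Request:
    """Build (without sending) a multipart/form-data POST carrying `file` and `model`.

    Hypothetical helper for illustration only.
    """
    boundary = uuid.uuid4().hex
    # Text part (model) followed by the header of the binary part (file).
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    # Closing boundary that terminates the multipart body.
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=head + audio + tail,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the token is whatever API key the dashboard issues.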

Body

multipart/form-data

file

required

The audio file object (not file name) to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.

model

string

required

ID of the model to use

Example:

"gpt-4o-transcribe"

language

string

The language of the input audio. Supplying it as an ISO 639-1 code (for example, en) improves accuracy and latency.

prompt

string

Optional text to guide the model's style or to continue a previous audio segment.

response_format

enum<string>

default:json

The format of the transcript output

Available options: json, text, srt, verbose_json, vtt
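Because the output format changes the shape of the response body, client code usually has to branch on it. The sketch below assumes that json and verbose_json return a JSON object with a `text` field (as in the response schema) and that text, srt, and vtt return the transcript as the raw body; that second point is an assumption, not something this page states. The function name `extract_transcript` is hypothetical.

```python
import json

# Formats assumed to return a JSON object with a "text" field.
JSON_FORMATS = {"json", "verbose_json"}
# Formats assumed to return the transcript as the raw response body.
TEXT_FORMATS = {"text", "srt", "vtt"}


def extract_transcript(body: str, response_format: str = "json") -> str:
    """Return the transcript from a response body, given the requested format."""
    if response_format in JSON_FORMATS:
        return json.loads(body)["text"]
    if response_format in TEXT_FORMATS:
        return body
    raise ValueError(f"unsupported response_format: {response_format!r}")
```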

temperature

number

default:0

The sampling temperature, between 0 and 1

Required range: 0 <= x <= 1
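The documented constraints on the optional fields can be checked on the client before a request is sent. The following is a minimal sketch under that idea; `build_form_fields` is a hypothetical helper, and the choice to always send `response_format` and `temperature` (rather than rely on server defaults) is an assumption of this example.

```python
from typing import Dict, Optional

# Enum values listed for response_format on this page.
RESPONSE_FORMATS = ("json", "text", "srt", "verbose_json", "vtt")


def build_form_fields(
    model: str,
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    response_format: str = "json",
    temperature: float = 0.0,
) -> Dict[str, str]:
    """Validate the text form fields and return them as strings (hypothetical helper)."""
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"response_format must be one of {RESPONSE_FORMATS}")
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must satisfy 0 <= x <= 1")
    fields = {
        "model": model,
        "response_format": response_format,
        "temperature": str(temperature),
    }
    # Optional fields are only included when the caller supplied them.
    if language is not None:
        fields["language"] = language
    if prompt is not None:
        fields["prompt"] = prompt
    return fields
```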

timestamp_granularities

enum<string>[]

The timestamp granularities to populate for this transcription. The response fields words and segments refer to the values word and segment.

stream

boolean

default:false

If set, partial transcription results will be sent as server-sent events
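When streaming is enabled, the client has to split the response into server-sent events. This page does not document the event payloads, so the sketch below only covers the generic framing from the server-sent events format: `data:` lines accumulate, a blank line ends an event, and lines starting with `:` are comments. The function name `iter_sse_data` is hypothetical, and the JSON payload in the usage note is made up for illustration.

```python
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete server-sent event."""
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            data = "\n".join(buffer)
            buffer = []
            if data:
                yield data
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if field == "data":
            # A single leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
    # An event that is not terminated by a blank line is discarded.
```

For example, feeding it the lines `data: {"text": "Hel"}` followed by an empty line yields the single string `{"text": "Hel"}`, which the caller can then decode however the actual event payload requires.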

include

enum<string>[]

Additional data to include in the response


Response

Successful response

text

string

required

The transcribed text

language

string

The language of the input audio

duration

number

The duration of the input audio in seconds

words

object[]

Extracted words and their corresponding timestamps (when timestamp_granularities includes 'word')

Each entry has word, start, and end.

segments

object[]

Segments of the transcribed text and their corresponding details (when timestamp_granularities includes 'segment')

Each entry has id, seek, start, end, text, tokens, temperature, avg_logprob, compression_ratio, and no_speech_prob.
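The words and segments arrays can be consumed with a few lines of client code. The sketch below is one possible approach, using the field names from the example response; the helper names are hypothetical, the 0.5 cutoff on `no_speech_prob` is an arbitrary value chosen for the example, and treating `start` and `end` as seconds is an assumption (only `duration` is documented as seconds).

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    word: str
    start: float  # assumed to be seconds from the start of the audio
    end: float


def parse_words(response_body: str) -> List[Word]:
    """Return word timings, or an empty list when `words` is absent."""
    payload = json.loads(response_body)
    return [
        Word(w["word"], float(w["start"]), float(w["end"]))
        for w in payload.get("words") or []
    ]


def speech_segments(response_body: str, max_no_speech_prob: float = 0.5) -> List[str]:
    """Return segment texts whose no_speech_prob is at or below the cutoff."""
    payload = json.loads(response_body)
    return [
        s["text"]
        for s in payload.get("segments") or []
        if s.get("no_speech_prob", 0.0) <= max_no_speech_prob
    ]
```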


