Authorizations
Bearer token authentication. Get your API key from the dashboard.
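As a minimal sketch of the bearer scheme described above, the API key goes in the standard HTTP Authorization header. The helper name build_auth_headers is hypothetical and not part of this API.

```python
from typing import Dict


def build_auth_headers(api_key: str) -> Dict[str, str]:
    """Return the HTTP header used for bearer token authentication."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    # Standard bearer scheme: the literal word "Bearer", a space, then the key.
    return {"Authorization": f"Bearer {api_key}"}
```

Pass the returned dictionary as the request headers in whichever HTTP client you use.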
Body
The audio file object (not file name) to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.
ID of the model to use
"gpt-4o-transcribe"
The language of the input audio. Supplying the input language in ISO 639-1 format (for example, en) improves accuracy and reduces latency
Optional text to guide the model's style or to continue a previous audio segment
The format of the transcript output
Available options: json, text, srt, verbose_json, vtt
The sampling temperature, between 0 and 1 inclusive (0 <= x <= 1)
The timestamp granularities to populate for this transcription
If set, partial transcription results will be sent as server-sent events
Additional data to include in the response
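The body parameters above can be sketched as a small validation helper that assembles the non-file form fields before sending. Only timestamp_granularities is named on this page. The other field names (model, language, prompt, response_format, temperature, stream) are assumptions based on common naming for transcription endpoints, and the helper build_transcription_fields is hypothetical. The audio file itself would be attached separately as a file upload, and the "additional data" parameter is left out because its accepted values are not listed here.

```python
from typing import Any, Dict, Optional, Sequence

# Output formats listed in the reference above.
RESPONSE_FORMATS = ("json", "text", "srt", "verbose_json", "vtt")
# Granularities mentioned in the response section; assumed to be the full set.
TIMESTAMP_GRANULARITIES = ("word", "segment")


def build_transcription_fields(
    model: str,
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    response_format: Optional[str] = None,
    temperature: Optional[float] = None,
    timestamp_granularities: Optional[Sequence[str]] = None,
    stream: Optional[bool] = None,
) -> Dict[str, Any]:
    """Validate the documented constraints and return only the fields that were set."""
    if not model:
        raise ValueError("model is required")
    if response_format is not None and response_format not in RESPONSE_FORMATS:
        raise ValueError(f"response_format must be one of {RESPONSE_FORMATS}")
    if temperature is not None and not 0 <= temperature <= 1:
        raise ValueError("temperature must satisfy 0 <= x <= 1")
    if timestamp_granularities is not None:
        unknown = [g for g in timestamp_granularities if g not in TIMESTAMP_GRANULARITIES]
        if unknown:
            raise ValueError(f"unknown timestamp granularities: {unknown}")
        timestamp_granularities = list(timestamp_granularities)

    candidates = {
        "model": model,
        "language": language,
        "prompt": prompt,
        "response_format": response_format,
        "temperature": temperature,
        "timestamp_granularities": timestamp_granularities,
        "stream": stream,
    }
    # Omit unset optional parameters so the server applies its own defaults.
    return {name: value for name, value in candidates.items() if value is not None}
```

Checking the range and the enumerated options on the client side surfaces mistakes before any audio is uploaded. How each value is encoded in the form body is left to the HTTP client.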
Response
Successful response
The transcribed text
The language of the input audio
The duration of the input audio in seconds
Extracted words and their corresponding timestamps (when timestamp_granularities includes 'word')
Segments of the transcribed text and their corresponding details (when timestamp_granularities includes 'segment')
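A successful JSON response could be read along these lines. The key names text, language, duration, words, and segments are assumptions inferred from the field descriptions above, and the Transcription class and parse_transcription function are hypothetical. The sketch treats everything except the transcribed text as optional, since the word and segment lists appear only when the matching timestamp granularity was requested.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Transcription:
    text: str
    language: Optional[str] = None
    duration: Optional[float] = None  # seconds
    words: List[Dict[str, Any]] = field(default_factory=list)
    segments: List[Dict[str, Any]] = field(default_factory=list)


def parse_transcription(body: str) -> Transcription:
    """Parse a JSON response body into a Transcription (assumed key names)."""
    data = json.loads(body)
    return Transcription(
        text=data["text"],
        language=data.get("language"),
        duration=data.get("duration"),
        # Absent or null lists become empty lists so callers can iterate safely.
        words=data.get("words") or [],
        segments=data.get("segments") or [],
    )
```

This applies to JSON output only. The text, srt, and vtt formats would presumably return plain text rather than a JSON object.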