Audio Transcription - quickstart

The Audio Transcriptions API provides a dedicated speech-to-text endpoint for uploaded audio files. It features:

  • Performant audio transcription with low word error rate (WER)
  • A simple multipart upload flow
  • Support for 14 languages

1. Prerequisites

2. Setting up the audio file

The Audio Transcriptions endpoint accepts FLAC, MP3, MPEG, MPGA, OGG, and WAV files of 25 MB or less. For this quickstart, we’ll use a WAV example.

If you don’t have an audio file already, download the example to follow along.
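Before uploading, it can help to check a file against the limits listed above. The following is a minimal sketch, not part of the official tooling: `check_audio_file` is a hypothetical helper name, it reads "25 MB" as 25 × 1024 × 1024 bytes (an assumption, since the exact byte limit is not stated here), and it only inspects the file extension, not the actual audio encoding.

```shell
# Hypothetical helper: check a file against the format and size limits above.
check_audio_file() {
  file="$1"
  # Assumption: "25 MB" is interpreted as 25 * 1024 * 1024 bytes.
  max_bytes=$((25 * 1024 * 1024))

  if [ ! -f "$file" ]; then
    echo "not found: $file" >&2
    return 1
  fi

  # Compare the lowercased extension against the accepted formats.
  ext=$(printf '%s' "${file##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    flac|mp3|mpeg|mpga|ogg|wav) ;;
    *) echo "unsupported format: .$ext" >&2; return 1 ;;
  esac

  # wc -c reports the size in bytes; tr strips padding some platforms add.
  size=$(wc -c < "$file" | tr -d ' ')
  if [ "$size" -gt "$max_bytes" ]; then
    echo "file too large: $size bytes" >&2
    return 1
  fi
}
```

The function returns a zero exit status when the file passes both checks, so it can gate the upload step, for example by running it on ./transcribe-model-sample-derrida-mashup.wav before calling curl.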

3. Passing the audio file

Use the following command to send the audio file to the Audio Transcriptions endpoint.

curl -X POST "https://api.cohere.com/v2/audio/transcriptions" \
  -H "Authorization: Bearer $TRIAL_KEY" \
  -F "model=cohere-transcribe-03-2026" \
  -F "language=en" \
  -F "file=@./transcribe-model-sample-derrida-mashup.wav"

4. Receiving the response

It should return a response similar to the following.

{"text":"I speak only one language, and it's not my own, but the poet is a man of metaphor."}
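To pull only the transcript out of a response like the one above, the JSON can be piped through a JSON processor. This is a small sketch that assumes the `jq` command-line tool is installed and that the response carries the transcript in a top-level `text` field, as in the sample; `extract_text` is a hypothetical helper name.

```shell
# Hypothetical helper: read a JSON response on stdin and print its "text" field.
# Assumes jq is installed; -r prints the raw string without JSON quoting.
extract_text() {
  jq -r '.text'
}

# Sample response copied from above; in practice this would be curl's output.
extract_text <<'EOF'
{"text":"I speak only one language, and it's not my own, but the poet is a man of metaphor."}
EOF
```

In a real run, the curl command's output can be piped straight into `extract_text` instead of using the inline sample.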

Further Resources