
Cohere Transcribe


About the Cohere Transcribe model

Cohere Transcribe is an open-source research release of a dedicated, 2B-parameter automatic speech recognition (ASR) model that takes audio in and produces text out. The model supports 14 languages.

Model details

  • Input: Audio waveform
  • Output: Text
  • Model name: cohere-transcribe-03-2026
  • Languages covered: English, German, French, Italian, Spanish, Portuguese, Greek, Dutch, Polish, Vietnamese, Chinese, Arabic, Japanese, Korean.
  • Maximum file size: 25MB
  • License: Apache 2.0
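
The constraints listed above can be checked locally before sending audio anywhere. The following is a minimal sketch, not part of any Cohere SDK: `check_upload` is a hypothetical helper, the languages are matched by the names given in the list (the identifiers the API expects may differ), and "25MB" is assumed to mean 25 × 1024 × 1024 bytes.

```python
import os

# Language names copied from the "Languages covered" entry in the list above.
SUPPORTED_LANGUAGES = {
    "English", "German", "French", "Italian", "Spanish", "Portuguese", "Greek",
    "Dutch", "Polish", "Vietnamese", "Chinese", "Arabic", "Japanese", "Korean",
}

# Assumption: "25MB" is read as binary megabytes; the page does not say
# whether the limit is decimal or binary.
MAX_FILE_BYTES = 25 * 1024 * 1024


def check_upload(path: str, language: str) -> list[str]:
    """Hypothetical pre-flight check. Returns a list of problems (empty if OK)."""
    problems = []
    if language not in SUPPORTED_LANGUAGES:
        problems.append(f"unsupported language: {language}")
    size = os.path.getsize(path)
    if size > MAX_FILE_BYTES:
        problems.append(f"file is {size} bytes, limit is {MAX_FILE_BYTES}")
    return problems
```

An empty list from `check_upload("meeting.wav", "German")` would mean the file passes both checks under these assumptions; the file name here is only illustrative.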

Availability

You can access Cohere Transcribe via our API for free, low-setup experimentation, subject to rate limits.

For production deployment without rate limits, provision a dedicated Model Vault. This enables low-latency, private cloud inference without having to manage infrastructure. Pricing is calculated per hour-instance, with discounted plans for longer-term commitments. Contact our team to discuss your requirements.

Strengths

Cohere Transcribe demonstrates best-in-class transcription accuracy across its 14 supported languages. As a dedicated speech recognition model, it is also efficient, with a real-time factor up to three times faster than that of other dedicated ASR models in the same size range. The model was trained from scratch, and from the outset we focused on minimizing word error rate (WER) while keeping production readiness in mind.
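
For readers unfamiliar with the two metrics named above, here is a short sketch of how they are commonly computed. It assumes the usual definitions: WER as the word-level edit distance divided by the number of reference words, and real-time factor as processing time divided by audio duration (lower is faster; some benchmarks report the inverse). The function names are illustrative, and real evaluations typically normalize text (case, punctuation) before scoring, which this sketch does not do.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time per second of audio; values below 1 beat real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds
```

For example, scoring the hypothesis "the cat sat on mat" against the reference "the cat sat on the mat" counts one deleted word out of six, a WER of 1/6.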

Limitations

  • Single language: The model performs best when the audio stays within a single, pre-specified language chosen from the 14 it supports. It does not perform explicit automatic language detection.

  • Timestamps/speaker diarization: The model provides neither timestamps nor speaker diarization.

Model architecture

Cohere Transcribe is built on a Conformer, a speech-optimized Transformer variant. Input audio waveforms are converted into a Mel spectrogram and then processed by a Conformer encoder that holds the majority of the model’s parameters. The encoder’s representations are then passed to a lightweight Transformer decoder that generates text tokens. Cohere Transcribe is trained using a standard supervised cross-entropy loss.
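
To make the training objective concrete, the sketch below computes a token-level cross-entropy for a decoder that emits one vector of logits per output position. This is a generic illustration of the loss in plain Python, not Cohere's training code; the function names, the toy vocabulary size, and the averaging over positions are all assumptions.

```python
import math


def log_softmax(logits: list[float]) -> list[float]:
    """Numerically stable log-softmax over one vector of logits."""
    peak = max(logits)
    log_total = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_total for x in logits]


def token_cross_entropy(step_logits: list[list[float]], target_ids: list[int]) -> float:
    """Mean negative log-likelihood of the target token at each position."""
    if not target_ids or len(step_logits) != len(target_ids):
        raise ValueError("need one logit vector per target token")
    total = 0.0
    for logits, target in zip(step_logits, target_ids):
        total -= log_softmax(logits)[target]
    return total / len(target_ids)
```

With uniform logits over a vocabulary of four tokens, the loss equals ln 4 regardless of the targets; it falls as the decoder puts more probability mass on the correct token.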

Further Resources

  • Audio Transcriptions quickstart
  • Audio Transcriptions API reference documentation
  • Jupyter notebook