For AI agents: a documentation index is available at the root level at /llms.txt and /llms-full.txt. Append /llms.txt to any URL for a page-level index, or .md for the markdown version of any page.
DASHBOARDPLAYGROUNDDOCSCOMMUNITYLOG IN
Guides and conceptsAPI ReferenceRelease NotesLLMUCookbooks
Guides and conceptsAPI ReferenceRelease NotesLLMUCookbooks
  • Get Started
    • Introduction
    • Installation
    • Creating a client
    • Playground
    • FAQs
  • Models
    • An Overview of Cohere's Models
    • Aya
    • Embed
    • Rerank
  • Text Generation
    • Introduction to Text Generation at Cohere
    • Using the Chat API
    • Reasoning
    • Image Inputs
    • Streaming Responses
    • Predictable Outputs
    • Advanced Generation Parameters
    • Tool Use
    • Tokens and Tokenizers
    • Summarizing Text
    • Safety Modes
  • Embeddings (Vectors, Search, Retrieval)
    • Introduction to Embeddings at Cohere
    • Semantic Search with Embeddings
    • Multimodal Embeddings
    • Batch Embedding Jobs
  • Going to Production
    • API Keys and Rate Limits
    • Going Live
    • Deprecations
    • How Does Cohere's Pricing Work?
  • Integrations
    • Integrating Embedding Models with Other Tools
    • Cohere and LangChain
    • LlamaIndex and Cohere
  • Deployment Options
    • Overview
    • SDK Compatibility
  • Tutorials
    • Cookbooks
    • LLM University
    • Build Things with Cohere!
    • Agentic RAG
    • Cohere on Azure
  • Responsible Use
    • Security
    • Usage Policy
    • Command A Technical Report
    • Command R and Command R+ Model Card
  • Cohere Labs
    • Cohere Labs Acceptable Use Policy
  • More Resources
    • Cohere Toolkit
    • Datasets
    • Improve Cohere Docs
LogoLogodocs
DASHBOARDPLAYGROUNDDOCSCOMMUNITYLOG IN
On this page
  • Cohere platform
  • Cloud AI services
  • AWS
  • Amazon Bedrock
  • Amazon SageMaker
  • Azure AI Foundry
  • OCI Generative AI Service
  • Private deployments
  • Cloud (VPC)
  • On-premises
Deployment Options

Deployment Options - Overview

Was this page helpful?
Edit this page
Previous

Cohere SDK Cloud Platform Compatibility

Next
Built with

The most common way to access Cohere’s large language models (LLMs) is through the Cohere platform, which is fully managed by Cohere and accessible through an API.

But that’s not the only way to access Cohere’s models. In an enterprise setting, organizations might require more control over where and how the models are hosted.

Specifically, Cohere offers four types of deployment options.

  1. Cohere Platform
  2. Cloud AI Services
  3. Private Deployments - Cloud
  4. Private Deployments - On-Premises

Cohere platform

This is the fastest and easiest way to start using Cohere’s models. The models are hosted on Cohere infrastructure and available on our public SaaS platform (which provides an API data opt-out), which is fully managed by Cohere.

Cloud AI services

These managed services enable enterprises to access Cohere’s models on cloud AI services. In this scenario, Cohere’s models are hosted on the cloud provider’s infrastructure. Cohere is cloud-agnostic, meaning you can deploy our models through any cloud provider.

AWS

Developers can access a range of Cohere’s language models in a private environment via Amazon’s AWS Cloud platform. Cohere’s models are supported on two Amazon services: Amazon Bedrock and Amazon SageMaker.

Amazon Bedrock

Amazon Bedrock is a fully managed service where foundational models from Cohere are made available through a single, serverless API. Read about Bedrock here.

View Cohere’s models on Amazon Bedrock.

Amazon SageMaker

Amazon SageMaker is a service that allows customers to prepare data and build, train, and deploy machine learning (ML) models for any use case with fully managed infrastructure, tools, and workflows. Read about SageMaker here.

Cohere offers a comprehensive suite of generative and embedding models through SageMaker on a range of hardware options, many of which support finetuning for deeper customization and performance.

View Cohere’s model listing on the AWS Marketplace.

Azure AI Foundry

Azure AI Foundry is a platform that is designed for developers to build generative AI applications on an enterprise-grade platform. Developers can explore a wide range of models, services, and capabilities to build AI applications that meet their specific goals.

View Cohere’s models on Azure AI Foundry.

OCI Generative AI Service

Oracle Cloud Infrastructure Generative AI is a fully managed service that enables you to use Cohere’s generative and embedding models through an API.

Private deployments

Cloud (VPC)

Private deployments (cloud) allow enterprises to deploy the Cohere stack privately on cloud platforms. With AWS, Cohere’s models can be deployed in an enterprise’s AWS Cloud environment via their own VPC (EC2, EKS). Compared to managed cloud services, VPC deployments provide tighter control and compliance. No egress is another common reason for going with VPCs. Overall, the VPC option has a higher management burden but offers more flexibility.

On-premises

Private deployments on-premises (on-prem) allow enterprises to deploy the Cohere stack privately on their own compute. This includes air-gapped environments where systems are physically isolated from unsecured networks, providing maximum security for sensitive workloads.