Cohere Documentation

The most common way to access Cohere’s large language models (LLMs) is through the Cohere platform, which is fully managed by Cohere and accessible through an API.

But that’s not the only way to access Cohere’s models. In an enterprise setting, organizations might require more control over where and how the models are hosted.

Specifically, Cohere offers four types of deployment options.

Cohere Platform
Cloud AI Services
Private Deployments - Cloud
Private Deployments - On-Premises

Cohere platform

This is the fastest and easiest way to start using Cohere’s models. The models are hosted on Cohere infrastructure and available on our public SaaS platform (which provides an API data opt-out), which is fully managed by Cohere.

Cloud AI services

These managed services enable enterprises to access Cohere’s models on cloud AI services. In this scenario, Cohere’s models are hosted on the cloud provider’s infrastructure. Cohere is cloud-agnostic, meaning you can deploy our models through any cloud provider.

AWS

Developers can access a range of Cohere’s language models in a private environment via Amazon’s AWS Cloud platform. Cohere’s models are supported on two Amazon services: Amazon Bedrock and Amazon SageMaker.

Amazon Bedrock

Amazon Bedrock is a fully managed service where foundational models from Cohere are made available through a single, serverless API. Read about Bedrock here.

View Cohere’s models on Amazon Bedrock.

Amazon SageMaker

Amazon SageMaker is a service that allows customers to prepare data and build, train, and deploy machine learning (ML) models for any use case with fully managed infrastructure, tools, and workflows. Read about SageMaker here.

Cohere offers a comprehensive suite of generative and embedding models through SageMaker on a range of hardware options, many of which support finetuning for deeper customization and performance.

View Cohere’s model listing on the AWS Marketplace.

Azure AI Foundry

Azure AI Foundry is a platform that is designed for developers to build generative AI applications on an enterprise-grade platform. Developers can explore a wide range of models, services, and capabilities to build AI applications that meet their specific goals.

View Cohere’s models on Azure AI Foundry.

OCI Generative AI Service

Oracle Cloud Infrastructure Generative AI is a fully managed service that enables you to use Cohere’s generative and embedding models through an API.

Private deployments

Cloud (VPC)

Private deployments (cloud) allow enterprises to deploy the Cohere stack privately on cloud platforms. With AWS, Cohere’s models can be deployed in an enterprise’s AWS Cloud environment via their own VPC (EC2, EKS). Compared to managed cloud services, VPC deployments provide tighter control and compliance. No egress is another common reason for going with VPCs. Overall, the VPC option has a higher management burden but offers more flexibility.

On-premises

Private deployments on-premises (on-prem) allow enterprises to deploy the Cohere stack privately on their own compute. This includes air-gapped environments where systems are physically isolated from unsecured networks, providing maximum security for sensitive workloads.