For AI agents: a documentation index is available at the root level at /llms.txt and /llms-full.txt. Append /llms.txt to any URL for a page-level index, or .md for the markdown version of any page.
DASHBOARDPLAYGROUNDDOCSCOMMUNITYLOG IN
Guides and conceptsAPI ReferenceRelease NotesLLMUCookbooks
Guides and conceptsAPI ReferenceRelease NotesLLMUCookbooks
  • Get Started
    • Introduction
    • Installation
    • Creating a client
    • Playground
    • FAQs
  • Models
    • An Overview of Cohere's Models
    • Aya
    • Embed
    • Rerank
  • Text Generation
    • Introduction to Text Generation at Cohere
    • Using the Chat API
    • Reasoning
    • Image Inputs
    • Streaming Responses
    • Predictable Outputs
    • Advanced Generation Parameters
    • Tool Use
    • Tokens and Tokenizers
    • Summarizing Text
    • Safety Modes
  • Embeddings (Vectors, Search, Retrieval)
    • Introduction to Embeddings at Cohere
    • Semantic Search with Embeddings
    • Multimodal Embeddings
    • Batch Embedding Jobs
  • Going to Production
    • API Keys and Rate Limits
    • Going Live
    • Deprecations
    • How Does Cohere's Pricing Work?
  • Integrations
    • Integrating Embedding Models with Other Tools
    • Cohere and LangChain
    • LlamaIndex and Cohere
  • Deployment Options
    • Overview
    • SDK Compatibility
      • Overview
      • Setting Up
      • Model Deployment
      • Model Deployment - AWS
      • Usage
  • Tutorials
    • Cookbooks
    • LLM University
    • Build Things with Cohere!
    • Agentic RAG
    • Cohere on Azure
  • Responsible Use
    • Security
    • Usage Policy
    • Command A Technical Report
    • Command R and Command R+ Model Card
  • Cohere Labs
    • Cohere Labs Acceptable Use Policy
  • More Resources
    • Cohere Toolkit
    • Datasets
    • Improve Cohere Docs
LogoLogodocs
DASHBOARDPLAYGROUNDDOCSCOMMUNITYLOG IN
On this page
  • What is a Private Deployment?
  • Why Private Deployment?
  • Private Deployment Components
Deployment OptionsPrivate Deployment

Private Deployment Overview

Was this page helpful?
Edit this page
Previous

Private Deployment – Setting Up

Next
Built with

What is a Private Deployment?

Private deployments allow organizations to implement and run AI models within a controlled, internal environment.

In a private deployment, you manage the model deployment infrastructure (with Cohere’s guidance and support). This includes ensuring hardware and driver compatibility as well as installing prerequisites to run the containers. These deployments typically run on Kubernetes, but it’s not a firm requirement.

Cohere supports two types of private deployments:

  • On-premises (on-prem)
    Gives you full control over both your data and the AI system on your own premises with your own hardware. You procure your own GPUs, servers and other hardware to insulate your environment from external threats.

  • On the cloud, typically a virtual private cloud (VPC)
    You use infrastructure needed to host AI models from a cloud provider (such as AWS, Azure, GCP, or OCI) while still retaining control of how the data is stored and processed. Cohere can support any VPC on any cloud environment, so long as the necessary hardware requirements are met.

Why Private Deployment?

With private deployments, you maintain full control over your infrastructure while leveraging Cohere’s state-of-the-art language models.

This enables you to deploy LLMs within your secure network, whether through your chosen cloud provider or your own environment. The data never leaves your environment, and the model can be fully network-isolated.

Here are some of the benefits of private deployments:

  • Data security: On-prem deployments allow you to keep your data secure and compliant with data protection regulations. A VPC offers similar yet somewhat less rigorous protection.
  • Model customization: Fine-tuning in a private environment allows enteprises to maintain strict control over their data, avoiding the risk of sensitive or proprietary data leaking.
  • Infrastructure needs: Public cloud is fast and easily scalable in general. But when the necessary hardware is not available in a specific region, on-prem can provide a faster solution.

Private Deployment Components

Cohere’s platform container consists of several key components:

  • Endpoints: API endpoints for model interaction
  • Models: AI model management and storage
  • Serving Framework: Manages model serving and request handling
  • Fine-tuning Framework: Handles model fine-tuning