Private Deployment Overview

What is a Private Deployment?

Private deployments allow organizations to deploy and run AI models within a controlled, internal environment.

In a private deployment, you manage the model deployment infrastructure (with Cohere’s guidance and support). This includes ensuring hardware and driver compatibility as well as installing prerequisites to run the containers. These deployments typically run on Kubernetes, but it’s not a firm requirement.
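To give a sense of the kind of prerequisite check this involves, the sketch below uses the official Kubernetes Python client to report which cluster nodes advertise GPUs. It is a minimal illustration, not part of Cohere's tooling: the "nvidia.com/gpu" resource name assumes NVIDIA's device plugin, and the check will vary with your cluster and GPU vendor.

```python
# A minimal sketch: confirm that cluster nodes expose GPU resources before
# deploying the model containers. Assumes the official `kubernetes` Python
# client and NVIDIA's device plugin (which advertises "nvidia.com/gpu").
from kubernetes import client, config

config.load_kube_config()  # uses your local kubeconfig
v1 = client.CoreV1Api()

for node in v1.list_node().items:
    gpus = node.status.allocatable.get("nvidia.com/gpu", "0")
    if int(gpus) > 0:
        print(f"{node.metadata.name}: {gpus} GPU(s) allocatable")
    else:
        print(f"{node.metadata.name}: no GPUs advertised")
```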

Cohere supports two types of private deployments:

  • On-premises (on-prem)
    Gives you full control over both your data and the AI system on your own premises with your own hardware. You procure your own GPUs, servers and other hardware to insulate your environment from external threats.

  • On the cloud, typically a virtual private cloud (VPC)
    You use infrastructure from a cloud provider (such as AWS, Azure, GCP, or OCI) to host the AI models while retaining control over how your data is stored and processed. Cohere can support a VPC on any cloud environment, so long as the necessary hardware requirements are met.

Why Private Deployment?

With private deployments, you maintain full control over your infrastructure while leveraging Cohere’s state-of-the-art language models.

This enables you to deploy LLMs within your secure network, whether through your chosen cloud provider or your own environment. The data never leaves your environment, and the model can be fully network-isolated.

Here are some of the benefits of private deployments:

  • Data security: On-prem deployments allow you to keep your data secure and compliant with data protection regulations. A VPC offers similar, though somewhat less stringent, protection.
  • Model customization: Fine-tuning in a private environment allows enterprises to maintain strict control over their data, avoiding the risk of sensitive or proprietary data leaking.
  • Infrastructure needs: Public cloud infrastructure is generally fast to provision and easy to scale, but when the necessary hardware is not available in a specific region, an on-prem deployment can be the faster option.

Private Deployment Components

Cohere’s platform container consists of several key components:

  • Endpoints: API endpoints for model interaction (see the example after this list)
  • Models: AI model management and storage
  • Serving Framework: Manages model serving and request handling
  • Fine-tuning Framework: Handles model fine-tuning
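
Once the containers are running, applications interact with the Endpoints component much as they would with the public API, just pointed at an internal address. The sketch below assumes the Cohere Python SDK; the hostname, port, model name, and placeholder API key are illustrative only and will depend on your deployment.

```python
import cohere

# A minimal sketch: base_url points at the privately hosted endpoint rather
# than Cohere's hosted API. The hostname, port, and model name are
# placeholders; substitute the values from your own deployment.
co = cohere.ClientV2(
    api_key="placeholder",  # auth is typically handled inside your environment
    base_url="http://models.internal.example.com:8080",
)

response = co.chat(
    model="command",  # use the model name served by your deployment
    messages=[{"role": "user", "content": "Hello from inside the VPC."}],
)
print(response.message.content[0].text)
```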