Single Container on Private Clouds
This document walks through how to pull Cohere’s container images using a license, and provides steps for testing both Docker and Kubernetes images.
Before starting, ensure you have a license and image tag provided by Cohere.
Pull Container Images with a License
Cohere provides access to container images through a registry authenticated with a license. Users can pull these images and replicate them in their environment, as needed, to avoid runtime network access from inside the cluster.
Images will come through the proxy.replicated.com registry. Pulling the images requires firewall access to proxy.replicated.com and proxy-auth.replicated.com. More information on these endpoints may be found here.
To test pulling images with a license, modify your docker CLI configuration to include authentication details for the registry. Note: docker login will not work.
The docker CLI is only an example; any tool which can pull images with credentials will work with the license ID configured as both username and password. Skopeo is another popular tool for copying images between registries which will work with this flow.
The following commands will overwrite your existing docker CLI configuration with authentication details for Cohere’s registry. If preferred, you can manually add the authentication details to preserve your existing configuration.
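A minimal sketch of this step, assuming the license ID is held in a LICENSE_ID variable (a placeholder name). It relies on Docker's standard config.json format, where the auth field is the base64 encoding of username:password, and it writes to the directory named by DOCKER_CONFIG if that variable is set, otherwise to ~/.docker.

```shell
# Placeholder: substitute the license ID provided by Cohere.
LICENSE_ID="${LICENSE_ID:-your-license-id}"

# Docker reads its client configuration from $DOCKER_CONFIG, or ~/.docker by default.
CONFIG_DIR="${DOCKER_CONFIG:-$HOME/.docker}"
mkdir -p "$CONFIG_DIR"

# The license ID serves as both username and password.
AUTH="$(printf '%s:%s' "$LICENSE_ID" "$LICENSE_ID" | base64 | tr -d '\n')"

# WARNING: this overwrites any existing config.json in CONFIG_DIR.
cat > "$CONFIG_DIR/config.json" <<EOF
{
  "auths": {
    "proxy.replicated.com": {
      "auth": "$AUTH"
    }
  }
}
EOF
```

To preserve an existing configuration instead, add only the proxy.replicated.com entry under the auths key of your current config.json.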
Validate that the authenticated image pull works correctly using the docker CLI:
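One way to sketch this check is a small helper that pulls the image and then confirms it is present in the local image store. The function name validate_pull is illustrative, and the image reference is a placeholder for the full path and tag provided by Cohere.

```shell
# Placeholder: full image reference (registry path and tag) provided by Cohere.
IMAGE="${IMAGE:-proxy.replicated.com/PATH_FROM_COHERE:TAG}"

# Pull an image and verify that it landed in the local image store.
validate_pull() {
  if ! docker pull "$1"; then
    echo "pull failed: check the license ID and firewall rules" >&2
    return 1
  fi
  # docker image inspect exits non-zero if the image is not present locally.
  docker image inspect "$1" > /dev/null && echo "pulled: $1"
}
```

With the configuration above in place, the call would be validate_pull "$IMAGE".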
You can now re-tag and replicate this image anywhere you want, using workflows appropriate to your air-gapped environment.
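As an illustration of re-tagging and replicating, the sketch below copies a pulled image into another registry using docker tag and docker push. The helper name mirror_image and the destination registry host are hypothetical, and the sketch assumes you are already authenticated against the destination registry.

```shell
# Re-tag a locally pulled image and push it to another registry.
#   $1: source image reference (as pulled from proxy.replicated.com)
#   $2: destination image reference in your own registry
mirror_image() {
  docker tag "$1" "$2" && docker push "$2"
}

# Example call (the destination host is hypothetical):
# mirror_image "$IMAGE" "registry.internal.example.com/cohere/model:TAG"
```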
Validate Workload Infrastructure
Once you can pull the image from the registry, run a test workload to validate the container’s functionality.
Docker/Containerd
To test the container image with Docker, you should have a machine with the following installed:
- NVIDIA drivers on the host (the latest tested version is 545).
- nvidia-container-toolkit and corresponding configuration for docker/containerd.
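A quick way to check the first prerequisite is to compare the installed driver version against the latest tested version. The sketch below is illustrative only: driver_status is a hypothetical helper, and it only reports how the installed major version relates to 545 rather than enforcing any requirement.

```shell
# Latest driver major version tested, per the list above.
TESTED_DRIVER_MAJOR=545

# Report how a driver version string (e.g. "545.23.08") compares to the tested version.
driver_status() {
  version="$1"
  if [ -z "$version" ]; then
    echo "no NVIDIA driver version reported" >&2
    return 1
  fi
  major="${version%%.*}"   # keep everything before the first dot
  if [ "$major" -gt "$TESTED_DRIVER_MAJOR" ]; then
    echo "driver $version is newer than the latest tested version ($TESTED_DRIVER_MAJOR)"
  else
    echo "driver $version is at or below the latest tested version ($TESTED_DRIVER_MAJOR)"
  fi
}

# Example call on a GPU host:
# driver_status "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
```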
Example Usage
Different models have different inputs.
- Embed models expect an array of texts and return the embeddings as output.
- Rerank models expect a list of documents and a query, returning relevance scores for the top n results (the n parameter is configurable).
- Command models expect a prompt and return the model response.
This section provides simple examples of using each primary Cohere model in a Docker container. Note that if you try these out and get an error like curl: (7) Failed to connect to localhost port 8080: Connection refused, the container has not yet fully started up. Wait a few more seconds and then try again.
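Rather than retrying by hand, the wait can be scripted. The sketch below polls the port until curl can connect; the helper name wait_for_container, the two-second interval, and the default of 30 attempts are all illustrative choices.

```shell
# Poll a URL until the container accepts connections, or give up.
#   $1: URL to poll (default http://localhost:8080)
#   $2: maximum number of attempts (default 30)
wait_for_container() {
  url="${1:-http://localhost:8080}"
  attempts="${2:-30}"
  i=0
  # curl exits non-zero (code 7) while the connection is refused;
  # any HTTP response at all counts as "reachable" here.
  until curl --silent --output /dev/null "$url"; do
    i=$((i + 1))
    if [ "$i" -ge "$attempts" ]; then
      echo "container did not become reachable at $url" >&2
      return 1
    fi
    sleep 2
  done
  echo "container is accepting connections at $url"
}
```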
Bash Commands for Running Cohere Models Through Docker
Here are the bash commands you can run to use the Embed English, Embed Multilingual, Rerank English, Rerank Multilingual, and Command models through Docker.
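The sketch below shows the general shape such commands could take. Several details are assumptions rather than facts from this document: the endpoint paths (embed, rerank, chat) and request fields are modeled on Cohere's public API, the container is assumed to serve on port 8080 with host networking, and the helper names are hypothetical. Check the values against the material Cohere provides with your image.

```shell
# Assumption: the container serves HTTP on localhost:8080.
BASE_URL="${BASE_URL:-http://localhost:8080}"

# Start a model container in the background.
#   $1: container name   $2: image reference provided by Cohere
start_model() {
  docker run --detach --rm --name "$1" --gpus all --network host "$2"
}

# POST a JSON body to an endpoint on the running container.
post_json() {
  curl --silent --request POST "$BASE_URL/$1" \
    --header "Content-Type: application/json" \
    --data "$2"
}

# Embed models: an array of texts in, embeddings out.
embed_example() {
  post_json embed '{"texts": ["testing embeddings in english"], "input_type": "classification"}'
}

# Rerank models: a query and documents in, relevance scores for the top n out.
rerank_example() {
  post_json rerank '{"query": "What is the capital of France?", "documents": ["Paris is the capital of France.", "Berlin is the capital of Germany."], "top_n": 1}'
}

# Command models: a prompt in, with optional documents to ground the reply.
command_example() {
  post_json chat '{"message": "What is the capital of France?", "documents": [{"title": "France", "snippet": "Paris is the capital of France."}]}'
}
```

A typical session would start one model with start_model, wait for it to accept connections, and then call the matching example function.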
You’ll note that the final example includes documents that the Command model can use to ground its replies. This functionality falls under retrieval-augmented generation.
Kubernetes
Deploying to Kubernetes requires nodes with the following installed:
- NVIDIA drivers (the latest tested version is currently 545).
- nvidia-container-toolkit and corresponding configuration for docker/containerd.
- nvidia-device-plugin to make GPUs available to Kubernetes.
To deploy the same image on Kubernetes, we must first convert the docker configuration into an image pull secret (see the Kubernetes documentation for more detail).
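A sketch of that conversion, following the approach in the Kubernetes documentation for creating a secret from existing Docker credentials. The helper name create_pull_secret and the default secret name cohere-pull-secret are hypothetical.

```shell
# Create an image pull secret from an existing docker CLI configuration file.
#   $1: secret name (default is a hypothetical name)
#   $2: path to config.json (defaults to the docker CLI location)
create_pull_secret() {
  secret_name="${1:-cohere-pull-secret}"
  config_file="${2:-${DOCKER_CONFIG:-$HOME/.docker}/config.json}"
  kubectl create secret generic "$secret_name" \
    --from-file=.dockerconfigjson="$config_file" \
    --type=kubernetes.io/dockerconfigjson
}
```

The secret is created in the current namespace, so run this in the namespace where the workload will be deployed.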
With that done, fill in the environment variables and generate the application manifest:
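As a rough illustration, the sketch below renders a single-Pod manifest from environment variables. The variable names, the Pod layout, the container port 8080, and the single-GPU limit are all assumptions made for the example, not Cohere's reference manifest. The nvidia.com/gpu resource is the one exposed by nvidia-device-plugin.

```shell
# Placeholders: replace with your own values.
APP="${APP:-cohere}"                                        # hypothetical workload name
IMAGE="${IMAGE:-proxy.replicated.com/PATH_FROM_COHERE:TAG}" # image reference from Cohere
PULL_SECRET="${PULL_SECRET:-cohere-pull-secret}"            # hypothetical secret name

# Print a Pod manifest to stdout, filling in the variables above.
generate_manifest() {
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${APP}
  labels:
    app: ${APP}
spec:
  imagePullSecrets:
    - name: ${PULL_SECRET}
  containers:
    - name: ${APP}
      image: ${IMAGE}
      ports:
        - containerPort: 8080
      resources:
        limits:
          nvidia.com/gpu: 1
EOF
}

# Example call: generate_manifest > cohere.yaml
```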
The manifest above does not account for air-gapped environments
For an air-gapped deployment, change the image in the manifest to point at the registry where you replicated the previously pulled image. Alternatively, to test in an internet-connected environment, create an image pull secret using the license ID as both username and password, as in the earlier docker CLI step. Keep in mind that this requires the firewall rules mentioned in the image pull steps to be open.
Use the following to deploy the containers and run inference requests:
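A sketch of one possible sequence: apply the manifest, wait for the pod to become ready, and forward a local port to it. The helper name deploy_and_forward, the pod name, the ten-minute timeout, and port 8080 are assumptions for illustration.

```shell
# Apply a manifest, wait for the pod, and forward localhost:8080 to it.
#   $1: path to the manifest file
#   $2: pod name (default is a hypothetical name)
deploy_and_forward() {
  manifest="$1"
  pod="${2:-cohere}"
  kubectl apply -f "$manifest" || return 1
  # Allow generous time for the image download before giving up.
  kubectl wait --for=condition=Ready "pod/$pod" --timeout=600s || return 1
  # Runs in the foreground until interrupted.
  kubectl port-forward "pod/$pod" 8080:8080
}
```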
Be aware that this is a multi-gigabyte image, so it may take some time to download.
Once the pod is up and running, you should expect to see something like the following:
Leave that running in the background, and open a new terminal session to execute a test request. In the next few sections, we’ll include examples of appropriate requests for the major Cohere models.
Example Usage
Here are the bash commands you can run to use the Embed English, Embed Multilingual, Rerank English, Rerank Multilingual, and Command models through Kubernetes.
Remember that this is only an illustrative deployment. Feel free to modify it as needed to accommodate your environment.
A Note on Air-gapped Environments
All images in the proxy.replicated.com registry are available to pull and copy into an air-gapped environment. These can be pulled using the license ID and steps previously provided by Cohere.
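When there is no network path between the connected machine and the air-gapped registry, one common approach is to move the image as a tar archive. The sketch below uses docker save and docker load for this. The helper names and the internal registry reference are hypothetical, and the transfer of the archive itself is left to whatever process your environment allows.

```shell
# On a connected machine: write a pulled image to a tar archive.
#   $1: image reference   $2: output archive path
export_image() {
  docker save --output "$2" "$1"
}

# Inside the air-gapped network: load the archive, re-tag, and push.
#   $1: archive path   $2: original image reference   $3: internal registry reference
import_image() {
  docker load --input "$1" &&
    docker tag "$2" "$3" &&
    docker push "$3"
}

# Example calls (archive name and internal registry are hypothetical):
# export_image "$IMAGE" cohere-image.tar
# import_image cohere-image.tar "$IMAGE" "registry.internal.example.com/cohere/model:TAG"
```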