Single Container on Private Clouds

This document walks through how to pull Cohere's container images using a license, and provides steps for testing both Docker and Kubernetes images.

Before starting, ensure you have a license and image tag provided by Cohere.

Pull Container Images with A License

Cohere provides access to container images through a registry authenticated with a license. Users can pull these images and replicate them in their environment, as needed, to avoid runtime network access from inside the cluster.

Images will come through the proxy.replicated.com registry. Pulling the images will require firewall access open to proxy.replicated.com and proxy-auth.replicated.com. More information on these endpoints may be found here.

To test pulling images with a license, modify your docker CLI configuration to include authentication details for the registry. Note: docker login will not work.

The docker CLI is only an example; any tool which can pull images with credentials will work with the license ID configured as both username and password. Skopeo is another popular tool for copying images between registries which will work with this flow.

🚧

The following commands will overwrite your existing docker CLI configuration with authentication details for Cohere’s registry. If preferred, you can manually add the authentication details to preserve your existing configuration.

LICENSE_ID="<YOUR LICENSE ID>"

cat <<EOF > ~/.docker/config.json 
{
    "auths": {
        "proxy.replicated.com": {
            "auth": "$(echo -n "${LICENSE_ID}:${LICENSE_ID}" | base64 | tr -d '\n')"
        }
    }
}
EOF

Validate that the authenticated image pull works correctly using the docker CLI:

CUSTOMER_TAG=image_tag_from_cohere # provided by Cohere
docker pull $CUSTOMER_TAG

You can now re-tag and replicate this image anywhere you want, using workflows appropriate to your air-gapped environment.

Validate Workload Infrastructure

Once you can pull the image from the registry, run a test workload to validate the container's functionality.

Docker/Containerd

To test the container image with Docker, you should have a machine with the following installed:

  • Nvidia drivers installed on host (the latest tested version is 545).
  • nvidia-container-toolkit and corresponding configuration for docker/containerd.

Example Usage

Different models have different inputs.

  • Embed models expect an array of texts and return the embeddings as output.
  • Rerank models expect a list of documents and a query, returning relevance scores for the top n results (the n parameter is configurable).
  • Command models expect a prompt and return the model response.

This section provides simple examples of using each primary Cohere model in a Docker container. Note that if you try these out and get an error like curl: (7) Failed to connect to localhost port 8080: Connection refused, the container has not yet fully started up. Wait a few more seconds and then try again.

Bash Commands for Running Cohere Models Through Docker

Here are the bash commands you can run to use the Embed English, Embed Multilingual, Rerank English, Rerank Multilingual, and Command models through Docker.

docker run -d --rm --name embed-english --gpus=1 --net=host $IMAGE_TAG

# wait 5-10 seconds for the container to start
# can check `curl http://localhost:8080/ping` for readiness
curl --header "Content-Type: application/json" --request POST http://localhost:8080/embed --data-raw '{"texts": ["testing embeddings in english"], "input_type": "classification"}'

{"id":"2ffe4bca-8664-4456-b858-1b3b15411f2c","embeddings":[[-0.5019531,-2.0917969,-1.6220703,-1.2919922,-0.80029297,1.3173828,1.4677734,-1.7763672,0.03869629,1.9033203...}

docker stop embed-english
docker run -d --rm --name multilingual--gpus=1 --net=host $IMAGE_TAG

curl --header "Content-Type: application/json" --request POST http://localhost:8080/embed --data-raw '{"texts": ["testing multilingual embeddings"], "input_type": "classification"}'

{"id":"2eab88e7-5906-44e1-9644-01893a70f1e7","texts":["testing multilingual embeddings"],"embeddings":[[-0.022094727,-0.0121154785,0.037628174,-0.0026988983,-0.0129776,0.013305664,0.005458832,-0.03161621,-0.019744873,-0.026290894,0.017333984,-0.02444458,0.01953125...

docker stop multilingual
docker run -d --rm --name rerank-english --gpus=1 --net=host $IMAGE_TAG

curl --header "Content-Type: application/json" --request POST http://localhost:8080/rerank --data-raw '{"documents": [{"text": "orange"},{"text": "Ottawa"},{"text": "Toronto"},{"text": "Ontario"}],"query": "what is the capital of Canada","top_n": 2}'

{"id":"a547bcc5-a243-42dd-8617-d12a7944c164","results":[{"index":1,"relevance_score":0.9734939},{"index":2,"relevance_score":0.73772544}]}

docker stop rerank-english
docker run -d --rm --name rerank-multilingual --gpus=1 --net=host $IMAGE_TAG

curl --header "Content-Type: application/json" --request POST http://localhost:8080/rerank --data-raw '{"documents": [{"text": "orange"},{"text": "Ottawa"},{"text": "Toronto"},{"text": "Ontario"}],"query": "what is the capital of Canada","top_n": 2}'

{"id":"8abeacf2-e657-415c-bab3-ac593e67e8e5","results":[{"index":1,"relevance_score":0.6124835},{"index":2,"relevance_score":0.5305253}],"meta":{"api_version":{"version":"2022-12-06"},"billed_units":{"search_units":1}}}

docker stop rerank-multilingual
docker run -d --rm --name command --gpus=4 --net=host $IMAGE_TAG

curl --header "Content-Type: application/json" --request POST http://localhost:8080/chat --data-raw '{"query":"Docker is good because"}'

{
  "response_id": "dc182f8d-2db1-4b13-806c-e1bcea17f864",
  "text": "Docker is a powerful tool for developing,..."
  ...
}

curl --header "Content-Type: application/json" --request POST http://localhost:8080/chat --data-raw '{
    "chat_history": [
        {"role": "USER", "message": "Who discovered gravity?"},
        {"role": "CHATBOT", "message": "The man who is widely credited with discovering gravity is Sir Isaac Newton"}
    ],
    "message": "What year was he born?"
}'

{
  "response_id": "7938d788-f800-4f9b-a12c-72a96b76a6d6",
  "text": "Sir Isaac Newton was born in Woolsthorpe, England, on January 4, 1643. He was an English physicist, mathematician, astronomer, and natural philosopher who is widely recognized as one of the most...",
  ...
}

curl --header "Content-Type: application/json" --request POST http://localhost:8080/chat --data-raw '{
    "message": "tell me about penguins",
    "return_chatlog": true,
    "documents": [
      {
        "title": "Tall penguins",
        "snippet": "Emperor penguins are the tallest",
        "url": "http://example.com/foo"
      },
      {
        "title": "Tall penguins",
        "snippet": "Baby penguins are the tallest",
        "url": "https://example.com/foo"
      }
    ],
    "mode": "augmented_generation"
}'

{
  "response_id": "8a9f55f6-26aa-455e-bc4c-3e93d4b0d9e6",
  "text": "Penguins are a group of flightless birds that live in the Southern Hemisphere. There are many different types of penguins, including the Emperor penguin, which is the tallest of the penguin species. Baby penguins are also known to be the tallest species of penguin. \n\nWould you like to know more about the different types of penguins?",
  "generation_id": "65ef2270-46bb-427d-b54c-2e5f4d7daa90",
  "chatlog": "User: tell me about penguins\nChatbot: Penguins are a group of flightless birds that live in the Southern Hemisphere. There are many different types of penguins, including the Emperor penguin, which is the tallest of the penguin species. Baby penguins are also known to be the tallest species of penguin. \n\nWould you like to know more about the different types of penguins? ",
  "token_count": {
    "prompt_tokens": 435,
    "response_tokens": 68,
    "total_tokens": 503
  },
  "meta": {
    "api_version": {
      "version": "2022-12-06"
    },
    "billed_units": {
      "input_tokens": 4,
      "output_tokens": 68
    }
  },
  "citations": [
    {
      "start": 15,
      "end": 40,
      "text": "group of flightless birds",
      "document_ids": [
        "doc_1"
      ]
    },
    {
      "start": 58,
      "end": 78,
      "text": "Southern Hemisphere.",
      "document_ids": [
        "doc_1"
      ]
    },
    {
      "start": 137,
      "end": 152,
      "text": "Emperor penguin",
      "document_ids": [
        "doc_0"
      ]
    },
    {
      "start": 167,
      "end": 174,
      "text": "tallest",
      "document_ids": [
        "doc_0"
      ]
    },
    {
      "start": 238,
      "end": 265,
      "text": "tallest species of penguin.",
      "document_ids": [
        "doc_1"
      ]
    }
  ],
  "documents": [
    {
      "id": "doc_1",
      "snippet": "Baby penguins are the tallest",
      "title": "Tall penguins",
      "url": "https://example.com/foo"
    },
    {
      "id": "doc_0",
      "snippet": "Emperor penguins are the tallest",
      "title": "Tall penguins",
      "url": "http://example.com/foo"
    }
  ]
}

docker stop command

You'll note that final example includes documents that the Command model can use to ground its replies. This functionality falls under retrieval augmented generation.

Kubernetes

Deploying to Kubernetes requires nodes with the following installed:

To deploy the same image on Kubernetes, we must first convert the docker configuration into an image pull secret (see the Kubernetes documentation for more detail).

kubectl create secret generic cohere-pull-secret \
    --from-file=.dockerconfigjson="~/.docker/config.json" \
    --type=kubernetes.io/dockerconfigjson

With that done, fill in the environment variables and generate the application manifest:

APP=cohere # or any other name you want to use
IMAGE= <IMAGE_TAG_FROM_COHERE> # replace with the image cohere provided
GPUS=4 # use 4 GPUs for command, 1 is enough for embed / rerank 

cat <<EOF > cohere.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: ${APP}
  name: ${APP}
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ${APP}
  strategy: {}
  template:
    metadata:
      labels:
        app: ${APP}
    spec:
      imagePullSecrets:
        - name: cohere-pull-secret
      containers:
      - image: ${IMAGE}
        name: ${APP}
        resources:
          limits:
            nvidia.com/gpu: ${GPUS}
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: ${APP}
  name: ${APP}
spec:
  ports:
  - name: http
    port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    app: ${APP}
  type: ClusterIP
---
EOF

🚧

The manifest above does not account for air-gapped environments

Change this to the registry where you replicated the image previously pulled for an air-gapped deployment. Alternatively, to test in an internet-connected environment, create an image pull secret using the license ID as username/password as in the earlier step for the docker CLI for testing. Keep in mind you will need the firewall rules open mentioned in the image pull steps

Use the following to deploy the containers and run inference requests:

kubectl apply -f cohere.yaml

Be aware that this is a multi-gigabyte image, so it may take some time to download.

Once the pod is up and running, you should expect to see something like the following:

# once the pod is running
kubectl port-forward svc/${APP} 8080:8080

# Forwarding from 127.0.0.1:8080 -> 8080
# Forwarding from [::1]:8080 -> 8080
# Handling connection for 8080

Leave that running in the background, and up a new terminal session to execute a test request. In the next few sections, we'll include examples of appropriate requests for the major Cohere models.

Example Usage

Here are the bash commands you can run to use the Embed English, Embed Multilingual, Rerank English, Rerank Multilingual, and Command models through Kubernetes.

curl --header "Content-Type: application/json" --request POST http://localhost:8080/embed --data-raw '{"texts": ["testing embeddings in english"], "input_type": "classification"}'

# {"id":"2ffe4bca-8664-4456-b858-1b3b15411f2c","embeddings":[[-0.5019531,-2.0917969,-1.6220703,-1.2919922,-0.80029297,1.3173828,1.4677734,-1.7763672,0.03869629,1.9033203...}
curl --header "Content-Type: application/json" --request POST http://localhost:8080/embed --data-raw '{"texts": ["testing multilingual embeddings"], "input_type": "classification"}'

# {"id":"2eab88e7-5906-44e1-9644-01893a70f1e7","texts":["testing multilingual embeddings"],"embeddings":[[-0.022094727,-0.0121154785,0.037628174,-0.0026988983,-0.0129776,0.013305664,0.005458832,-0.03161621,-0.019744873,-0.026290894,0.017333984,-0.02444458,0.01953125...
curl --header "Content-Type: application/json" --request POST http://localhost:8080/rerank --data-raw '{"documents": [{"text": "orange"},{"text": "Ottawa"},{"text": "Toronto"},{"text": "Ontario"}],"query": "what is the capital of Canada","top_n": 2}'

# {"id":"a547bcc5-a243-42dd-8617-d12a7944c164","results":[{"index":1,"relevance_score":0.9734939},{"index":2,"relevance_score":0.73772544}]}
curl --header "Content-Type: application/json" --request POST http://localhost:8080/rerank --data-raw '{"documents": [{"text": "orange"},{"text": "Ottawa"},{"text": "Toronto"},{"text": "Ontario"}],"query": "what is the capital of Canada","top_n": 2}'

# {"id":"8abeacf2-e657-415c-bab3-ac593e67e8e5","results":[{"index":1,"relevance_score":0.6124835},{"index":2,"relevance_score":0.5305253}],"meta":{"api_version":{"version":"2022-12-06"},"billed_units":{"search_units":1}}}
curl --header "Content-Type: application/json" --request POST http://localhost:8080/chat --data-raw '{"query":"Docker is good because"}'

{
  "response_id": "dc182f8d-2db1-4b13-806c-e1bcea17f864",
  "text": "Docker is a powerful tool for developing,..."
  ...
}

curl --header "Content-Type: application/json" --request POST http://localhost:8080/chat --data-raw '{
    "chat_history": [
        {"role": "USER", "message": "Who discovered gravity?"},
        {"role": "CHATBOT", "message": "The man who is widely credited with discovering gravity is Sir Isaac Newton"}
    ],
    "message": "What year was he born?"
}'

{
  "response_id": "7938d788-f800-4f9b-a12c-72a96b76a6d6",
  "text": "Sir Isaac Newton was born in Woolsthorpe, England, on January 4, 1643. He was an English physicist, mathematician, astronomer, and natural philosopher who is widely recognized as one of the most...",
  ...
}

curl --header "Content-Type: application/json" --request POST http://localhost:8080/chat --data-raw '{
    "message": "tell me about penguins",
    "return_chatlog": true,
    "documents": [
      {
        "title": "Tall penguins",
        "snippet": "Emperor penguins are the tallest",
        "url": "http://example.com/foo"
      },
      {
        "title": "Tall penguins",
        "snippet": "Baby penguins are the tallest",
        "url": "https://example.com/foo"
      }
    ],
    "mode": "augmented_generation"
}'

{
  "response_id": "8a9f55f6-26aa-455e-bc4c-3e93d4b0d9e6",
  "text": "Penguins are a group of flightless birds that live in the Southern Hemisphere. There are many different types of penguins, including the Emperor penguin, which is the tallest of the penguin species. Baby penguins are also known to be the tallest species of penguin. \n\nWould you like to know more about the different types of penguins?",
  "generation_id": "65ef2270-46bb-427d-b54c-2e5f4d7daa90",
  "chatlog": "User: tell me about penguins\nChatbot: Penguins are a group of flightless birds that live in the Southern Hemisphere. There are many different types of penguins, including the Emperor penguin, which is the tallest of the penguin species. Baby penguins are also known to be the tallest species of penguin. \n\nWould you like to know more about the different types of penguins? ",
  "token_count": {
    "prompt_tokens": 435,
    "response_tokens": 68,
    "total_tokens": 503
  },
  "meta": {
    "api_version": {
      "version": "2022-12-06"
    },
    "billed_units": {
      "input_tokens": 4,
      "output_tokens": 68
    }
  },
  "citations": [
    {
      "start": 15,
      "end": 40,
      "text": "group of flightless birds",
      "document_ids": [
        "doc_1"
      ]
    },
    {
      "start": 58,
      "end": 78,
      "text": "Southern Hemisphere.",
      "document_ids": [
        "doc_1"
      ]
    },
    {
      "start": 137,
      "end": 152,
      "text": "Emperor penguin",
      "document_ids": [
        "doc_0"
      ]
    },
    {
      "start": 167,
      "end": 174,
      "text": "tallest",
      "document_ids": [
        "doc_0"
      ]
    },
    {
      "start": 238,
      "end": 265,
      "text": "tallest species of penguin.",
      "document_ids": [
        "doc_1"
      ]
    }
  ],
  "documents": [
    {
      "id": "doc_1",
      "snippet": "Baby penguins are the tallest",
      "title": "Tall penguins",
      "url": "https://example.com/foo"
    },
    {
      "id": "doc_0",
      "snippet": "Emperor penguins are the tallest",
      "title": "Tall penguins",
      "url": "http://example.com/foo"
    }
  ]
}

Remember that this is only an illustrative deployment. Feel free to modify it as needed to accommodate your environment.

A Note on Air-gapped Environments

All images in the proxy.replicated.com registry are available to pull and copy into an air-gapped environment. These can be pulled using the license ID and steps previously provided by Cohere.