AWS Private Deployment Guide (EC2 and EKS)
AWS Private Deployment Guide (EC2 and EKS)
AWS Private Deployment Guide (EC2 and EKS)
This guide walks you through the process of setting up a production-ready environment for deploying Cohere models in AWS.
Private deployment in AWS offers enhanced security, compliance, and control over your infrastructure and applications while leveraging AWS’s reliable and scalable cloud services.
This guide provides comprehensive instructions for deploying applications in a private AWS environment using EC2 instances and Amazon EKS (Elastic Kubernetes Service).
Note: This guide uses an example of deploying the Embed Multilingual 3 model. If you are deploying a different model, the instance sizing will differ – please contact sales for further information.
Before beginning this deployment, you should have:
Follow this guide sequentially to ensure all components are properly configured for a secure and efficient private deployment in AWS.
Amazon Elastic Compute Cloud (EC2) provides scalable computing capacity in the AWS cloud and forms the foundation of your private deployment.
In this section, we’ll walk through launching an appropriate GPU-enabled EC2 instance, connecting to it securely, and installing the necessary NVIDIA drivers to utilize the GPU hardware.
The following sections provide a step by step guide to deploy the Embed Multilingual 3 model on EC2.
First, launch an EC2 instance with the following specifications:
Next, connect to your EC2 instance using the “EC2 Instance Connect” option. This allows you to access the instance through a browser-based client using the default username “ubuntu.” Ensure your instance has a public IPv4 address for successful connection.
Next, install the NVIDIA drivers on your EC2 instance to enable GPU support. Use the following commands to install the necessary drivers and the NVIDIA CUDA toolkit.
Nvidia drivers
Nvidia container toolkit
Reboot the system This is often necessary after installing drivers or making significant system changes to ensure all components are properly initialized and running with the latest configurations.
Before rebooting, restart any mentioned services after running the above commands.
Verify that the GPU is correctly installed
Next, install Docker on your instance. This involves updating the package list, installing Docker, starting the Docker service, and verifying the installation by running a test container.
Next, prepare the environment by obtaining the required software components for deployment.
If you encounter an error “permission denied while trying to connect to the Docker daemon socket at”, run the following command:
Then, verify that the image has been pulled successfully:
Next, run the Docker container. This starts the container in detached mode with GPU support.
Next, test calling the model by executing the curl command to send a POST request to the local server. This tests the model’s functionality by providing input data for processing.
Amazon Elastic Kubernetes Service (EKS) provides a managed Kubernetes environment, allowing you to run containerized applications at scale. It leverages Kubernetes’ orchestration capabilities for efficient resource management and scalability.
In this section, we’ll walk through setting up an EKS cluster, configuring it for GPU support, and deploying your application.
First, launch an EKS cluster by following the steps in the AWS documentation, and in particular, the steps in Prerequisites and Step 1-4.
The steps in summary:
You can then view the cluster information in the AWS console.
Next, set environment variables from the machine where the AWS CLI and Kubectl are installed.
Next, generate an application manifest by creating a file named cohere.yaml. The file contents should be copied from this link.
Next, start the deployment by applying the configuration file. Then, check the status and monitor the logs of your pods.
Next, run the model by first setting up port forwarding.
Then, open a second window and send a test request using the curl command.