An Amazon SageMaker Setup Guide

Note

The code examples in this section use the Cohere v1 API. The v2 API is not yet supported for cloud deployments and will be coming soon.

In an effort to make our language-model capabilities more widely available, we’ve partnered with a few major platforms to create hosted versions of our offerings.

This document will guide you through enabling development teams to access Cohere’s offerings on Amazon SageMaker.

Prerequisites

In order to successfully subscribe to Cohere’s offerings on Amazon SageMaker, the user will need the following Identity and Access Management (IAM) permissions:

AmazonSageMakerFullAccess
aws-marketplace:ViewSubscriptions
aws-marketplace:Subscribe
aws-marketplace:Unsubscribe

These permissions allow a user to manage your organization’s Amazon SageMaker subscriptions. Learn more about managing Amazon’s IAM Permissions here. Contact your AWS administrator if you have questions about account permissions.

Cohere with Amazon SageMaker Setup

First, navigate to Cohere’s SageMaker Marketplace to view the available product offerings. Select the product offering to which you are interested in subscribing.

Next, explore the tools on the Product Detail page to evaluate how you want to configure your subscription. It contains information related to:

Pricing: This section allows you to estimate the cost of running inference on different types of instances.
Usage: This section contains the technical details around supported data formats for each model, and offers links to documentation and notebooks that will help developers scope out the effort required to integrate with Cohere’s models.
Subscribing: This section will once again present you with both the pricing details and the EULA for final review before you accept the offer. This information is identical to the information on Product Detail page.
Configuration: The primary goal of this section is to retrieve the Amazon Resource Name (ARN) for the product you have subscribed to.

For any Cohere software version after 1.0.5 (or model version after 3.0.5), the parameter InferenceAmiVersion=al2-ami-sagemaker-inference-gpu-2 must be specified during endpoint configuration (as a variant option) to avoid deployment errors.

Embeddings

You can use this code to invoke Cohere’s embed model on Amazon SageMaker:

PYTHON

1 import cohere
2 
3 co = cohere.SagemakerClient(
4     aws_region="us-east-1",
5     aws_access_key="...",
6     aws_secret_key="...",
7     aws_session_token="...",
8 )
9 
10 # Input parameters for embed. In this example we are embedding hacker news post titles.
11 texts = [
12     "Interesting (Non software) books?",
13     "Non-tech books that have helped you grow professionally?",
14     "I sold my company last month for $5m. What do I do with the money?",
15     "How are you getting through (and back from) burning out?",
16     "I made $24k over the last month. Now what?",
17     "What kind of personal financial investment do you do?",
18     "Should I quit the field of software development?",
19 ]
20 input_type = "clustering"
21 truncate = "NONE"  # optional
22 model_id = "<YOUR ENDPOINT NAME>"  # On SageMaker, you create a model name that you'll pass here.
23 
24 
25 # Invoke the model and print the response
26 result = co.embed(
27     model=model_id,
28     input_type=input_type,
29     texts=texts,
30     truncate=truncate,
31 )
32 
33 print(result)

Cohere’s embed models don’t support batch transform operations.

Note that we’ve released multimodal embeddings models that are able to handle images in addition to text. Find more information here.

Text Generation

You can use this code to invoke Cohere’s Command models on Amazon SageMaker:

PYTHON

1 import cohere
2 
3 co = cohere.SagemakerClient(
4     aws_region="us-east-1",
5     aws_access_key="...",
6     aws_secret_key="...",
7     aws_session_token="...",
8 )
9 
10 # Invoke the model and print the response
11 result = co.chat(message="Write a LinkedIn post about starting a career in tech:",
12                  model="<YOUR ENDPOINT NAME>") # On SageMaker, you create a model name that you'll pass here. 
13 
14 print(result)

Access Via Amazon SageMaker Jumpstart

Cohere’s models are also available on Amazon SageMaker Jumpstart, which makes it easy to access the models with just a few clicks.

To access Cohere’s models on SageMaker Jumpstart, follow these steps:

In the AWS Console, go to Amazon SageMaker and click Studio.
Then, click Open Studio. If you don’t see this option, you first need to create a user profile.
This will bring you to the SageMaker Studio page. Look for Prebuilt and automated solutions and select JumpStart.
A list of models will appear. To look for Cohere models, type “cohere” in the search bar.
Select any Cohere model and you will find details about the model and links to further resources.
You can try out the model by going to the Notebooks tab, where you can launch the notebook in JupyterLab.

If you have any questions about this process, reach out to support@cohere.com.

Optimize your Inference Latencies

By default, SageMaker endpoints have a random routing strategy. This means that requests coming to the model endpoints are forwarded to the machine learning instances randomly, which can cause latency issues in applications focused on generative AI. In 2023, the SageMaker platform introduced a RoutingStrategy parameter allowing you to use the ‘least outstanding requests’ (LOR) approach to routing. With LOR, SageMaker monitors the load of the instances behind your endpoint as well as the models or inference components that are deployed on each instance, then optimally routes requests to the instance that is best suited to serve it.

LOR has shown an improvement in latency under various conditions, and you can find more details here.

Next Steps

With your selected configuration and Product ARN available, you now have everything you need to integrate with Cohere’s model offerings on SageMaker.

Cohere recommends your next step be to find the appropriate notebook in Cohere’s list of Amazon SageMaker notebooks, and follow the instructions there, or provide the link to Cohere’s SageMaker notebooks to your development team to implement. The notebooks are thorough, developer-centric guides that will enable your team to begin leveraging Cohere’s endpoints in production for live inference.

If you have further questions about subscribing or configuring Cohere’s product offerings on Amazon SageMaker, please contact our team at support+aws@cohere.com.