An Amazon SageMaker Setup Guide
Note
The code examples in this section use the Cohere v1 API. The v2 API is not yet supported for cloud deployments and will be coming soon.
In an effort to make our language-model capabilities more widely available, we’ve partnered with a few major platforms to create hosted versions of our offerings.
This document will guide you through enabling development teams to access Cohere’s offerings on Amazon SageMaker.
Prerequisites
To subscribe to Cohere’s offerings on Amazon SageMaker, you will need the following Identity and Access Management (IAM) permissions:
- AmazonSageMakerFullAccess
- aws-marketplace:ViewSubscriptions
- aws-marketplace:Subscribe
- aws-marketplace:Unsubscribe
These permissions allow a user to manage your organization’s Amazon SageMaker subscriptions. Learn more about managing Amazon’s IAM Permissions here. Contact your AWS administrator if you have questions about account permissions.
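As an illustration of how an administrator might express the three Marketplace actions above, here is a minimal sketch that builds an IAM policy document in Python. The helper name `build_marketplace_policy` and the wildcard `Resource` are assumptions made for this example. `AmazonSageMakerFullAccess` is an AWS managed policy, so it would be attached to the user or role separately rather than listed as an action.

```python
import json

# The Marketplace actions listed in the prerequisites.
MARKETPLACE_ACTIONS = [
    "aws-marketplace:ViewSubscriptions",
    "aws-marketplace:Subscribe",
    "aws-marketplace:Unsubscribe",
]


def build_marketplace_policy(actions=MARKETPLACE_ACTIONS):
    """Return an IAM policy document that allows the given Marketplace actions.

    Hypothetical helper for illustration; it only builds a dict and makes no AWS calls.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                # Assumption: a wildcard resource; narrow it if your organization requires.
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    # Print the JSON so it can be reviewed before it is attached to a user or role.
    print(json.dumps(build_marketplace_policy(), indent=2))
```

Your AWS administrator can review the printed JSON and decide how to attach it.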
You’ll also need to install the AWS Python SDK and some related tooling. Run:
pip install cohere-aws
(or pip install --upgrade cohere-aws if you want to upgrade to the most recent version of the SDK).
Cohere with Amazon SageMaker Setup
First, navigate to Cohere’s SageMaker Marketplace to view the available product offerings. Select the product offering you want to subscribe to.
Next, explore the tools on the Product Detail page to evaluate how you want to configure your subscription. It contains information related to:
- Pricing: This section allows you to estimate the cost of running inference on different types of instances.
- Usage: This section contains the technical details around supported data formats for each model, and offers links to documentation and notebooks that will help developers scope out the effort required to integrate with Cohere’s models.
- Subscribing: This section will once again present you with both the pricing details and the EULA for final review before you accept the offer. This information is identical to the information on the Product Detail page.
- Configuration: The primary goal of this section is to retrieve the Amazon Resource Name (ARN) for the product you have subscribed to.
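Once you have the product ARN, one way to turn it into a running endpoint is through boto3's SageMaker client. The sketch below is not Cohere's official deployment path; it assumes boto3 is installed and AWS credentials are configured. The helper names, the `AllTraffic` variant name, and reusing one name for the model, endpoint configuration, and endpoint are choices made for this example. Choose the instance type from those listed under Pricing for your product.

```python
def build_endpoint_requests(product_arn, role_arn, name, instance_type, instance_count=1):
    """Build the request dicts for create_model, create_endpoint_config, create_endpoint.

    Hypothetical helper; it only assembles dictionaries and makes no AWS calls.
    """
    model = {
        "ModelName": name,
        "ExecutionRoleArn": role_arn,
        # The ARN retrieved from the Configuration section of the listing.
        "PrimaryContainer": {"ModelPackageName": product_arn},
        # Assumption: Marketplace model packages generally require network isolation.
        "EnableNetworkIsolation": True,
    }
    endpoint_config = {
        "EndpointConfigName": name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": name,
                "InstanceType": instance_type,
                "InitialInstanceCount": instance_count,
            }
        ],
    }
    endpoint = {"EndpointName": name, "EndpointConfigName": name}
    return model, endpoint_config, endpoint


def deploy(product_arn, role_arn, name, instance_type, region):
    """Create the model, endpoint configuration, and endpoint, then wait until in service."""
    import boto3  # imported here so the helper above can be used without AWS tooling

    sagemaker = boto3.client("sagemaker", region_name=region)
    model, endpoint_config, endpoint = build_endpoint_requests(
        product_arn, role_arn, name, instance_type
    )
    sagemaker.create_model(**model)
    sagemaker.create_endpoint_config(**endpoint_config)
    sagemaker.create_endpoint(**endpoint)
    sagemaker.get_waiter("endpoint_in_service").wait(EndpointName=name)


# Example call (all values are placeholders):
# deploy("<PRODUCT ARN>", "<EXECUTION ROLE ARN>", "<ENDPOINT NAME>", "<INSTANCE TYPE>", "<AWS REGION>")
```

Remember that a deployed endpoint is billed while it is running, so delete it when you are finished evaluating.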
Embeddings
You can use this code to invoke Cohere’s embed model on Amazon SageMaker:
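As a minimal sketch, the example below calls a deployed embed endpoint through boto3's `sagemaker-runtime` client. It assumes boto3 is installed and AWS credentials are configured. The request fields (`texts`, `input_type`, `truncate`) follow Cohere's v1 embed parameters, but treat the exact payload shape, the helper names, and the placeholder endpoint name and region as assumptions to check against the notebook for your model.

```python
import json


def build_embed_body(texts, input_type, truncate="NONE"):
    """Serialize an embed request as JSON (assumed payload shape)."""
    return json.dumps(
        {
            "texts": list(texts),
            "input_type": input_type,  # e.g. "search_document" or "clustering"
            "truncate": truncate,      # "NONE", "START", or "END"
        }
    )


def invoke_endpoint(endpoint_name, body, region):
    """Send a JSON body to a SageMaker endpoint and return the parsed JSON response."""
    import boto3  # imported here so build_embed_body works without AWS tooling

    runtime = boto3.client("sagemaker-runtime", region_name=region)
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=body,
    )
    return json.loads(response["Body"].read())


# Example call (endpoint name and region are placeholders):
# body = build_embed_body(["Hello from SageMaker"], input_type="search_document")
# result = invoke_endpoint("<YOUR ENDPOINT NAME>", body, "<AWS REGION>")
# print(result)
```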
Text Generation
You can use this code to invoke Cohere’s Command models on Amazon SageMaker:
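As a minimal sketch, the example below sends a chat request to a deployed Command endpoint through boto3's `sagemaker-runtime` client. It assumes boto3 is installed and AWS credentials are configured. The field names (`message`, `stream`, `preamble`, `temperature`, `max_tokens`) follow Cohere's v1 chat parameters, but the exact payload shape, the helper names, and the placeholder values are assumptions to verify against the notebook for your model.

```python
import json


def build_chat_body(message, preamble=None, temperature=None, max_tokens=None):
    """Serialize a non-streaming chat request as JSON, omitting unset options."""
    params = {
        "message": message,
        "stream": False,
        "preamble": preamble,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    # Drop options left as None so the endpoint applies its own defaults.
    return json.dumps({key: value for key, value in params.items() if value is not None})


def invoke_endpoint(endpoint_name, body, region):
    """Send a JSON body to a SageMaker endpoint and return the parsed JSON response."""
    import boto3  # imported here so build_chat_body works without AWS tooling

    runtime = boto3.client("sagemaker-runtime", region_name=region)
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=body,
    )
    return json.loads(response["Body"].read())


# Example call (endpoint name and region are placeholders):
# body = build_chat_body("Write a short product description for a reusable water bottle.", max_tokens=200)
# result = invoke_endpoint("<YOUR ENDPOINT NAME>", body, "<AWS REGION>")
# print(result)
```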
Access Via Amazon SageMaker JumpStart
Cohere’s models are also available on Amazon SageMaker JumpStart, which makes it easy to access the models with just a few clicks.
To access Cohere’s models on SageMaker JumpStart, follow these steps:
- In the AWS Console, go to Amazon SageMaker and click Studio.
- Then, click Open Studio. If you don’t see this option, you first need to create a user profile.
- This will bring you to the SageMaker Studio page. Look for Prebuilt and automated solutions and select JumpStart.
- A list of models will appear. To look for Cohere models, type “cohere” in the search bar.
- Select any Cohere model and you will find details about the model and links to further resources.
- You can try out the model by going to the Notebooks tab, where you can launch the notebook in JupyterLab.
If you have any questions about this process, reach out to support@cohere.com.
Optimize your Inference Latencies
By default, SageMaker endpoints have a random routing strategy. This means that requests coming to the model endpoints are forwarded to the machine learning instances randomly, which can cause latency issues in applications focused on generative AI. In 2023, the SageMaker platform introduced a RoutingStrategy parameter allowing you to use the ‘least outstanding requests’ (LOR) approach to routing. With LOR, SageMaker monitors the load of the instances behind your endpoint as well as the models or inference components that are deployed on each instance, then routes each request to the instance best suited to serve it.
LOR has shown an improvement in latency under various conditions, and you can find more details here.
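To make the setting concrete, here is a minimal sketch of enabling LOR when an endpoint configuration is created with boto3. The routing strategy is set per production variant through `RoutingConfig`. The helper name `with_routing_strategy` and all placeholder values are assumptions made for this example.

```python
VALID_STRATEGIES = {"LEAST_OUTSTANDING_REQUESTS", "RANDOM"}


def with_routing_strategy(production_variant, strategy="LEAST_OUTSTANDING_REQUESTS"):
    """Return a copy of a production variant dict with the routing strategy set.

    Hypothetical helper; it only edits a dictionary and makes no AWS calls.
    """
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"Unknown routing strategy: {strategy!r}")
    variant = dict(production_variant)  # shallow copy so the caller's dict is unchanged
    variant["RoutingConfig"] = {"RoutingStrategy": strategy}
    return variant


def create_lor_endpoint_config(config_name, production_variant, region):
    """Create an endpoint configuration whose variant uses LOR routing."""
    import boto3  # imported here so the helper above works without AWS tooling

    sagemaker = boto3.client("sagemaker", region_name=region)
    return sagemaker.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=[with_routing_strategy(production_variant)],
    )


# Example call (all values are placeholders):
# variant = {
#     "VariantName": "AllTraffic",
#     "ModelName": "<MODEL NAME>",
#     "InstanceType": "<INSTANCE TYPE>",
#     "InitialInstanceCount": 2,
# }
# create_lor_endpoint_config("<ENDPOINT CONFIG NAME>", variant, "<AWS REGION>")
```

Routing only matters when the endpoint has more than one instance behind it, which is why the placeholder variant uses an instance count of 2.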
Next Steps
With your selected configuration and Product ARN available, you now have everything you need to integrate with Cohere’s model offerings on SageMaker.
As a next step, Cohere recommends finding the appropriate notebook in Cohere’s list of Amazon SageMaker notebooks and following the instructions there, or sharing the link to those notebooks with your development team to implement. The notebooks are thorough, developer-centric guides that will enable your team to begin using Cohere’s endpoints in production for live inference.
If you have further questions about subscribing or configuring Cohere’s product offerings on Amazon SageMaker, please contact our team at support+aws@cohere.com.