The code examples in this section use the Cohere v1 API. The v2 API is not yet supported for cloud deployments; support is coming soon.
This document will guide you through enabling development teams to access Cohere’s offerings on Amazon SageMaker.
To subscribe to Cohere’s offerings on Amazon SageMaker, the user needs the following Identity and Access Management (IAM) permissions:
These permissions allow a user to manage your organization’s Amazon SageMaker subscriptions. Learn more about managing AWS IAM permissions here. Contact your AWS administrator if you have questions about account permissions.
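As a minimal sketch, the following Python builds an IAM policy document that grants subscription management. It assumes the standard AWS Marketplace actions aws-marketplace:ViewSubscriptions, aws-marketplace:Subscribe and aws-marketplace:Unsubscribe are what your organization needs. That action list is an assumption, so confirm the exact set with your AWS administrator.

```python
import json

# Assumed AWS Marketplace actions for managing subscriptions; confirm the
# exact list your organization requires with your AWS administrator.
MARKETPLACE_ACTIONS = [
    "aws-marketplace:ViewSubscriptions",
    "aws-marketplace:Subscribe",
    "aws-marketplace:Unsubscribe",
]


def build_subscription_policy(actions=MARKETPLACE_ACTIONS):
    """Return an IAM policy document (as a dict) allowing the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    # Print the policy as JSON so it can be pasted into the IAM console
    # or passed as the PolicyDocument of an IAM create_policy call.
    print(json.dumps(build_subscription_policy(), indent=2))
```

Keeping the action list in one constant makes it easy to adjust once your administrator confirms the required permissions.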
First, navigate to Cohere’s SageMaker Marketplace to view the available product offerings. Select the product offering you are interested in subscribing to.
Next, explore the tools on the Product Detail page to evaluate how you want to configure your subscription. It contains information related to:
For any Cohere software version after 1.0.5 (or model version after 3.0.5), the parameter InferenceAmiVersion=al2-ami-sagemaker-inference-gpu-2 must be specified during endpoint configuration (as a variant option) to avoid deployment errors.
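A minimal sketch of where this variant option goes, using the request shape of SageMaker's CreateEndpointConfig as exposed by boto3. The config name, model name and instance type are hypothetical placeholders, and the boto3 call is only made when you invoke create_endpoint_config yourself.

```python
# Minimal sketch: builds CreateEndpointConfig arguments that pin the
# inference AMI required for newer Cohere software/model versions.
# Config name, model name and instance type are hypothetical placeholders.

AMI_VERSION = "al2-ami-sagemaker-inference-gpu-2"


def build_endpoint_config(config_name, model_name, instance_type, instance_count=1):
    """Return keyword arguments for SageMaker's create_endpoint_config."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                "InstanceType": instance_type,
                "InitialInstanceCount": instance_count,
                # Variant option required for Cohere software > 1.0.5
                # (model version > 3.0.5) to avoid deployment errors.
                "InferenceAmiVersion": AMI_VERSION,
            }
        ],
    }


def create_endpoint_config(request):
    """Submit the request with boto3; needs AWS credentials and permissions."""
    import boto3  # imported here so the builder above runs without boto3

    return boto3.client("sagemaker").create_endpoint_config(**request)


if __name__ == "__main__":
    request = build_endpoint_config("cohere-config", "cohere-model", "ml.g5.xlarge")
    print(request["ProductionVariants"][0]["InferenceAmiVersion"])
```

For older software versions the InferenceAmiVersion entry can simply be left out of the variant.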
You can use this code to invoke Cohere’s embed model on Amazon SageMaker:
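A minimal sketch that calls a deployed embed endpoint through the SageMaker runtime's invoke_endpoint. It assumes the endpoint accepts a Cohere v1-style embed payload (texts, input_type) and returns JSON with an embeddings list; the endpoint name you pass in is your own.

```python
import json

# Assumption: the endpoint accepts a Cohere v1-style embed payload
# ("texts", "input_type") and returns JSON with an "embeddings" list.


def build_embed_body(texts, input_type="search_document"):
    """Serialize an embed request body as JSON bytes."""
    payload = {"texts": list(texts), "input_type": input_type}
    return json.dumps(payload).encode("utf-8")


def parse_embeddings(raw_body):
    """Extract the list of embedding vectors from a JSON response body."""
    return json.loads(raw_body)["embeddings"]


def embed(endpoint_name, texts, input_type="search_document"):
    """Invoke the endpoint via the SageMaker runtime (needs AWS credentials)."""
    import boto3  # imported here so the helpers above run without boto3

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=build_embed_body(texts, input_type),
    )
    # response["Body"] is a stream; read() returns the raw JSON bytes.
    return parse_embeddings(response["Body"].read())
```

For example, embed("my-embed-endpoint", ["What is SageMaker?"], input_type="search_query") would return one vector per input text, with "my-embed-endpoint" standing in for your endpoint's name. Use search_document for texts you index and search_query for the queries you run against them.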
Note that we’ve released multimodal embedding models that can handle images in addition to text. Find more information here.
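A minimal sketch of preparing an image embedding request. It assumes the multimodal endpoint accepts Cohere's v1 embed fields images (a list of base64 data URIs) and input_type set to "image"; verify that shape against the model's documentation before relying on it.

```python
import base64
import json

# Assumption: the multimodal endpoint accepts Cohere v1 embed fields
# "images" (base64 data URIs) and input_type "image".


def image_to_data_uri(image_bytes, mime_type="image/png"):
    """Encode raw image bytes as a base64 data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_image_embed_body(image_bytes, mime_type="image/png"):
    """Serialize an image embedding request body as JSON bytes."""
    payload = {
        "images": [image_to_data_uri(image_bytes, mime_type)],
        "input_type": "image",
    }
    return json.dumps(payload).encode("utf-8")
```

The resulting body is sent with the SageMaker runtime's invoke_endpoint call and ContentType "application/json", the same way as a text embedding request.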
You can use this code to invoke Cohere’s Command models on Amazon SageMaker:
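A minimal sketch for a Command endpoint, assuming it accepts a Cohere v1 chat payload (message, optional chat_history, max_tokens, temperature) and returns the generated reply under text. The default max_tokens and temperature values are illustrative choices, not recommendations from Cohere.

```python
import json

# Assumption: the endpoint accepts a Cohere v1 chat payload and returns
# JSON with the generated reply under "text".


def build_chat_body(message, chat_history=None, max_tokens=256, temperature=0.3):
    """Serialize a v1-style chat request as JSON bytes."""
    payload = {
        "message": message,
        "chat_history": chat_history or [],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return json.dumps(payload).encode("utf-8")


def parse_chat_text(raw_body):
    """Return the generated reply text from a JSON response body."""
    return json.loads(raw_body)["text"]


def chat(endpoint_name, message, **options):
    """Invoke the Command endpoint (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the helpers above run without boto3

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=build_chat_body(message, **options),
    )
    return parse_chat_text(response["Body"].read())
```

In the v1 chat format, earlier turns go in chat_history as entries such as {"role": "USER", "message": "..."} and {"role": "CHATBOT", "message": "..."}.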
You can use this code to invoke Cohere’s Rerank v4.0 model on Amazon SageMaker:
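A minimal sketch of building a rerank request and mapping the results back to the original documents. It assumes a v1-style payload (query, documents, top_n) and a response whose results entries carry the original document index and a relevance_score.

```python
import json

# Assumption: v1-style rerank payload and a response of the form
# {"results": [{"index": <int>, "relevance_score": <float>}, ...]}.


def build_rerank_body(query, documents, top_n=3):
    """Serialize a rerank request as JSON bytes."""
    payload = {"query": query, "documents": list(documents), "top_n": top_n}
    return json.dumps(payload).encode("utf-8")


def ranked_documents(raw_body, documents):
    """Map rerank results back to the input documents, best match first."""
    results = json.loads(raw_body)["results"]
    ordered = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    return [(documents[r["index"]], r["relevance_score"]) for r in ordered]
```

Send the body with the SageMaker runtime's invoke_endpoint call (ContentType "application/json") and pass the bytes read from the response Body to ranked_documents together with the same document list. Sorting explicitly keeps the helper correct even if the results arrive unordered.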
Cohere’s models are also available on Amazon SageMaker JumpStart, which makes it easy to access the models with just a few clicks.
To access Cohere’s models on SageMaker JumpStart, follow these steps:
1. In the Amazon SageMaker console, select Studio.
2. Select Open Studio. If you don’t see this option, you first need to create a user profile.
3. Under Prebuilt and automated solutions, select JumpStart.
4. Choose a Cohere model and open its Notebooks tab, where you can launch the notebook in JupyterLab.

If you have any questions about this process, reach out to support@cohere.com.
By default, SageMaker endpoints use a random routing strategy: incoming requests are forwarded to the endpoint's machine learning instances at random, which can cause latency issues in generative AI applications. In 2023, the SageMaker platform introduced a RoutingStrategy parameter that lets you use the ‘least outstanding requests’ (LOR) approach to routing. With LOR, SageMaker monitors the load of the instances behind your endpoint, as well as the models or inference components deployed on each instance, and routes each request to the instance best suited to serve it.
LOR has shown an improvement in latency under various conditions, and you can find more details here.
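A minimal sketch of opting into LOR: the production variant carries a RoutingConfig whose RoutingStrategy is LEAST_OUTSTANDING_REQUESTS (RANDOM being the default). The toy pick_least_outstanding function only illustrates the core idea; SageMaker's actual router also weighs other load signals. Model name and instance type are hypothetical placeholders.

```python
# Minimal sketch: a production variant that opts into LOR routing, plus a
# toy function showing the core idea (not SageMaker's actual algorithm).
# Model name and instance type are hypothetical placeholders.

VALID_STRATEGIES = ("LEAST_OUTSTANDING_REQUESTS", "RANDOM")


def build_variant(model_name, instance_type, instance_count=2,
                  strategy="LEAST_OUTSTANDING_REQUESTS"):
    """Return a ProductionVariants entry with an explicit routing strategy."""
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"unsupported routing strategy: {strategy}")
    return {
        "VariantName": "AllTraffic",
        "ModelName": model_name,
        "InstanceType": instance_type,
        "InitialInstanceCount": instance_count,
        "RoutingConfig": {"RoutingStrategy": strategy},
    }


def pick_least_outstanding(outstanding):
    """Toy LOR: return the instance id with the fewest in-flight requests."""
    return min(outstanding, key=outstanding.get)
```

The variant defaults to two instances because routing only matters when there is more than one instance to choose from; with a single instance both strategies behave the same.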
With your selected configuration and Product ARN available, you now have everything you need to integrate with Cohere’s model offerings on SageMaker.
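As a minimal sketch of that integration, the following Python runs the create_model, create_endpoint_config and create_endpoint sequence for a subscribed model package with boto3. It assumes the ARN you collected is the region-specific model package ARN; the role ARN, resource names and instance type are hypothetical placeholders.

```python
# Minimal sketch of deploying a subscribed model package with boto3.
# All ARNs, names and the instance type are hypothetical placeholders.


def build_deployment_requests(model_package_arn, role_arn, name, instance_type,
                              instance_count=1):
    """Return the three request payloads, in the order they must be sent."""
    model = {
        "ModelName": name,
        "ExecutionRoleArn": role_arn,
        "PrimaryContainer": {"ModelPackageName": model_package_arn},
        # Marketplace model packages generally run with network isolation.
        "EnableNetworkIsolation": True,
    }
    endpoint_config = {
        "EndpointConfigName": name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": name,
                "InstanceType": instance_type,
                "InitialInstanceCount": instance_count,
                # Needed for Cohere software > 1.0.5; omit for older versions.
                "InferenceAmiVersion": "al2-ami-sagemaker-inference-gpu-2",
            }
        ],
    }
    endpoint = {"EndpointName": name, "EndpointConfigName": name}
    return model, endpoint_config, endpoint


def deploy(model_package_arn, role_arn, name, instance_type, instance_count=1):
    """Create the model, endpoint config and endpoint, then wait until ready."""
    import boto3  # imported here so the builder above runs without boto3

    sm = boto3.client("sagemaker")
    model, endpoint_config, endpoint = build_deployment_requests(
        model_package_arn, role_arn, name, instance_type, instance_count
    )
    sm.create_model(**model)
    sm.create_endpoint_config(**endpoint_config)
    sm.create_endpoint(**endpoint)
    # Endpoint creation is asynchronous; block until it reaches InService.
    sm.get_waiter("endpoint_in_service").wait(EndpointName=name)
```

Separating the pure request builder from the AWS calls lets you review the payloads before anything is created in your account.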
As a next step, Cohere recommends finding the appropriate notebook in Cohere’s list of Amazon SageMaker notebooks and following its instructions, or sharing the link to Cohere’s SageMaker notebooks with your development team to implement. The notebooks are thorough, developer-centric guides that will enable your team to start using Cohere’s endpoints in production for live inference.
If you have further questions about subscribing or configuring Cohere’s product offerings on Amazon SageMaker, please contact our team at support+aws@cohere.com.