Cohere on Azure
In an effort to make our language-model capabilities more widely available, we’ve partnered with a few major platforms to create hosted versions of our offerings.
In this article, you learn how to use Azure AI Foundry to deploy both the Cohere Command models and the Cohere Embed models on Microsoft’s Azure cloud computing platform. You can read more about Azure AI Foundry in its documentationhere.
The following six models are available through Azure AI Studio with pay-as-you-go, token-based billing:
- Command R (and the refreshed Command R model)
- Command R+ (and the refreshed Command R+ model)
- Embed v3 - English
- Embed v3 - Multilingual
- Cohere Rerank V3 (English)
- Cohere Rerank V3 (Multilingual)
Prerequisites
Whether you’re using Command or Embed, the initial set up is the same. You’ll need:
- An Azure subscription with a valid payment method. Free or trial Azure subscriptions won’t work. If you don’t have an Azure subscription, create a paid Azure account to begin.
- An Azure AI hub resource. Note: for Cohere models, the pay-as-you-go deployment offering is only available with AI hubs created in the
EastUS
,EastUS2
orSweden Central
regions. - An Azure AI project in Azure AI Studio.
- Azure role-based access controls (Azure RBAC) are used to grant access to operations in Azure AI Studio. To perform the required steps, your user account must be assigned the Azure AI Developer role on the resource group. For more information on permissions, see Role-based access control in Azure AI Studio.
For workflows based around Command, Embed, or Rerank, you’ll also need to create a deployment and consume the model. Here are links for more information:
- Command: create a Command deployment and then consume the Command model.
- Embed: create an Embed deployment and consume the Embed model.
- Rerank: create a Rerank deployment and consume the Rerank model.
Text Generation
We expose two routes for Command R and Command R+ inference:
v1/chat/completions
adheres to the Azure AI Generative Messages API schema;v1/chat
supports Cohere’s native API schema.
You can find more information about Azure’s API here.
Here’s a code snippet demonstrating how to programmatically interact with a Cohere model on Azure:
You can find more code snippets, including examples of how to stream responses, in this notebook.
Though this section is called “Text Generation”, it’s worth pointing out that these models are capable of much more. Specifically, you can use Azure-hosted Cohere models for both retrieval augmented generation and multi-step tool use. Check the linked pages for much more information.
Finally, we released refreshed versions of Command R and Command R+ in August 2024, both of which are now available on Azure. Check these Microsoft docs for more information (select the Cohere Command R 08-2024 or Cohere Command R+ 08-2024 tabs).
Embeddings
We expose two routes for Embed v3 - English and Embed v3 - Multilingual inference:
v1/embeddings
adheres to the Azure AI Generative Messages API schema;- Use
v1/images/embeddings
if you want to use one of our multimodal embeddings models.
- Use
v1/embed
supports Cohere’s native API schema.
You can find more information about Azure’s API here.
Rerank
We currently exposes the v1/rerank
endpoint for inference with both Rerank 3 - English and Rerank 3 - Multilingual. For more information on using the APIs, see the reference section.
Using the Cohere SDK
You can use the Cohere SDK client to consume Cohere models that are deployed via Azure AI Foundry. This means you can leverage the SDK’s features such as RAG, tool use, structured outputs, and more.
The following are a few examples on how to use the SDK for the different models.
Setup
Chat
RAG
Embed
Rerank
Here are some other examples for Command and Embed.
The important thing to understand is that our new and existing customers can call the models from Azure while still leveraging their integration with the Cohere SDK.