Cohere on the Microsoft Azure Platform

In this document, you learn how to use Azure AI Foundry to deploy the Cohere Command, Embed, and Rerank models on Microsoft’s Azure cloud computing platform. You can read more about Azure AI Foundry in its documentation here.

The following models are available through Azure AI Foundry with pay-as-you-go, token-based billing:

Command A
Embed v4
Embed v3 - English
Embed v3 - Multilingual
Cohere Rerank V4.0 Pro
Cohere Rerank V4.0 Fast
Cohere Rerank V3.5
Cohere Rerank V3 (English)
Cohere Rerank V3 (multilingual)

Prerequisites

Whether you’re using Command, Embed, or Rerank, the initial setup is the same. You’ll need:

An Azure subscription with a valid payment method. Free or trial Azure subscriptions won’t work. If you don’t have an Azure subscription, create a paid Azure account to begin.
An Azure AI hub resource. Note: for Cohere models, the pay-as-you-go deployment offering is only available with AI hubs created in the East US, East US 2, North Central US, South Central US, Sweden Central, West US or West US 3 regions.
An Azure AI project in Azure AI Studio.
Azure role-based access controls (Azure RBAC) are used to grant access to operations in Azure AI Studio. To perform the required steps, your user account must be assigned the Azure AI Developer role on the resource group. For more information on permissions, see Role-based access control in Azure AI Studio.

For workflows based around Command, Embed, or Rerank, you’ll also need to create a deployment and consume the model. Here are links for more information:

Command: create a Command deployment and then consume the Command model.
Embed: create an Embed deployment and consume the Embed model.
Rerank: create a Rerank deployment and consume the Rerank model.

Text Generation

We expose two routes for Command R and Command R+ inference:

v1/chat/completions adheres to the Azure AI Generative Messages API schema;
v1/chat supports Cohere’s native API schema.

You can find more information about Azure’s API here.
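
To make the difference between the two routes concrete, here is a minimal sketch that turns a chat-completions style message list into a body shaped for the native route. It assumes the native v1/chat schema takes the latest user turn as "message" and earlier turns as "chat_history" with upper-case role names; the helper name `to_native_chat_payload` and the role mapping are illustrative, not part of any SDK.

```python
import json

# Assumed mapping from chat-completions roles to Cohere v1 chat roles.
ROLE_MAP = {"system": "SYSTEM", "user": "USER", "assistant": "CHATBOT"}


def to_native_chat_payload(messages, **params):
    """Build a v1/chat style body from a chat-completions style message list.

    The final message must come from the user and becomes "message";
    all earlier messages become "chat_history" entries.
    """
    if not messages or messages[-1]["role"] != "user":
        raise ValueError("the final message must have role 'user'")
    history = [
        {"role": ROLE_MAP[m["role"]], "message": m["content"]}
        for m in messages[:-1]
    ]
    payload = {"message": messages[-1]["content"], **params}
    if history:
        payload["chat_history"] = history
    return payload


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is good about Wuhan?"},
]

# Body for v1/chat/completions: the message list is sent as-is.
completions_body = {"messages": messages, "max_tokens": 500}

# Body for v1/chat: same conversation, reshaped.
native_body = to_native_chat_payload(messages, max_tokens=500)
print(json.dumps(native_body, indent=2))
```

Either body can then be JSON-encoded and posted to the matching route on your endpoint.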

Here’s a code snippet demonstrating how to programmatically interact with a Cohere model on Azure:

PYTHON

import json
import urllib.error
import urllib.request

# Build the request payload for the chat completions route
data = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is good about Wuhan?"},
    ],
    "max_tokens": 500,
    "temperature": 0.3,
    # Must be a JSON boolean, not the string "True".
    # Set to True to receive a streamed response instead.
    "stream": False,
}

body = json.dumps(data).encode("utf-8")

# Replace the URL with your API endpoint
url = (
    "https://your-endpoint.inference.ai.azure.com/v1/chat/completions"
)

# Replace this with the key for the endpoint
api_key = "your-auth-key"
if not api_key:
    raise Exception("API Key is missing")

headers = {
    "Content-Type": "application/json",
    # If the endpoint rejects the bare key, try "Bearer " + api_key
    "Authorization": (api_key),
}

req = urllib.request.Request(url, body, headers)

try:
    response = urllib.request.urlopen(req)
    result = response.read()
    print(result)
except urllib.error.HTTPError as error:
    print("The request failed with status code: " + str(error.code))
    # Print the headers - they include the request ID and the timestamp, which are useful for debugging the failure
    print(error.info())
    print(error.read().decode("utf8", "ignore"))

You can find more code snippets, including examples of how to stream responses, in this notebook.
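
The request snippet earlier in this section prints the raw response bytes. As a minimal sketch for a non-streaming request, the helper below pulls out just the assistant's reply. It assumes the body follows the common chat-completions layout (`choices[0].message.content`); the helper name `extract_reply` and the sample body are hand-written for illustration and are not real model output.

```python
import json


def extract_reply(raw_body: bytes) -> str:
    """Return the assistant text from a chat-completions style response body."""
    payload = json.loads(raw_body.decode("utf-8"))
    choices = payload.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Hand-written sample body (assumed layout, not captured from a real endpoint).
sample = json.dumps(
    {
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": "Hello from the model."},
                "finish_reason": "stop",
            }
        ]
    }
).encode("utf-8")

reply = extract_reply(sample)
print(reply)
```

In the request snippet, you would call `extract_reply(result)` in place of `print(result)`.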

Though this section is called “Text Generation”, these models can do much more. In particular, you can use Azure-hosted Cohere models for both retrieval-augmented generation and multi-step tool use. See the linked pages for details.

Finally, we released refreshed versions of Command R and Command R+ in August 2024, both of which are now available on Azure. Check these Microsoft docs for more information (select the Cohere Command R 08-2024 or Cohere Command R+ 08-2024 tabs).

Embeddings

We expose two routes for Embed v4 and Embed v3 inference:

v1/embeddings adheres to the Azure AI Generative Messages API schema;
v1/embed supports Cohere’s native API schema.

You can find more information about Azure’s API here.

PYTHON

import json
import urllib.error
import urllib.request

# Build the request payload: a list of texts to embed
data = {"input": ["hi"]}

body = json.dumps(data).encode("utf-8")

# Replace the URL with your API endpoint
url = "https://your-endpoint.inference.ai.azure.com/v1/embeddings"

# Replace this with the key for the endpoint
api_key = "your-auth-key"
if not api_key:
    raise Exception("API Key is missing")

headers = {
    "Content-Type": "application/json",
    # If the endpoint rejects the bare key, try "Bearer " + api_key
    "Authorization": (api_key),
}

req = urllib.request.Request(url, body, headers)

try:
    response = urllib.request.urlopen(req)
    result = response.read()
    print(result)
except urllib.error.HTTPError as error:
    print("The request failed with status code: " + str(error.code))
    # Print the headers - they include the request ID and the timestamp, which are useful for debugging the failure
    print(error.info())
    print(error.read().decode("utf8", "ignore"))
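
The snippet above prints the raw response bytes. As a rough sketch, the helper below extracts the vectors from a v1/embeddings style body. It assumes the response carries a `data` list whose items have `index` and `embedding` fields; the helper name `extract_embeddings` and the toy sample are illustrative only.

```python
import json


def extract_embeddings(raw_body: bytes) -> list:
    """Return embedding vectors ordered by their "index" field."""
    payload = json.loads(raw_body.decode("utf-8"))
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


# Hand-written sample body with toy 3-dimensional vectors (not real model output).
sample = json.dumps(
    {
        "data": [
            {"index": 1, "embedding": [0.0, 1.0, 0.0]},
            {"index": 0, "embedding": [1.0, 0.0, 0.0]},
        ]
    }
).encode("utf-8")

vectors = extract_embeddings(sample)
print(len(vectors), "vectors of dimension", len(vectors[0]))
```

Sorting by `index` keeps each vector aligned with the position of its text in the request's `input` list.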

Rerank

We currently expose the v1/rerank endpoint for inference with Rerank v4.0 Pro, Rerank v4.0 Fast, Rerank v3.5, Rerank v3 English, and Rerank v3 Multilingual. For more information on using the APIs, see the reference section.

PYTHON

import cohere

co = cohere.Client(
    base_url="https://<endpoint>.<region>.inference.ai.azure.com/v1/rerank",
    api_key="<key>",
)

# Each document is a dict; rank_fields below selects which keys are scored
documents = [
    {
        "Title": "Incorrect Password",
        "Content": "Hello, I have been trying to access my account for the past hour and it keeps saying my password is incorrect. Can you please help me?",
    },
    {
        "Title": "Confirmation Email Missed",
        "Content": "Hi, I recently purchased a product from your website but I never received a confirmation email. Can you please look into this for me?",
    },
    {
        "Title": "Questions about Return Policy",
        "Content": "Hello, I have a question about the return policy for this product. I purchased it a few weeks ago and it is defective.",
    },
    {
        "Title": "Customer Support is Busy",
        "Content": "Good morning, I have been trying to reach your customer support team for the past week but I keep getting a busy signal. Can you please help me?",
    },
    {
        "Title": "Received Wrong Item",
        "Content": "Hi, I have a question about my recent order. I received the wrong item and I need to return it.",
    },
    {
        "Title": "Customer Service is Unavailable",
        "Content": "Hello, I have been trying to reach your customer support team for the past hour but I keep getting a busy signal. Can you please help me?",
    },
    {
        "Title": "Return Policy for Defective Product",
        "Content": "Hi, I have a question about the return policy for this product. I purchased it a few weeks ago and it is defective.",
    },
    {
        "Title": "Wrong Item Received",
        "Content": "Good morning, I have a question about my recent order. I received the wrong item and I need to return it.",
    },
    {
        "Title": "Return Defective Product",
        "Content": "Hello, I have a question about the return policy for this product. I purchased it a few weeks ago and it is defective.",
    },
]

# Return the five documents most relevant to the query
response = co.rerank(
    documents=documents,
    query="What emails have been about returning items?",
    model="rerank-v4.0-pro",
    rank_fields=["Title", "Content"],
    top_n=5,
)
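
A rerank response refers to documents by their position in the input list rather than repeating them. The sketch below pairs each result with its source document. It assumes each entry in `response.results` exposes `index` and `relevance_score`; the helper name is illustrative, and the stand-in response and its scores are made up so the example runs without an endpoint.

```python
from types import SimpleNamespace


def pair_results_with_documents(response, documents):
    """Return (document, relevance_score) pairs in ranked order."""
    return [
        (documents[result.index], result.relevance_score)
        for result in response.results
    ]


docs = [
    {"Title": "Incorrect Password"},
    {"Title": "Return Defective Product"},
    {"Title": "Received Wrong Item"},
]

# Stand-in for a rerank response; the scores are invented for illustration.
fake_response = SimpleNamespace(
    results=[
        SimpleNamespace(index=1, relevance_score=0.91),
        SimpleNamespace(index=2, relevance_score=0.64),
    ]
)

ranked = pair_results_with_documents(fake_response, docs)
for doc, score in ranked:
    print(f"{score:.2f}  {doc['Title']}")
```

With a live endpoint, you would pass the `response` and `documents` from the snippet above instead of the stand-ins.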

Using the Cohere SDK

You can use the Cohere SDK client to consume Cohere models that are deployed via Azure AI Foundry. This means you can leverage the SDK’s features such as RAG, tool use, structured outputs, and more.

The following are a few examples of how to use the SDK for the different models.

Setup

PYTHON

# pip install cohere

import cohere

# For Command models
co_chat = cohere.Client(
    api_key="AZURE_INFERENCE_CREDENTIAL",
    base_url="AZURE_MODEL_ENDPOINT",  # Example - https://Cohere-command-r-plus-08-2024-xyz.eastus.models.ai.azure.com/
)

# For Embed models
co_embed = cohere.Client(
    api_key="AZURE_INFERENCE_CREDENTIAL",
    base_url="AZURE_MODEL_ENDPOINT",  # Example - https://cohere-embed-v4-xyz.eastus.models.ai.azure.com/
)

# For Rerank models
co_rerank = cohere.Client(
    api_key="AZURE_INFERENCE_CREDENTIAL",
    base_url="AZURE_MODEL_ENDPOINT",  # Example - https://cohere-rerank-v4-pro-xyz.eastus.models.ai.azure.com/
)
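
Rather than pasting the credential and endpoint into source code, you may prefer to read them from the environment. The sketch below assumes you have exported variables named after the placeholders above (`AZURE_INFERENCE_CREDENTIAL` and `AZURE_MODEL_ENDPOINT`); both the variable names and the helper `load_azure_settings` are hypothetical conventions, not something the SDK requires.

```python
import os


def load_azure_settings(
    env=None,
    key_var="AZURE_INFERENCE_CREDENTIAL",
    endpoint_var="AZURE_MODEL_ENDPOINT",
):
    """Read the key and endpoint from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [name for name in (key_var, endpoint_var) if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {"api_key": env[key_var], "base_url": env[endpoint_var]}


# Demonstration with an in-memory mapping instead of the real environment.
fake_env = {
    "AZURE_INFERENCE_CREDENTIAL": "example-key",
    "AZURE_MODEL_ENDPOINT": "https://example-endpoint.invalid/",
}
settings = load_azure_settings(fake_env)
print(sorted(settings))

# With the SDK installed, the result can be unpacked into the client:
#   co_chat = cohere.Client(**load_azure_settings())
```

Because each deployment has its own endpoint, you would pass different `key_var` and `endpoint_var` names for the Command, Embed, and Rerank clients.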

Chat

PYTHON

message = "I'm joining a new startup called Co1t today. Could you help me write a short introduction message to my teammates?"

response = co_chat.chat(message=message)

print(response)

RAG

PYTHON

faqs_short = [
    {
        "text": "Reimbursing Travel Expenses: Easily manage your travel expenses by submitting them through our finance tool. Approvals are prompt and straightforward."
    },
    {
        "text": "Health and Wellness Benefits: We care about your well-being and offer gym memberships, on-site yoga classes, and comprehensive health insurance."
    },
]

query = "Are there fitness-related perks?"

# Passing documents grounds the answer in the supplied text
response = co_chat.chat(message=query, documents=faqs_short)

print(response)
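
A grounded response can carry citations that tie spans of the answer back to the supplied documents. The sketch below shows one way to render them inline. It assumes the response exposes `text` plus a `citations` list whose items have `start`, `end`, and `document_ids`; the helper name `annotate_with_citations`, the sample answer, and the document ID are made up so the example runs without an endpoint.

```python
from types import SimpleNamespace


def annotate_with_citations(text, citations):
    """Insert [doc ids] after each cited span.

    Citations are applied from the end of the text backwards so that
    earlier character offsets stay valid while markers are inserted.
    """
    annotated = text
    for cite in sorted(citations or [], key=lambda c: c.start, reverse=True):
        marker = "[" + ", ".join(cite.document_ids) + "]"
        annotated = annotated[: cite.end] + marker + annotated[cite.end :]
    return annotated


def make_citation(text, span, document_ids):
    """Build a stand-in citation object for a span found in the text."""
    start = text.index(span)
    return SimpleNamespace(start=start, end=start + len(span), document_ids=document_ids)


# Stand-in answer and citations (invented for illustration).
answer = "Yes, we offer gym memberships and on-site yoga classes."
citations = [
    make_citation(answer, "gym memberships", ["doc_1"]),
    make_citation(answer, "on-site yoga classes", ["doc_1"]),
]

annotated = annotate_with_citations(answer, citations)
print(annotated)
```

With a live response, you would call `annotate_with_citations(response.text, response.citations)`.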

Embed

PYTHON

docs = [
    "Joining Slack Channels: You will receive an invite via email. Be sure to join relevant channels to stay informed and engaged.",
    "Finding Coffee Spots: For your caffeine fix, head to the break room's coffee machine or cross the street to the café for artisan coffee.",
]

# Embed the texts as documents to be searched over
doc_emb = co_embed.embed(
    input_type="search_document",
    texts=docs,
).embeddings
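
Document embeddings are typically compared against a query embedding (embedded with `input_type="search_query"`) to find the closest match. The sketch below ranks documents by cosine similarity in plain Python. The toy 3-dimensional vectors stand in for real embeddings, and the helper names are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_emb, doc_embs):
    """Return document indices ordered from most to least similar."""
    scores = [cosine_similarity(query_emb, emb) for emb in doc_embs]
    return sorted(range(len(doc_embs)), key=lambda i: scores[i], reverse=True)


# Toy vectors standing in for real embeddings.
toy_query_emb = [1.0, 0.0, 0.0]
toy_doc_embs = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [0.9, 0.1, 0.0],  # nearly parallel to the query
]

order = rank_by_similarity(toy_query_emb, toy_doc_embs)
print(order)
```

With real data, you would pass the query's embedding and `doc_emb` from the snippet above.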

Rerank

PYTHON

faqs_short = [
    {
        "text": "Reimbursing Travel Expenses: Easily manage your travel expenses by submitting them through our finance tool. Approvals are prompt and straightforward."
    },
    {
        "text": "Working from Abroad: Working remotely from another country is possible. Simply coordinate with your manager and ensure your availability during core hours."
    },
    {
        "text": "Health and Wellness Benefits: We care about your well-being and offer gym memberships, on-site yoga classes, and comprehensive health insurance."
    },
]

query = "Are there fitness-related perks?"

# Keep only the two most relevant documents
results = co_rerank.rerank(
    query=query,
    documents=faqs_short,
    top_n=2,
    model="rerank-v4.0-pro",
)

Here are some other examples for Command and Embed.

The key point is that both new and existing customers can call the models on Azure while keeping their existing integration with the Cohere SDK.