Using Cohere models via the OpenAI SDK

The Compatibility API allows developers to use Cohere’s models through OpenAI’s SDK.

This makes it easy to switch existing OpenAI-based applications over to Cohere's models while continuing to use the OpenAI SDK, with no major refactoring required.

This is a quickstart guide to help you get started with the Compatibility API.

Installation

First, install the OpenAI SDK and import the package.

Then, create a client and configure it with the compatibility API base URL and your Cohere API key.

$pip install openai
PYTHON
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cohere.ai/compatibility/v1",
    api_key="COHERE_API_KEY",
)

Basic chat completions

Here’s a basic example of using the Chat Completions API.

PYTHON
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cohere.ai/compatibility/v1",
    api_key="COHERE_API_KEY",
)

completion = client.chat.completions.create(
    model="command-r7b-12-2024",
    messages=[
        {
            "role": "user",
            "content": "Write a haiku about recursion in programming.",
        },
    ],
)

print(completion.choices[0].message)

Example response (via the Python SDK):

ChatCompletionMessage(content="Recursive loops,\nUnraveling code's depths,\nEndless, yet complete.", refusal=None, role='assistant', audio=None, function_call=None, tool_calls=None)

Chat with streaming

To stream the response, set the stream parameter to True.

PYTHON
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cohere.ai/compatibility/v1",
    api_key="COHERE_API_KEY",
)

stream = client.chat.completions.create(
    model="command-r7b-12-2024",
    messages=[
        {
            "role": "user",
            "content": "Write a haiku about recursion in programming.",
        },
    ],
    stream=True,
)

for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")

Example response (via the Python SDK):

Recursive call,
Unraveling, line by line,
Solving, then again.

State management

For state management, use the messages parameter to build the conversation history.

You can include a system message via the developer role, along with multiple chat turns between the user and assistant.

PYTHON
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cohere.ai/compatibility/v1",
    api_key="COHERE_API_KEY",
)

completion = client.chat.completions.create(
    messages=[
        {
            "role": "developer",
            "content": "You must respond in the style of a pirate.",
        },
        {
            "role": "user",
            "content": "What's 2 + 2.",
        },
        {
            "role": "assistant",
            "content": "Arrr, matey! 2 + 2 be 4, just like a doubloon in the sea!",
        },
        {
            "role": "user",
            "content": "Add 30 to that.",
        },
    ],
    model="command-r7b-12-2024",
)

print(completion.choices[0].message)

Example response (via the Python SDK):

ChatCompletionMessage(content='Aye aye, captain! 4 + 30 be 34, a treasure to behold!', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=None)

Structured outputs

The Structured Outputs feature allows you to specify the schema of the model response. It guarantees that the response will strictly follow the schema.

To use it, set the response_format parameter to the JSON Schema of the desired output.

PYTHON
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cohere.ai/compatibility/v1",
    api_key="COHERE_API_KEY",
)

completion = client.beta.chat.completions.parse(
    model="command-r7b-12-2024",
    messages=[
        {
            "role": "user",
            "content": "Generate a JSON describing a book.",
        }
    ],
    response_format={
        "type": "json_object",
        "schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "author": {"type": "string"},
                "publication_year": {"type": "integer"},
            },
            "required": ["title", "author", "publication_year"],
        },
    },
)

print(completion.choices[0].message.content)

Example response (via the Python SDK):

{
"title": "The Great Gatsby",
"author": "F. Scott Fitzgerald",
"publication_year": 1925
}

Tool use (function calling)

To use tool calling, pass a list of tools to the tools parameter in the API call.

Setting the strict parameter to True in the tool calling step guarantees that every generated tool call follows the specified tool schema.
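As a minimal sketch of where strict would sit, assuming it goes inside the function object (as in OpenAI's function-calling schema; the tool definition below mirrors the get_flight_info example that follows):

```python
# Sketch: a tool definition with strict schema enforcement enabled.
# The "strict" flag sits inside the "function" object (assumed placement,
# following OpenAI's function-calling schema).
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_flight_info",
            "description": "Get flight information between two cities or airports",
            "strict": True,  # enforce the schema on every generated tool call
            "parameters": {
                "type": "object",
                "properties": {
                    "loc_origin": {"type": "string"},
                    "loc_destination": {"type": "string"},
                },
                "required": ["loc_origin", "loc_destination"],
            },
        },
    }
]
```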

PYTHON
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cohere.ai/compatibility/v1",
    api_key="COHERE_API_KEY",
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_flight_info",
            "description": "Get flight information between two cities or airports",
            "parameters": {
                "type": "object",
                "properties": {
                    "loc_origin": {
                        "type": "string",
                        "description": "The departure airport, e.g. MIA",
                    },
                    "loc_destination": {
                        "type": "string",
                        "description": "The destination airport, e.g. NYC",
                    },
                },
                "required": ["loc_origin", "loc_destination"],
            },
        },
    }
]

messages = [
    {"role": "developer", "content": "Today is April 30th"},
    {
        "role": "user",
        "content": "When is the next flight from Miami to Seattle?",
    },
    {
        "role": "assistant",
        "tool_calls": [
            {
                "function": {
                    "arguments": '{ "loc_destination": "Seattle", "loc_origin": "Miami" }',
                    "name": "get_flight_info",
                },
                "id": "get_flight_info0",
                "type": "function",
            }
        ],
    },
    {
        "role": "tool",
        "name": "get_flight_info",
        "tool_call_id": "get_flight_info0",
        "content": "Miami to Seattle, May 1st, 10 AM.",
    },
]

completion = client.chat.completions.create(
    model="command-r7b-12-2024",
    messages=messages,
    tools=tools,
    temperature=0.7,
)

print(completion.choices[0].message)

Example response (via the Python SDK):

ChatCompletionMessage(content='The next flight from Miami to Seattle is on May 1st, 10 AM.', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=None)

Embeddings

You can generate text embeddings with the Embeddings API by passing a list of strings as the input parameter. You can also use the encoding_format parameter to specify the format of the generated embeddings, either float or base64.

PYTHON
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cohere.ai/compatibility/v1",
    api_key="COHERE_API_KEY",
)

response = client.embeddings.create(
    input=["Hello world!"],
    model="embed-multilingual-v3.0",
    encoding_format="float",
)

print(
    response.data[0].embedding[:5]
)  # Display the first 5 dimensions

Example response (via the Python SDK):

[0.0045051575, 0.046905518, 0.025543213, 0.009651184, -0.024993896]
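If you request encoding_format="base64" and work with the raw response (for example via a direct HTTP call; the Python SDK may decode it for you), the embedding is a base64 string of packed floats. A minimal decoding sketch, assuming little-endian float32 packing:

```python
import base64
import struct

def decode_base64_embedding(b64: str) -> list[float]:
    """Decode a base64 string into a list of floats
    (assumes little-endian float32 packing)."""
    raw = base64.b64decode(b64)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

# Round-trip demo with synthetic data (not a real API response):
encoded = base64.b64encode(struct.pack("<3f", 0.1, -0.5, 2.0)).decode()
print(decode_base64_embedding(encoded))  # approximately [0.1, -0.5, 2.0]
```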

Supported parameters

The following is the list of parameters supported in the Compatibility API, including those that are not explicitly demonstrated in the examples above:

Chat completions

  • model
  • messages
  • stream
  • response_format
  • tools
  • temperature
  • max_tokens
  • stop
  • seed
  • top_p
  • frequency_penalty
  • presence_penalty
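As an illustration of how the sampling and length parameters above could be combined in a single request (hypothetical values; pass the dict to client.chat.completions.create(**request) with a client configured as in the earlier examples):

```python
# Hypothetical parameter values for one chat completions request.
# All keys below are from the supported-parameters list.
request = dict(
    model="command-r7b-12-2024",
    messages=[{"role": "user", "content": "Name three uses of recursion."}],
    temperature=0.3,       # lower randomness
    max_tokens=150,        # cap the response length
    stop=["\n\n"],         # stop at the first blank line
    seed=42,               # best-effort reproducibility
    top_p=0.9,             # nucleus sampling
    frequency_penalty=0.2, # discourage repeated tokens
    presence_penalty=0.0,  # no penalty for reusing topics
)
```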

Embeddings

  • input
  • model
  • encoding_format

Unsupported parameters

The following parameters are not supported in the Compatibility API:

Chat completions

  • tool_choice
  • store
  • reasoning_effort
  • metadata
  • logit_bias
  • logprobs
  • top_logprobs
  • max_completion_tokens
  • n
  • modalities
  • prediction
  • audio
  • service_tier
  • stream_options
  • parallel_tool_calls
  • user

Embeddings

  • dimensions
  • user