
LlamaIndex and Cohere's Models

Prerequisites

To use LlamaIndex and Cohere, you will need:

  • The LlamaIndex package. To install it, run:
    • pip install llama-index
    • pip install llama-index-llms-cohere (to use the Command models)
    • pip install llama-index-embeddings-cohere (to use the Embed models)
    • pip install llama-index-postprocessor-cohere-rerank (to use the Rerank models)
  • Cohere's SDK. To install it, run pip install cohere. If you run into any issues or want more details on Cohere's SDK, see this wiki.
  • A Cohere API key. For details on pricing, see this page. When you create a Cohere account, a trial API key is generated for you automatically; you can copy it from the "API Keys" section of the dashboard.
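Put together, a typical setup looks like the following sketch. The `CO_API_KEY` environment variable name is what Cohere's Python SDK reads by default; adjust if your setup differs, and substitute your own key for the placeholder:

```shell
# Install LlamaIndex plus the Cohere integration packages
pip install llama-index
pip install llama-index-llms-cohere
pip install llama-index-embeddings-cohere
pip install llama-index-postprocessor-cohere-rerank

# Install Cohere's SDK
pip install cohere

# Export your trial API key (copy it from the "API Keys" dashboard section)
export CO_API_KEY="YOUR_API_KEY"
```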

Cohere Chat with LlamaIndex

To use Cohere's chat functionality with LlamaIndex, create a Cohere model object and call its chat method.

PYTHON
from llama_index.llms.cohere import Cohere
from llama_index.core.llms import ChatMessage

cohere_model = Cohere(
    api_key="COHERE_API_KEY", model="command-a-03-2025"
)

message = ChatMessage(role="user", content="What is 2 + 3?")

response = cohere_model.chat([message])
print(response)

Cohere Embeddings with LlamaIndex

To use Cohere's embeddings with LlamaIndex, create a CohereEmbedding object with an embedding model from this list and call get_text_embedding.

PYTHON
from llama_index.embeddings.cohere import CohereEmbedding

embed_model = CohereEmbedding(
    api_key="COHERE_API_KEY",
    model_name="embed-english-v3.0",
    input_type="search_document",  # Use search_query for queries, search_document for documents
    max_tokens=8000,
    embedding_types=["float"],
)

# Generate embeddings
embeddings = embed_model.get_text_embedding("Welcome to Cohere!")

# Print the embedding dimensionality and the first few values
print(len(embeddings))
print(embeddings[:5])
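Embedding vectors are useful mainly in comparison: retrieval with search_query and search_document embeddings comes down to ranking documents by a similarity measure, most commonly cosine similarity. A minimal, library-free sketch of that ranking step (the 3-dimensional vectors here are made up for illustration; real embed-english-v3.0 vectors have 1024 dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for a query embedding and document embeddings
query_vec = [0.1, 0.9, 0.2]
doc_vecs = {
    "doc_a": [0.1, 0.8, 0.3],
    "doc_b": [0.9, 0.1, 0.0],
}

# Rank documents by similarity to the query, most similar first
ranked = sorted(
    doc_vecs, key=lambda d: cosine_similarity(query_vec, doc_vecs[d]), reverse=True
)
print(ranked)  # ['doc_a', 'doc_b']
```

In practice a vector store (such as the VectorStoreIndex used below) performs this comparison for you at scale.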

Cohere Rerank with LlamaIndex

To use Cohere's rerank functionality with LlamaIndex, create a CohereRerank object and use it as a node postprocessor.

PYTHON
from llama_index.core import VectorStoreIndex
from llama_index.postprocessor.cohere_rerank import CohereRerank
from llama_index.readers.web import (
    SimpleWebPageReader,
)  # first, run `pip install llama-index-readers-web`

# create index (we are using an example page from Cohere's docs)
documents = SimpleWebPageReader(html_to_text=True).load_data(
    ["https://docs.cohere.com/v2/docs/cohere-embed"]
)  # you can replace this with any other reader or documents
index = VectorStoreIndex.from_documents(documents=documents)

# create reranker
cohere_rerank = CohereRerank(
    api_key="COHERE_API_KEY", model="rerank-english-v3.0", top_n=2
)

# query the index
query_engine = index.as_query_engine(
    similarity_top_k=10,
    node_postprocessors=[cohere_rerank],
)

# generate a response
response = query_engine.query(
    "What is Cohere's Embed Model?",
)
print(response)

# To view the source documents
from llama_index.core.response.pprint_utils import pprint_response

pprint_response(response, show_source=True)
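Conceptually, the reranker assigns each of the similarity_top_k retrieved nodes a query-relevance score and keeps only the top_n highest-scoring ones. A toy sketch of that keep-the-top-n step (the document names and scores here are invented; in the example above the real scores come from Cohere's Rerank API):

```python
# Invented (document, relevance_score) pairs standing in for reranker output
scored_nodes = [
    ("intro to embeddings", 0.12),
    ("embed model overview", 0.94),
    ("pricing page", 0.05),
    ("embedding use cases", 0.71),
]

top_n = 2  # mirrors top_n=2 in the CohereRerank example above

# Sort by relevance score (descending) and keep only the top_n nodes
reranked = sorted(scored_nodes, key=lambda pair: pair[1], reverse=True)[:top_n]
print([doc for doc, _ in reranked])  # ['embed model overview', 'embedding use cases']
```

Only these top_n nodes are passed to the LLM as context, which is why reranking typically improves answer quality while keeping prompts short.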

Cohere RAG with LlamaIndex

The following example uses Cohere's chat model, embeddings, and rerank functionality to generate a response based on your data.

PYTHON
from llama_index.llms.cohere import Cohere
from llama_index.embeddings.cohere import CohereEmbedding
from llama_index.postprocessor.cohere_rerank import CohereRerank
from llama_index.core import Settings
from llama_index.core import VectorStoreIndex
from llama_index.readers.web import (
    SimpleWebPageReader,
)  # first, run `pip install llama-index-readers-web`

# Create the embedding model
embed_model = CohereEmbedding(
    api_key="COHERE_API_KEY",
    model_name="embed-english-v3.0",
    input_type="search_document",
    max_tokens=8000,
    embedding_types=["float"],
)

# Set the Cohere model for generation and the embedding model globally
Settings.llm = Cohere(
    api_key="COHERE_API_KEY", model="command-a-03-2025"
)
Settings.embed_model = embed_model

# create index (we are using an example page from Cohere's docs)
documents = SimpleWebPageReader(html_to_text=True).load_data(
    ["https://docs.cohere.com/v2/docs/cohere-embed"]
)  # you can replace this with any other reader or documents
index = VectorStoreIndex.from_documents(documents=documents)

# Create a cohere reranker
cohere_rerank = CohereRerank(
    api_key="COHERE_API_KEY", model="rerank-english-v3.0", top_n=2
)

# Create the query engine
query_engine = index.as_query_engine(
    node_postprocessors=[cohere_rerank]
)

# Generate the response
response = query_engine.query("What is Cohere's Embed model?")
print(response)

Cohere Tool Use (Function Calling) with LlamaIndex

To use Cohere's tool use functionality with LlamaIndex, wrap your Python functions with the FunctionTool class and pass them to a function-calling agent backed by a Cohere model.

PYTHON
from llama_index.llms.cohere import Cohere
from llama_index.core.tools import FunctionTool
from llama_index.core.agent import FunctionCallingAgent


# Define tools
def multiply(a: int, b: int) -> int:
    """Multiply two integers and return the result."""
    return a * b


multiply_tool = FunctionTool.from_defaults(fn=multiply)


def add(a: int, b: int) -> int:
    """Add two integers and return the result."""
    return a + b


add_tool = FunctionTool.from_defaults(fn=add)

# Define LLM
llm = Cohere(api_key="COHERE_API_KEY", model="command-a-03-2025")

# Create agent
agent = FunctionCallingAgent.from_tools(
    [multiply_tool, add_tool],
    llm=llm,
    verbose=True,
    allow_parallel_tool_calls=True,
)

# Run agent (use `await agent.achat(...)` instead inside async code)
response = agent.chat("What is (121 * 3) + (5 * 8)?")
print(response)
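For reference, the tool calls the agent is expected to make can be checked directly in plain Python: the two multiplications are independent (which is why allow_parallel_tool_calls helps here), and add combines their results:

```python
def multiply(a: int, b: int) -> int:
    """Same tool function the agent is given."""
    return a * b

def add(a: int, b: int) -> int:
    """Same tool function the agent is given."""
    return a + b

# The agent should call multiply(121, 3) and multiply(5, 8), then add the results
result = add(multiply(121, 3), multiply(5, 8))
print(result)  # 403
```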