> For clean Markdown of any page, append .md to the page URL.
> For a complete documentation index, see https://docs.cohere.com/llms.txt.
> For full documentation content, see https://docs.cohere.com/llms-full.txt.
> For AI client integration (Claude Code, Cursor, etc.), connect to the MCP server at https://docs.cohere.com/_mcp/server.

# LlamaIndex and Cohere's Models

> Learn how to use Cohere and LlamaIndex together to generate responses based on data.

### Prerequisite

To use LlamaIndex and Cohere, you will need:

* LlamaIndex Package. To install it, run:
  * `pip install llama-index`
  * `pip install llama-index-llms-cohere` (to use the Command models)
  * `pip install llama-index-embeddings-cohere` (to use the Embed models)
  * `pip install llama-index-postprocessor-cohere-rerank` (to use the Rerank models)
* Cohere's SDK. To install it, run `pip install cohere`. If you run into any issues or want more details on Cohere's SDK, [see this wiki](https://github.com/cohere-ai/cohere-python).
* A Cohere API Key. For more details on pricing [see this page](https://cohere.com/pricing). When you create an account with Cohere, we automatically create a trial API key for you. This key will be available on the dashboard where you can copy it, and it's in the dashboard section called "API Keys" as well.

### Cohere Chat with LlamaIndex

To use Cohere's chat functionality with LlamaIndex create a [Cohere model object](https://docs.llamaindex.ai/en/stable/examples/llm/cohere.html) and call the `chat` function.

```python PYTHON
from llama_index.llms.cohere import Cohere
from llama_index.core.llms import ChatMessage

cohere_model = Cohere(
    api_key="COHERE_API_KEY", model="command-a-03-2025"
)

message = ChatMessage(role="user", content="What is 2 + 3?")

response = cohere_model.chat([message])
print(response)
```

### Cohere Embeddings with LlamaIndex

To use Cohere's embeddings with LlamaIndex create a [Cohere Embeddings object](https://docs.llamaindex.ai/en/stable/examples/embeddings/cohereai.html) with an embedding model [from this list](/reference/embed) and call `get_text_embedding`.

```python PYTHON
from llama_index.embeddings.cohere import CohereEmbedding

embed_model = CohereEmbedding(
    api_key="COHERE_API_KEY",
    model_name="embed-english-v3.0",
    input_type="search_document",  # Use search_query for queries, search_document for documents
    max_tokens=8000,
    embedding_types=["float"],
)

# Generate Embeddings
embeddings = embed_model.get_text_embedding("Welcome to Cohere!")

# Print embeddings
print(len(embeddings))
print(embeddings[:5])
```

### Cohere Rerank with LlamaIndex

To use Cohere's rerank functionality with LlamaIndex create a [ Cohere Rerank object ](https://docs.llamaindex.ai/en/latest/examples/node_postprocessor/CohereRerank.html#) and use as a [node post processor.](https://docs.llamaindex.ai/en/stable/module_guides/querying/node_postprocessors/root.html)

```python PYTHON
from llama_index.postprocessor.cohere_rerank import CohereRerank
from llama_index.readers.web import (
    SimpleWebPageReader,
)  # first, run `pip install llama-index-readers-web`

# create index (we are using an example page from Cohere's docs)
documents = SimpleWebPageReader(html_to_text=True).load_data(
    ["https://docs.cohere.com/v2/docs/cohere-embed"]
)  # you can replace this with any other reader or documents
index = VectorStoreIndex.from_documents(documents=documents)

# create reranker
cohere_rerank = CohereRerank(
    api_key="COHERE_API_KEY", model="rerank-english-v3.0", top_n=2
)

# query the index
query_engine = index.as_query_engine(
    similarity_top_k=10,
    node_postprocessors=[cohere_rerank],
)

print(query_engine)

# generate a response
response = query_engine.query(
    "What is Cohere's Embed Model?",
)

print(response)

# To view the source documents
from llama_index.core.response.pprint_utils import pprint_response

pprint_response(response, show_source=True)
```

### Cohere RAG with LlamaIndex

The following example uses Cohere's chat model, embeddings and rerank functionality to generate a response based on your data.

```python PYTHON
from llama_index.llms.cohere import Cohere
from llama_index.embeddings.cohere import CohereEmbedding
from llama_index.postprocessor.cohere_rerank import CohereRerank
from llama_index.core import Settings
from llama_index.core import VectorStoreIndex
from llama_index.readers.web import (
    SimpleWebPageReader,
)  # first, run `pip install llama-index-readers-web`

# Create the embedding model
embed_model = CohereEmbedding(
    api_key="COHERE_API_KEY",
    model="embed-english-v3.0",
    input_type="search_document",
    max_tokens=8000,
    embedding_types=["float"],
)

# Create the service context with the cohere model for generation and embedding model
Settings.llm = Cohere(
    api_key="COHERE_API_KEY", model="command-a-03-2025"
)
Settings.embed_model = embed_model

# create index (we are using an example page from Cohere's docs)
documents = SimpleWebPageReader(html_to_text=True).load_data(
    ["https://docs.cohere.com/v2/docs/cohere-embed"]
)  # you can replace this with any other reader or documents
index = VectorStoreIndex.from_documents(documents=documents)

# Create a cohere reranker
cohere_rerank = CohereRerank(
    api_key="COHERE_API_KEY", model="rerank-english-v3.0", top_n=2
)

# Create the query engine
query_engine = index.as_query_engine(
    node_postprocessors=[cohere_rerank]
)

# Generate the response
response = query_engine.query("What is Cohere's Embed model?")
print(response)
```

### Cohere Tool Use (Function Calling) with LlamaIndex

To use Cohere's tool use functionality with LlamaIndex, you can use the `FunctionTool` class to create a tool that uses Cohere's API.

```python PYTHON
from llama_index.llms.cohere import Cohere
from llama_index.core.tools import FunctionTool
from llama_index.core.agent import FunctionCallingAgent


# Define tools
def multiply(a: int, b: int) -> int:
    """Multiple two integers and returns the result integer"""
    return a * b


multiply_tool = FunctionTool.from_defaults(fn=multiply)


def add(a: int, b: int) -> int:
    """Add two integers and returns the result integer"""
    return a + b


add_tool = FunctionTool.from_defaults(fn=add)

# Define LLM
llm = Cohere(api_key="COHERE_API_KEY", model="command-a-03-2025")

# Create agent
agent = FunctionCallingAgent.from_tools(
    [multiply_tool, add_tool],
    llm=llm,
    verbose=True,
    allow_parallel_tool_calls=True,
)

# Run agent
response = await agent.achat("What is (121 * 3) + (5 * 8)?")
```