# Haystack and Cohere (Integration Guide)

> Build custom LLM applications with Haystack, now integrated with Cohere for embedding, generation, chat, and retrieval.

<img src="https://files.buildwithfern.com/cohere.docs.buildwithfern.com/71d786e4b0826157c485452240cfb03fe6d9d47c896ffca557f4fde8acd73b21/assets/images/d82df2c-haystack-logo.png" width="200px" height="auto" class="light-bg" />

[Haystack](https://github.com/deepset-ai/haystack) is an open-source Python framework by [deepset](https://www.deepset.ai/) for building customizable, production-ready LLM applications. You can use Cohere's `/embed`, `/generate`, `/chat`, and `/rerank` endpoints with Haystack.

Cohere's Haystack integration provides four components that you can use in Haystack pipelines for retrieval augmented generation, chat, indexing, and similar tasks:

* The `CohereDocumentEmbedder`: To use Cohere embedding models to [index documents](https://docs.haystack.deepset.ai/v2.0/docs/coheredocumentembedder) into vector databases.
* The `CohereTextEmbedder`: To use Cohere embedding models to do [embedding retrieval](https://docs.haystack.deepset.ai/v2.0/docs/coheretextembedder).
* The `CohereGenerator`: To use Cohere’s [text generation models](https://docs.haystack.deepset.ai/v2.0/docs/coheregenerator).
* The `CohereChatGenerator`: To use Cohere’s [chat completion](https://docs.haystack.deepset.ai/v2.0/docs/coherechatgenerator) endpoints.

### Prerequisites

To use Cohere and Haystack you will need:

* The `cohere-haystack` integration installed. To install it, run `pip install cohere-haystack`. If you run into any issues or want more details, [see these docs](https://haystack.deepset.ai/integrations/cohere).
* A Cohere API key. When you create a Cohere account, a trial API key is created for you automatically; you can copy it from the "API Keys" section of the dashboard. For details on pricing, [see this page](https://cohere.com/pricing).
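
Because the examples on this page read the key from the `COHERE_API_KEY` environment variable, it can help to fail early with a clear message when the variable is missing. The helper below is a minimal sketch using only the standard library; the name `load_cohere_api_key` is hypothetical and is not part of Haystack or the Cohere SDK.

```python
import os


def load_cohere_api_key(var_name: str = "COHERE_API_KEY") -> str:
    """Return the API key stored in `var_name`, or raise a clear error.

    Hypothetical helper for illustration; not part of Haystack or Cohere.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable before running the examples."
        )
    return key
```

Calling `load_cohere_api_key()` where the examples call `os.environ.get("COHERE_API_KEY")` would surface a missing key before any pipeline is built.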

### Cohere Chat with Haystack

Haystack’s `CohereChatGenerator` component enables chat completion using Cohere's large language models (LLMs). For the latest information on Cohere Chat, [see these docs](/docs/chat-api).

In the example below, you will need to add your Cohere API key. We suggest using an environment variable, `COHERE_API_KEY`. Don’t commit API keys to source control!

```python PYTHON
from haystack import Pipeline
from haystack.components.builders import DynamicChatPromptBuilder
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.cohere import (
    CohereChatGenerator,
)
from haystack.utils import Secret
import os

COHERE_API_KEY = os.environ.get("COHERE_API_KEY")

pipe = Pipeline()
pipe.add_component("prompt_builder", DynamicChatPromptBuilder())
pipe.add_component(
    "llm", CohereChatGenerator(Secret.from_token(COHERE_API_KEY))
)
pipe.connect("prompt_builder", "llm")

location = "Berlin"
system_message = ChatMessage.from_system(
    "You are an assistant giving out valuable information to language learners."
)
messages = [
    system_message,
    ChatMessage.from_user("Tell me about {{location}}"),
]

res = pipe.run(
    data={
        "prompt_builder": {
            "template_variables": {"location": location},
            "prompt_source": messages,
        }
    }
)
print(res)
```

You can pass additional template variables to the prompt builder, which fills them into the messages before they reach the LLM:

```python PYTHON
messages = [
    system_message,
    ChatMessage.from_user(
        "What's the weather forecast for {{location}} in the next {{day_count}} days?"
    ),
]

res = pipe.run(
    data={
        "prompt_builder": {
            "template_variables": {
                "location": location,
                "day_count": "5",
            },
            "prompt_source": messages,
        }
    }
)

print(res)
```
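
To make the substitution step concrete, the sketch below mimics how the `{{location}}` and `{{day_count}}` placeholders are filled from `template_variables`. Haystack renders prompts with Jinja2; the regular-expression version here is a simplified stand-in that only handles plain `{{name}}` placeholders, and `render_placeholders` is a hypothetical name.

```python
import re

# Matches {{name}} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_placeholders(template: str, variables: dict) -> str:
    """Replace each {{name}} with str(variables[name]).

    Simplified stand-in for Jinja2 rendering; raises KeyError on a missing name.
    """
    return PLACEHOLDER.sub(lambda m: str(variables[m.group(1)]), template)


template = "What's the weather forecast for {{location}} in the next {{day_count}} days?"
prompt = render_placeholders(template, {"location": "Berlin", "day_count": "5"})
print(prompt)
# What's the weather forecast for Berlin in the next 5 days?
```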

### Cohere Chat with Retrieval Augmentation

This Haystack [retrieval augmented generation](/docs/retrieval-augmented-generation-rag) (RAG) pipeline passes Cohere’s documentation to a Cohere model so that it can better explain Cohere’s capabilities. In the example below, a `LinkContentFetcher` takes the place of a classic retriever: it fetches the page at the given URL, and the converted contents are passed to the generator.

```python PYTHON
import os

from haystack import Pipeline
from haystack.components.builders import DynamicChatPromptBuilder
from haystack.components.fetchers import LinkContentFetcher
from haystack.components.converters import HTMLToDocument
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

from haystack_integrations.components.generators.cohere import (
    CohereChatGenerator,
)

# Read the API key from the environment, as in the chat example.
COHERE_API_KEY = os.environ.get("COHERE_API_KEY")

fetcher = LinkContentFetcher()
converter = HTMLToDocument()
prompt_builder = DynamicChatPromptBuilder(
    runtime_variables=["documents"]
)
llm = CohereChatGenerator(Secret.from_token(COHERE_API_KEY))

message_template = """Answer the following question based on the contents of the article: {{query}}\n
               Article: {{documents[0].content}} \n
           """
messages = [ChatMessage.from_user(message_template)]

rag_pipeline = Pipeline()
rag_pipeline.add_component(name="fetcher", instance=fetcher)
rag_pipeline.add_component(name="converter", instance=converter)
rag_pipeline.add_component("prompt_builder", prompt_builder)
rag_pipeline.add_component("llm", llm)

rag_pipeline.connect("fetcher.streams", "converter.sources")
rag_pipeline.connect(
    "converter.documents", "prompt_builder.documents"
)
rag_pipeline.connect("prompt_builder.prompt", "llm.messages")

question = "What are the capabilities of Cohere?"

result = rag_pipeline.run(
    {
        "fetcher": {"urls": ["/reference/about"]},
        "prompt_builder": {
            "template_variables": {"query": question},
            "prompt_source": messages,
        },
        "llm": {"generation_kwargs": {"max_tokens": 165}},
    },
)
print(result)
# {'llm': {'replies': [ChatMessage(content='The Cohere platform builds natural language processing and generation into your product with a few lines of code... \nIs', role=<ChatRole.ASSISTANT: 'assistant'>, name=None, meta={'model': 'command', 'usage': {'prompt_tokens': 273, 'response_tokens': 165, 'total_tokens': 438, 'billed_tokens': 430}, 'index': 0, 'finish_reason': None, 'documents': None, 'citations': None})]}}
```
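
The printed result is a nested dictionary, so the reply text and token usage can be read directly from it. The sketch below rebuilds the shape of the sample output with a stand-in object (`FakeChatMessage` is hypothetical and only mirrors the `content` and `meta` fields shown above; field names may differ between Haystack versions) and flags a reply that was probably cut off because it used the full `max_tokens` budget.

```python
from dataclasses import dataclass, field


@dataclass
class FakeChatMessage:
    """Stand-in mirroring the `content` and `meta` fields in the sample output."""

    content: str
    meta: dict = field(default_factory=dict)


# Same shape as the sample output above, with the text shortened.
result = {
    "llm": {
        "replies": [
            FakeChatMessage(
                content="The Cohere platform builds natural language processing...",
                meta={
                    "model": "command",
                    "usage": {"prompt_tokens": 273, "response_tokens": 165},
                },
            )
        ]
    }
}

MAX_TOKENS = 165  # the value passed in generation_kwargs

reply = result["llm"]["replies"][0]
used = reply.meta["usage"]["response_tokens"]
# A reply that consumed the whole budget was probably truncated.
likely_truncated = used >= MAX_TOKENS

print(reply.content)
print(f"response tokens: {used}, likely truncated: {likely_truncated}")
```

In the sample run the reply stops mid-sentence after exactly 165 response tokens, which is consistent with hitting the limit; raising `max_tokens` is one way to get a complete answer.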

### Use Cohere Models in Haystack RAG Pipelines

RAG provides an LLM with relevant context, allowing it to generate better answers. You can use any of [Cohere’s models](/docs/models) in a [Haystack RAG pipeline](https://docs.haystack.deepset.ai/v2.0/docs/creating-pipelines) with the `CohereGenerator`.

The code sample below adds a set of documents to an `InMemoryDocumentStore`, then uses those documents to answer a question. You’ll need your Cohere API key to run it.

Although these examples use an `InMemoryDocumentStore` to keep things simple, Haystack supports [a variety](https://haystack.deepset.ai/integrations?type=Document+Store) of vector database and document store options. You can use any of them in combination with Cohere models.

```python PYTHON
from haystack import Pipeline
from haystack.components.retrievers.in_memory import (
    InMemoryBM25Retriever,
)
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.generators.cohere import (
    CohereGenerator,
)
from haystack import Document
from haystack.utils import Secret

import os

COHERE_API_KEY = os.environ.get("COHERE_API_KEY")

docstore = InMemoryDocumentStore()
docstore.write_documents(
    [
        Document(content="Rome is the capital of Italy"),
        Document(content="Paris is the capital of France"),
    ]
)

query = "What is the capital of France?"

template = """
Given the following information, answer the question.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{ query }}
"""
pipe = Pipeline()

pipe.add_component(
    "retriever", InMemoryBM25Retriever(document_store=docstore)
)
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component(
    "llm", CohereGenerator(Secret.from_token(COHERE_API_KEY))
)
pipe.connect("retriever", "prompt_builder.documents")
pipe.connect("prompt_builder", "llm")

res = pipe.run(
    {
        "prompt_builder": {"query": query},
        "retriever": {"query": query},
    }
)

print(res)
# {'llm': {'replies': [' Paris is the capital of France. It is known for its history, culture, and many iconic landmarks, such as the Eiffel Tower and Notre-Dame Cathedral. '], 'meta': [{'finish_reason': 'COMPLETE'}]}}
```
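
`InMemoryBM25Retriever` ranks documents by keyword overlap with the query. The sketch below scores the two sample documents with a textbook BM25 formula to show why the Paris document ranks first. It is a simplified illustration with assumed parameters (`k1=1.5`, `b=0.75`) and a naive tokenizer, not Haystack's implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in term_freq:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            tf = term_freq[term]
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


docs = ["Rome is the capital of Italy", "Paris is the capital of France"]
scores = bm25_scores("What is the capital of France?", docs)
best = docs[scores.index(max(scores))]
print(best)
# Paris is the capital of France
```

Both documents share "is", "the", "capital", and "of" with the query, so the rarer term "France" is what separates them.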

### Cohere Embeddings with Haystack

You can use Cohere’s embedding models within your Haystack RAG pipelines. The list of all supported models can be found in Cohere’s [model documentation](/docs/models#representation). Set an environment variable for your `COHERE_API_KEY` before running the code samples below.

Although these examples use an `InMemoryDocumentStore` to keep things simple, Haystack supports [a variety](https://haystack.deepset.ai/integrations?type=Document+Store) of vector database and document store options.

#### Index Documents with Haystack and Cohere Embeddings

```python PYTHON
from haystack import Pipeline
from haystack import Document
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.writers import DocumentWriter
from haystack_integrations.components.embedders.cohere import (
    CohereDocumentEmbedder,
)
from haystack.utils import Secret
import os

COHERE_API_KEY = os.environ.get("COHERE_API_KEY")
token = Secret.from_token(COHERE_API_KEY)

document_store = InMemoryDocumentStore(
    embedding_similarity_function="cosine"
)

documents = [
    Document(content="My name is Wolfgang and I live in Berlin"),
    Document(content="I saw a black horse running"),
    Document(content="Germany has many big cities"),
]

indexing_pipeline = Pipeline()
indexing_pipeline.add_component(
    "embedder", CohereDocumentEmbedder(token)
)
indexing_pipeline.add_component(
    "writer", DocumentWriter(document_store=document_store)
)
indexing_pipeline.connect("embedder", "writer")

indexing_pipeline.run({"embedder": {"documents": documents}})
print(document_store.filter_documents())
# [Document(id=..., content: 'My name is Wolfgang and I live in Berlin', embedding: vector of size 4096), Document(id=..., content: 'Germany has many big cities', embedding: vector of size 4096)]
```

#### Retrieving Documents with Haystack and Cohere Embeddings

After the indexing pipeline has added the embeddings to the document store, you can build a retrieval pipeline that gets the most relevant documents from your database. This can also form the basis of RAG pipelines, where a generator component can be added at the end.

```python PYTHON
from haystack import Pipeline
from haystack.components.retrievers.in_memory import (
    InMemoryEmbeddingRetriever,
)
from haystack_integrations.components.embedders.cohere import (
    CohereTextEmbedder,
)

# Reuses `token` and `document_store` from the indexing example above.
query_pipeline = Pipeline()
query_pipeline.add_component(
    "text_embedder", CohereTextEmbedder(token)
)
query_pipeline.add_component(
    "retriever",
    InMemoryEmbeddingRetriever(document_store=document_store),
)
query_pipeline.connect(
    "text_embedder.embedding", "retriever.query_embedding"
)

query = "Who lives in Berlin?"

result = query_pipeline.run({"text_embedder": {"text": query}})

print(result["retriever"]["documents"][0])

# Document(id=..., content: 'My name is Wolfgang and I live in Berlin')
```
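
Conceptually, embedding retrieval compares the query vector with each stored document vector. The sketch below ranks toy 3-dimensional vectors by cosine similarity, the function selected for the document store in the indexing example. The vectors and document labels are invented for illustration; real Cohere embeddings have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=1):
    """Return (doc_id, score) pairs for the k most similar documents."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Invented toy embeddings, one per sample document.
doc_vecs = {
    "wolfgang_berlin": [0.9, 0.1, 0.2],
    "black_horse": [0.0, 1.0, 0.1],
    "german_cities": [0.6, 0.0, 0.7],
}
query_vec = [1.0, 0.0, 0.3]  # invented stand-in for "Who lives in Berlin?"

best = top_k(query_vec, doc_vecs, k=2)
print([doc_id for doc_id, _ in best])
# ['wolfgang_berlin', 'german_cities']
```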