Haystack and Cohere (Integration Guide)

Haystack is an open-source LLM framework in Python by deepset for building customizable, production-ready LLM applications. You can use Cohere’s /embed, /generate, /chat, and /rerank endpoints with Haystack.

Cohere’s Haystack integration provides four components that can be used in Haystack pipelines for retrieval augmented generation, chat, indexing, and similar tasks:

  • CohereDocumentEmbedder: uses Cohere embedding models to index documents into vector databases.
  • CohereTextEmbedder: uses Cohere embedding models to embed queries for embedding retrieval.
  • CohereGenerator: uses Cohere’s text generation models.
  • CohereChatGenerator: uses Cohere’s chat completion endpoints.

Prerequisites

To use Cohere and Haystack you will need:

  • The cohere-haystack integration installed. To install it, run pip install cohere-haystack. If you run into any issues or want more details, see these docs.
  • A Cohere API key. For more details on pricing see this page. When you create an account with Cohere, we automatically create a trial API key for you. You can copy it from the “API Keys” section of the dashboard.
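
Before running the examples on this page, it can help to confirm that the key is actually visible to your Python process. The following is a minimal sketch, assuming the key is stored in an environment variable named COHERE_API_KEY (the name the examples on this page use). The helper load_cohere_api_key is hypothetical and is not part of Cohere's SDK or Haystack.

```python
import os


def load_cohere_api_key(var_name: str = "COHERE_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message.

    Hypothetical helper for illustration; not part of cohere or haystack.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable {var_name} is not set. "
            "Export it before running the examples."
        )
    return key
```

Failing early like this avoids a less obvious error later, when a component is constructed with an empty key.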

Cohere Chat with Haystack

Haystack’s CohereChatGenerator component enables chat completion using Cohere’s large language models (LLMs). For the latest information on Cohere Chat see these docs.

In the example below, you will need to add your Cohere API key. We suggest using an environment variable, COHERE_API_KEY. Don’t commit API keys to source control!

PYTHON
from haystack import Pipeline
from haystack.components.builders import DynamicChatPromptBuilder
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.cohere import (
    CohereChatGenerator,
)
from haystack.utils import Secret
import os

# Read the API key from the environment instead of hard-coding it.
COHERE_API_KEY = os.environ.get("COHERE_API_KEY")

# Build a two-step pipeline: prompt builder -> Cohere chat model.
pipe = Pipeline()
pipe.add_component("prompt_builder", DynamicChatPromptBuilder())
pipe.add_component(
    "llm", CohereChatGenerator(Secret.from_token(COHERE_API_KEY))
)
pipe.connect("prompt_builder", "llm")

location = "Berlin"
system_message = ChatMessage.from_system(
    "You are an assistant giving out valuable information to language learners."
)
messages = [
    system_message,
    ChatMessage.from_user("Tell me about {{location}}"),
]

# {{location}} in the user message is filled in from template_variables.
res = pipe.run(
    data={
        "prompt_builder": {
            "template_variables": {"location": location},
            "prompt_source": messages,
        }
    }
)
print(res)

You can pass additional dynamic variables to the LLM, like so:

PYTHON
messages = [
    system_message,
    ChatMessage.from_user(
        "What's the weather forecast for {{location}} in the next {{day_count}} days?"
    ),
]

res = pipe.run(
    data={
        "prompt_builder": {
            "template_variables": {
                "location": location,
                "day_count": "5",
            },
            "prompt_source": messages,
        }
    }
)

print(res)
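
To see what the prompt builder does with these variables, here is a simplified, dependency-free sketch of the substitution step. Haystack's prompt builders render templates with Jinja2; the regex-based fill_placeholders function below is a hypothetical stand-in that only handles plain {{name}} placeholders and is not how Haystack implements it.

```python
import re

# Matches {{name}} or {{ name }} where name is a simple identifier.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill_placeholders(template: str, variables: dict) -> str:
    """Replace each {{name}} with variables[name]; leave unknown names untouched."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return str(variables.get(name, match.group(0)))

    return _PLACEHOLDER.sub(substitute, template)


prompt = fill_placeholders(
    "What's the weather forecast for {{location}} in the next {{day_count}} days?",
    {"location": "Berlin", "day_count": "5"},
)
print(prompt)
# What's the weather forecast for Berlin in the next 5 days?
```

Leaving unknown placeholders untouched is a choice made for this sketch so that missing variables stay visible. It is not a claim about how Haystack treats them.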

Cohere Chat with Retrieval Augmentation

This Haystack retrieval augmented generation (RAG) pipeline passes Cohere’s documentation to a Cohere model, so it can better explain Cohere’s capabilities. In the example below, a LinkContentFetcher takes the place of a classic retriever: it fetches the contents of a URL, which are converted to documents and passed to the generator.

PYTHON
import os

from haystack import Pipeline
from haystack.components.builders import DynamicChatPromptBuilder
from haystack.components.fetchers import LinkContentFetcher
from haystack.components.converters import HTMLToDocument
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

from haystack_integrations.components.generators.cohere import (
    CohereChatGenerator,
)

COHERE_API_KEY = os.environ.get("COHERE_API_KEY")

# The fetcher downloads the page; the converter turns the HTML into Documents.
fetcher = LinkContentFetcher()
converter = HTMLToDocument()
prompt_builder = DynamicChatPromptBuilder(
    runtime_variables=["documents"]
)
llm = CohereChatGenerator(Secret.from_token(COHERE_API_KEY))

message_template = """Answer the following question based on the contents of the article: {{query}}\n
    Article: {{documents[0].content}} \n
    """
messages = [ChatMessage.from_user(message_template)]

rag_pipeline = Pipeline()
rag_pipeline.add_component(name="fetcher", instance=fetcher)
rag_pipeline.add_component(name="converter", instance=converter)
rag_pipeline.add_component("prompt_builder", prompt_builder)
rag_pipeline.add_component("llm", llm)

# Wire the components: fetched streams -> documents -> prompt -> chat model.
rag_pipeline.connect("fetcher.streams", "converter.sources")
rag_pipeline.connect(
    "converter.documents", "prompt_builder.documents"
)
rag_pipeline.connect("prompt_builder.prompt", "llm.messages")

question = "What are the capabilities of Cohere?"

result = rag_pipeline.run(
    {
        # Site-relative path as it appears on this page; the fetcher needs
        # an absolute URL (scheme and host) to run.
        "fetcher": {"urls": ["/reference/about"]},
        "prompt_builder": {
            "template_variables": {"query": question},
            "prompt_source": messages,
        },
        "llm": {"generation_kwargs": {"max_tokens": 165}},
    },
)
print(result)
# {'llm': {'replies': [ChatMessage(content='The Cohere platform builds natural language processing and generation into your product with a few lines of code... \nIs', role=<ChatRole.ASSISTANT: 'assistant'>, name=None, meta={'model': 'command', 'usage': {'prompt_tokens': 273, 'response_tokens': 165, 'total_tokens': 438, 'billed_tokens': 430}, 'index': 0, 'finish_reason': None, 'documents': None, 'citations': None})]}}

Use Cohere Models in Haystack RAG Pipelines

RAG provides an LLM with context, allowing it to generate better answers. You can use Cohere’s text generation models in a Haystack RAG pipeline with the CohereGenerator.

The code sample below adds a set of documents to an InMemoryDocumentStore, then uses those documents to answer a question. You’ll need your Cohere API key to run it.

Although these examples use an InMemoryDocumentStore to keep things simple, Haystack supports a variety of vector database and document store options. You can use any of them in combination with Cohere models.

PYTHON
from haystack import Pipeline
from haystack.components.retrievers.in_memory import (
    InMemoryBM25Retriever,
)
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.generators.cohere import (
    CohereGenerator,
)
from haystack import Document
from haystack.utils import Secret

import os

COHERE_API_KEY = os.environ.get("COHERE_API_KEY")

# Store two small documents to retrieve from.
docstore = InMemoryDocumentStore()
docstore.write_documents(
    [
        Document(content="Rome is the capital of Italy"),
        Document(content="Paris is the capital of France"),
    ]
)

query = "What is the capital of France?"

# The query already ends with a question mark, so the template does not add one.
template = """
Given the following information, answer the question.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{ query }}
"""
pipe = Pipeline()

pipe.add_component(
    "retriever", InMemoryBM25Retriever(document_store=docstore)
)
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component(
    "llm", CohereGenerator(Secret.from_token(COHERE_API_KEY))
)
# Retrieved documents fill the template; the rendered prompt goes to the LLM.
pipe.connect("retriever", "prompt_builder.documents")
pipe.connect("prompt_builder", "llm")

res = pipe.run(
    {
        "prompt_builder": {"query": query},
        "retriever": {"query": query},
    }
)

print(res)
# {'llm': {'replies': [' Paris is the capital of France. It is known for its history, culture, and many iconic landmarks, such as the Eiffel Tower and Notre-Dame Cathedral. '], 'meta': [{'finish_reason': 'COMPLETE'}]}}
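
The printed result is a nested dictionary keyed by component name. As a minimal sketch, assuming the layout shown in the output comment above (string replies plus a parallel meta list), you could pull out the answer text like this. The first_reply helper is hypothetical, and it does not apply to CohereChatGenerator, whose replies are ChatMessage objects rather than strings.

```python
def first_reply(result: dict, component: str = "llm"):
    """Return (reply_text, finish_reason) for a generator's first reply.

    Assumes the layout from the printed output above:
    {"llm": {"replies": [...], "meta": [{"finish_reason": ...}]}}.
    """
    output = result[component]
    text = output["replies"][0].strip()
    meta = output.get("meta") or [{}]
    return text, meta[0].get("finish_reason")


# Abbreviated copy of the printed result from the example above.
sample = {
    "llm": {
        "replies": [" Paris is the capital of France. "],
        "meta": [{"finish_reason": "COMPLETE"}],
    }
}

answer, reason = first_reply(sample)
print(answer)  # Paris is the capital of France.
print(reason)  # COMPLETE
```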

Cohere Embeddings with Haystack

You can use Cohere’s embedding models within your Haystack RAG pipelines. The list of all supported models can be found in Cohere’s model documentation. Set the COHERE_API_KEY environment variable before running the code samples below.

Although these examples use an InMemoryDocumentStore to keep things simple, Haystack supports a variety of vector database and document store options.

Index Documents with Haystack and Cohere Embeddings

PYTHON
from haystack import Pipeline
from haystack import Document
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.writers import DocumentWriter
from haystack_integrations.components.embedders.cohere import (
    CohereDocumentEmbedder,
)
from haystack.utils import Secret
import os

COHERE_API_KEY = os.environ.get("COHERE_API_KEY")
token = Secret.from_token(COHERE_API_KEY)

# Use cosine similarity when comparing embeddings at query time.
document_store = InMemoryDocumentStore(
    embedding_similarity_function="cosine"
)

documents = [
    Document(content="My name is Wolfgang and I live in Berlin"),
    Document(content="I saw a black horse running"),
    Document(content="Germany has many big cities"),
]

# Indexing pipeline: embed each document, then write it to the store.
indexing_pipeline = Pipeline()
indexing_pipeline.add_component(
    "embedder", CohereDocumentEmbedder(token)
)
indexing_pipeline.add_component(
    "writer", DocumentWriter(document_store=document_store)
)
indexing_pipeline.connect("embedder", "writer")

indexing_pipeline.run({"embedder": {"documents": documents}})
print(document_store.filter_documents())
# [Document(id=..., content: 'My name is Wolfgang and I live in Berlin', embedding: vector of size 4096), Document(id=..., content: 'Germany has many big cities', embedding: vector of size 4096)]

Retrieving Documents with Haystack and Cohere Embeddings

After the indexing pipeline has added the embeddings to the document store, you can build a retrieval pipeline that gets the most relevant documents from your database. This can also form the basis of RAG pipelines, where a generator component can be added at the end.

PYTHON
from haystack import Pipeline
from haystack.components.retrievers.in_memory import (
    InMemoryEmbeddingRetriever,
)
from haystack_integrations.components.embedders.cohere import (
    CohereTextEmbedder,
)

# Reuses `token` and `document_store` from the indexing example above.
query_pipeline = Pipeline()
query_pipeline.add_component(
    "text_embedder", CohereTextEmbedder(token)
)
query_pipeline.add_component(
    "retriever",
    InMemoryEmbeddingRetriever(document_store=document_store),
)
# The query embedding is passed to the retriever for similarity search.
query_pipeline.connect(
    "text_embedder.embedding", "retriever.query_embedding"
)

query = "Who lives in Berlin?"

result = query_pipeline.run({"text_embedder": {"text": query}})

print(result["retriever"]["documents"][0])

# Document(id=..., text: 'My name is Wolfgang and I live in Berlin')
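
The document store in these examples is configured with the cosine similarity function. To make that ranking step concrete, here is a small dependency-free sketch of ranking by cosine similarity. The three-dimensional vectors are made-up toy values, not real Cohere embeddings, and rank_by_similarity is a hypothetical helper rather than Haystack's retriever implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, docs, top_k=2):
    """Return the top_k (score, text) pairs, highest similarity first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy vectors chosen by hand for illustration only.
toy_docs = [
    ("My name is Wolfgang and I live in Berlin", [0.9, 0.1, 0.2]),
    ("I saw a black horse running", [0.1, 0.9, 0.1]),
    ("Germany has many big cities", [0.6, 0.2, 0.7]),
]
toy_query = [1.0, 0.0, 0.3]

top = rank_by_similarity(toy_query, toy_docs)
for score, text in top:
    print(f"{score:.3f}  {text}")
```

With these toy values the Berlin sentence ranks first and the sentence about German cities second. Real embeddings have far more dimensions, but the comparison works the same way.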