Haystack and Cohere

Haystack is an open-source Python LLM framework by deepset for building customizable, production-ready LLM applications. You can use Cohere's /embed, /generate, /chat, and /rerank models with Haystack.

Cohere's Haystack integration provides four components that can be used in a variety of Haystack pipelines, including retrieval augmented generation, chat, and indexing:

  • CohereChatGenerator, for chat completion with Cohere's models
  • CohereGenerator, for text generation in RAG pipelines
  • CohereDocumentEmbedder, for embedding documents at indexing time
  • CohereTextEmbedder, for embedding queries at retrieval time

Prerequisites

To use Cohere and Haystack you will need:

  • The cohere-haystack integration installed. To install it, run pip install cohere-haystack. If you run into any issues or want more details, see these docs.
  • A Cohere API key. For details on pricing, see this page. When you create an account with Cohere, a trial API key is created for you automatically; you can copy it from the "API Keys" section of the dashboard. A quick check that the key is available is sketched after this list.
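
The examples below read the key from an environment variable. A minimal sanity check, assuming you have already exported COHERE_API_KEY:

import os

# Illustrative sanity check: fail fast if the key isn't set.
assert os.environ.get("COHERE_API_KEY"), "Set the COHERE_API_KEY environment variable first."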

Cohere Chat with Haystack

Haystack’s CohereChatGenerator component enables chat completion using Cohere's large language models (LLMs). For the latest information on Cohere Chat, see these docs.

In the example below, you will need to add your Cohere API key. We suggest using an environment variable, COHERE_API_KEY. Don’t commit API keys to source control!

from haystack import Pipeline
from haystack.components.builders import DynamicChatPromptBuilder
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.cohere import CohereChatGenerator
from haystack.utils import Secret
import os

COHERE_API_KEY = os.environ.get("COHERE_API_KEY")

# Build a two-step pipeline: render the chat prompt, then call the Cohere model.
pipe = Pipeline()
pipe.add_component("prompt_builder", DynamicChatPromptBuilder())
pipe.add_component("llm", CohereChatGenerator(Secret.from_token(COHERE_API_KEY)))
pipe.connect("prompt_builder", "llm")

location = "Berlin"
system_message = ChatMessage.from_system("You are an assistant giving out valuable information to language learners.")
messages = [system_message, ChatMessage.from_user("Tell me about {{location}}")]

# The {{location}} placeholder is filled in from template_variables at run time.
res = pipe.run(data={"prompt_builder": {"template_variables": {"location": location}, "prompt_source": messages}})
print(res)

You can pass additional dynamic variables to the LLM, like so:

messages = [system_message, ChatMessage.from_user("What's the weather forecast for {{location}} in the next {{day_count}} days?")]

res = pipe.run(data={"prompt_builder": {"template_variables": {"location": location, "day_count": "5"},
                                        "prompt_source": messages}})

print(res)

Cohere Chat with Retrieval Augmentation

This Haystack retrieval augmented generation (RAG) pipeline passes Cohere’s documentation to a Cohere model so that it can better explain Cohere’s capabilities. In the example below, a LinkContentFetcher replaces a classic retriever: it fetches the contents of a URL, which are then passed to the generator.

from haystack import Pipeline
from haystack.components.builders import DynamicChatPromptBuilder
from haystack.components.generators.utils import print_streaming_chunk
from haystack.components.fetchers import LinkContentFetcher
from haystack.components.converters import HTMLToDocument
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret
import os

from haystack_integrations.components.generators.cohere import CohereChatGenerator

COHERE_API_KEY = os.environ.get("COHERE_API_KEY")

fetcher = LinkContentFetcher()    # downloads the page at each given URL
converter = HTMLToDocument()      # converts the fetched HTML into Documents
prompt_builder = DynamicChatPromptBuilder(runtime_variables=["documents"])
llm = CohereChatGenerator(Secret.from_token(COHERE_API_KEY))

message_template = """Answer the following question based on the contents of the article: {{query}}
Article: {{documents[0].content}}
"""
messages = [ChatMessage.from_user(message_template)]

rag_pipeline = Pipeline()
rag_pipeline.add_component(name="fetcher", instance=fetcher)
rag_pipeline.add_component(name="converter", instance=converter)
rag_pipeline.add_component("prompt_builder", prompt_builder)
rag_pipeline.add_component("llm", llm)

rag_pipeline.connect("fetcher.streams", "converter.sources")
rag_pipeline.connect("converter.documents", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder.prompt", "llm.messages")

question = "What are the capabilities of Cohere?"

result = rag_pipeline.run(
    {
        "fetcher": {"urls": ["https://docs.cohere.com/reference/about"]},
        "prompt_builder": {"template_variables": {"query": question}, "prompt_source": messages},
      
        "llm": {"generation_kwargs": {"max_tokens": 165}},
    },
)
print(result)
# {'llm': {'replies': [ChatMessage(content='The Cohere platform builds natural language processing and generation into your product with a few lines of code... \nIs', role=<ChatRole.ASSISTANT: 'assistant'>, name=None, meta={'model': 'command', 'usage': {'prompt_tokens': 273, 'response_tokens': 165, 'total_tokens': 438, 'billed_tokens': 430}, 'index': 0, 'finish_reason': None, 'documents': None, 'citations': None})]}}
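
The example above also imports print_streaming_chunk without using it; you can pass it as a streaming callback to have tokens printed as they are generated. A minimal sketch, reusing COHERE_API_KEY and the imports from above:

streaming_llm = CohereChatGenerator(
    Secret.from_token(COHERE_API_KEY),
    streaming_callback=print_streaming_chunk,  # print each chunk as it arrives
)
streaming_llm.run(messages=[ChatMessage.from_user("What are the capabilities of Cohere?")])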

Use Cohere Models in Haystack RAG Pipelines

RAG provides an LLM with context, allowing it to generate better answers. You can use any of Cohere’s models in a Haystack RAG pipeline with the CohereGenerator.

The code sample below adds a set of documents to an InMemoryDocumentStore, then uses those documents to answer a question. You’ll need your Cohere API key to run it.

Although these examples use an InMemoryDocumentStore to keep things simple, Haystack supports a variety of vector database and document store options. You can use any of them in combination with Cohere models.

from haystack import Pipeline
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.generators.cohere import CohereGenerator
from haystack import Document
from haystack.utils import Secret

import os
COHERE_API_KEY = os.environ.get("COHERE_API_KEY")

docstore = InMemoryDocumentStore()
docstore.write_documents([
    Document(content="Rome is the capital of Italy"),
    Document(content="Paris is the capital of France")])

query = "What is the capital of France?"

template = """
Given the following information, answer the question.

Context: 
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{ query }}
"""
pipe = Pipeline()

pipe.add_component("retriever", InMemoryBM25Retriever(document_store=docstore))
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component("llm", CohereGenerator(Secret.from_token(COHERE_API_KEY)))
pipe.connect("retriever", "prompt_builder.documents")
pipe.connect("prompt_builder", "llm")

res = pipe.run({
    "prompt_builder": {
        "query": query
    },
    "retriever": {
        "query": query
    }
})

print(res)
# {'llm': {'replies': [' Paris is the capital of France. It is known for its history, culture, and many iconic landmarks, such as the Eiffel Tower and Notre-Dame Cathedral. '], 'meta': [{'finish_reason': 'COMPLETE'}]}}
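
The pipeline above uses CohereGenerator's default model. To pick a different Cohere model, pass a model name when constructing the component; a minimal sketch, with the model name as an illustrative choice:

# Illustrative: select a specific Cohere model by name.
llm = CohereGenerator(
    Secret.from_token(COHERE_API_KEY),
    model="command-light",  # example model name; see Cohere's model docs
)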

Cohere Embeddings with Haystack

You can use Cohere’s embedding models within your Haystack RAG pipelines. The list of all supported models can be found in Cohere’s model documentation. Set an environment variable for your COHERE_API_KEY before running the code samples below.

Although these examples use an InMemoryDocumentStore to keep things simple, Haystack supports a variety of vector database and document store options.
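
Both CohereDocumentEmbedder and CohereTextEmbedder accept a model argument if you don't want the default embedding model. A minimal sketch, with the model name as an illustrative choice:

from haystack_integrations.components.embedders.cohere import CohereDocumentEmbedder
from haystack.utils import Secret
import os

# Illustrative: choose a specific Cohere embedding model by name.
embedder = CohereDocumentEmbedder(
    Secret.from_token(os.environ.get("COHERE_API_KEY")),
    model="embed-english-v3.0",
)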

Index Documents with Haystack and Cohere Embeddings

from haystack import Pipeline
from haystack import Document
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.writers import DocumentWriter
from haystack_integrations.components.embedders.cohere import CohereDocumentEmbedder
from haystack.utils import Secret
import os

COHERE_API_KEY = os.environ.get("COHERE_API_KEY")
token = Secret.from_token(COHERE_API_KEY)

document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

documents = [Document(content="My name is Wolfgang and I live in Berlin"),
             Document(content="I saw a black horse running"),
             Document(content="Germany has many big cities")]

indexing_pipeline = Pipeline()
indexing_pipeline.add_component("embedder", CohereDocumentEmbedder(token))
indexing_pipeline.add_component("writer", DocumentWriter(document_store=document_store))
indexing_pipeline.connect("embedder", "writer")

indexing_pipeline.run({"embedder": {"documents": documents}})
print(document_store.filter_documents())
# [Document(id=..., content: 'My name is Wolfgang and I live in Berlin', embedding: vector of size 4096), Document(id=..., content: 'Germany has many big cities', embedding: vector of size 4096)]

Retrieving Documents with Haystack and Cohere Embeddings

After the indexing pipeline has added the embeddings to the document store, you can build a retrieval pipeline that gets the most relevant documents from your database. This can also form the basis of RAG pipelines, where a generator component can be added at the end.

from haystack import Pipeline
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack_integrations.components.embedders.cohere import CohereTextEmbedder

# Reuses `token` and `document_store` from the indexing example above.
query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", CohereTextEmbedder(token))
query_pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "Who lives in Berlin?"

result = query_pipeline.run({"text_embedder":{"text": query}})

print(result['retriever']['documents'][0])

# Document(id=..., content: 'My name is Wolfgang and I live in Berlin')
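
As noted above, this retrieval pipeline can form the basis of a RAG pipeline by adding a prompt builder and a generator at the end. A minimal sketch, reusing token, document_store, and query from the snippets above:

from haystack.components.builders.prompt_builder import PromptBuilder
from haystack_integrations.components.generators.cohere import CohereGenerator

# Reuses `token`, `document_store`, and `query` from the snippets above.
template = """
Given the following information, answer the question.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{ query }}
"""

rag_pipeline = Pipeline()
rag_pipeline.add_component("text_embedder", CohereTextEmbedder(token))
rag_pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store))
rag_pipeline.add_component("prompt_builder", PromptBuilder(template=template))
rag_pipeline.add_component("llm", CohereGenerator(token))
rag_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
rag_pipeline.connect("retriever", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder", "llm")

res = rag_pipeline.run({"text_embedder": {"text": query}, "prompt_builder": {"query": query}})
print(res["llm"]["replies"][0])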