Haystack is an open-source LLM framework in Python, created by deepset, for building customizable, production-ready LLM applications. You can use Cohere’s /embed, /generate, /chat, and /rerank models with Haystack.

Cohere’s Haystack integration provides four components that can be used in various Haystack pipelines, including retrieval augmented generation, chat, and indexing:

  • CohereGenerator
  • CohereChatGenerator
  • CohereDocumentEmbedder
  • CohereTextEmbedder

Prerequisites

To use Cohere and Haystack you will need:

  • The cohere-haystack integration installed. To install it, run pip install cohere-haystack. If you run into any issues or want more details, see these docs.
  • A Cohere API key. For more details on pricing, see this page. When you create an account with Cohere, we automatically create a trial API key for you; you can copy it from the “API Keys” section of the dashboard.
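
To confirm the setup before running the examples below, here is a quick sanity check (nothing Cohere-specific happens here; it only verifies the package import and the environment variable):

PYTHON
import os

# Fails fast if the integration isn't installed or the key isn't exported.
from haystack_integrations.components.generators.cohere import CohereGenerator  # noqa: F401

assert os.environ.get("COHERE_API_KEY"), "Set the COHERE_API_KEY environment variable"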

Cohere Chat with Haystack

Haystack’s CohereChatGenerator component enables chat completion using Cohere’s large language models (LLMs). For the latest information on Cohere Chat, see these docs.

In the example below, you will need to add your Cohere API key. We suggest using an environment variable, COHERE_API_KEY. Don’t commit API keys to source control!

PYTHON
from haystack import Pipeline
from haystack.components.builders import DynamicChatPromptBuilder
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.cohere import CohereChatGenerator
from haystack.utils import Secret
import os

COHERE_API_KEY = os.environ.get("COHERE_API_KEY")

# Two components: the prompt builder renders the message templates, then
# hands the result to the Cohere chat model.
pipe = Pipeline()
pipe.add_component("prompt_builder", DynamicChatPromptBuilder())
pipe.add_component("llm", CohereChatGenerator(Secret.from_token(COHERE_API_KEY)))
pipe.connect("prompt_builder", "llm")

location = "Berlin"
system_message = ChatMessage.from_system("You are an assistant giving out valuable information to language learners.")
messages = [system_message, ChatMessage.from_user("Tell me about {{location}}")]

# "location" is substituted into the {{location}} placeholder at run time.
res = pipe.run(data={"prompt_builder": {"template_variables": {"location": location}, "prompt_source": messages}})
print(res)
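
Rather than reading the key yourself and wrapping it in Secret.from_token, you can let the component resolve it from the environment at runtime with Secret.from_env_var; a minimal equivalent sketch:

PYTHON
from haystack.utils import Secret
from haystack_integrations.components.generators.cohere import CohereChatGenerator

# The token is read from COHERE_API_KEY when the component runs and is
# never stored in your source code.
llm = CohereChatGenerator(api_key=Secret.from_env_var("COHERE_API_KEY"))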

You can pass additional dynamic variables to the LLM, like so:

PYTHON
messages = [system_message, ChatMessage.from_user("What's the weather forecast for {{location}} in the next {{day_count}} days?")]

res = pipe.run(data={"prompt_builder": {"template_variables": {"location": location, "day_count": "5"},
                                        "prompt_source": messages}})

print(res)

Cohere Chat with Retrieval Augmentation

This Haystack retrieval augmented generation (RAG) pipeline passes Cohere’s documentation to a Cohere model so that it can better explain Cohere’s capabilities. In the example below, a LinkContentFetcher replaces a classic retriever: the contents of the URL are converted into a document and passed on to the generator.

PYTHON
from haystack import Pipeline
from haystack.components.builders import DynamicChatPromptBuilder
from haystack.components.fetchers import LinkContentFetcher
from haystack.components.converters import HTMLToDocument
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret
import os

from haystack_integrations.components.generators.cohere import CohereChatGenerator

COHERE_API_KEY = os.environ.get("COHERE_API_KEY")

# Fetch a web page, convert the HTML into a Document, and expose it to the
# prompt builder as the "documents" runtime variable.
fetcher = LinkContentFetcher()
converter = HTMLToDocument()
prompt_builder = DynamicChatPromptBuilder(runtime_variables=["documents"])
llm = CohereChatGenerator(Secret.from_token(COHERE_API_KEY))

message_template = """Answer the following question based on the contents of the article: {{query}}

Article: {{documents[0].content}}
"""
messages = [ChatMessage.from_user(message_template)]

rag_pipeline = Pipeline()
rag_pipeline.add_component(name="fetcher", instance=fetcher)
rag_pipeline.add_component(name="converter", instance=converter)
rag_pipeline.add_component("prompt_builder", prompt_builder)
rag_pipeline.add_component("llm", llm)

rag_pipeline.connect("fetcher.streams", "converter.sources")
rag_pipeline.connect("converter.documents", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder.prompt", "llm.messages")

question = "What are the capabilities of Cohere?"

result = rag_pipeline.run(
    {
        "fetcher": {"urls": ["https://docs.cohere.com/reference/about"]},
        "prompt_builder": {"template_variables": {"query": question}, "prompt_source": messages},
        "llm": {"generation_kwargs": {"max_tokens": 165}},
    },
)
print(result)
# {'llm': {'replies': [ChatMessage(content='The Cohere platform builds natural language processing and generation into your product with a few lines of code... \nIs', role=<ChatRole.ASSISTANT: 'assistant'>, name=None, meta={'model': 'command', 'usage': {'prompt_tokens': 273, 'response_tokens': 165, 'total_tokens': 438, 'billed_tokens': 430}, 'index': 0, 'finish_reason': None, 'documents': None, 'citations': None})]}}
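
Chat responses can also be streamed as they are generated. A short sketch using Haystack’s standard streaming_callback hook with the built-in print_streaming_chunk helper:

PYTHON
from haystack.components.generators.utils import print_streaming_chunk
from haystack.utils import Secret
from haystack_integrations.components.generators.cohere import CohereChatGenerator

# Each chunk is printed as it arrives instead of waiting for the full reply.
llm = CohereChatGenerator(
    Secret.from_token(COHERE_API_KEY),
    streaming_callback=print_streaming_chunk,
)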

Use Cohere Models in Haystack RAG Pipelines

RAG provides an LLM with context, allowing it to generate better answers. You can use any of Cohere’s models in a Haystack RAG pipeline with the CohereGenerator.
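
For example, to target a specific model rather than the component’s default, pass the model parameter. Model names change over time, so treat this as a sketch and check Cohere’s model documentation for current names:

PYTHON
from haystack.utils import Secret
from haystack_integrations.components.generators.cohere import CohereGenerator

# "command-light" is illustrative; substitute any generation model your
# API key has access to.
llm = CohereGenerator(Secret.from_token(COHERE_API_KEY), model="command-light")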

The code sample below adds a set of documents to an InMemoryDocumentStore, then uses those documents to answer a question. You’ll need your Cohere API key to run it.

Although these examples use an InMemoryDocumentStore to keep things simple, Haystack supports a variety of vector database and document store options. You can use any of them in combination with Cohere models.

PYTHON
from haystack import Pipeline
from haystack import Document
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.generators.cohere import CohereGenerator
from haystack.utils import Secret
import os

COHERE_API_KEY = os.environ.get("COHERE_API_KEY")

# Write a couple of documents to an in-memory store for BM25 retrieval.
docstore = InMemoryDocumentStore()
docstore.write_documents([
    Document(content="Rome is the capital of Italy"),
    Document(content="Paris is the capital of France")])

query = "What is the capital of France?"

template = """
Given the following information, answer the question.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{ query }}
"""
pipe = Pipeline()

pipe.add_component("retriever", InMemoryBM25Retriever(document_store=docstore))
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component("llm", CohereGenerator(Secret.from_token(COHERE_API_KEY)))
pipe.connect("retriever", "prompt_builder.documents")
pipe.connect("prompt_builder", "llm")

res = pipe.run({
    "prompt_builder": {
        "query": query
    },
    "retriever": {
        "query": query
    }
})

print(res)
# {'llm': {'replies': [' Paris is the capital of France. It is known for its history, culture, and many iconic landmarks, such as the Eiffel Tower and Notre-Dame Cathedral. '], 'meta': [{'finish_reason': 'COMPLETE'}]}}

Cohere Embeddings with Haystack

You can use Cohere’s embedding models within your Haystack RAG pipelines. The list of all supported models can be found in Cohere’s model documentation. Set an environment variable for your COHERE_API_KEY before running the code samples below.

Although these examples use an InMemoryDocumentStore to keep things simple, Haystack supports a variety of vector database and document store options.
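
As a sketch of what swapping in a dedicated vector database looks like (this assumes the separate qdrant-haystack integration package; its constructor options may differ between versions), note that the rest of the indexing and retrieval code stays the same:

PYTHON
# pip install qdrant-haystack  -- assumed extra integration package
from haystack_integrations.document_stores.qdrant import QdrantDocumentStore

# An ephemeral in-memory Qdrant instance. embedding_dim must match the
# Cohere embedding model you index with (e.g. 1024 for embed-english-v3.0).
document_store = QdrantDocumentStore(":memory:", embedding_dim=1024, recreate_index=True)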

Index Documents with Haystack and Cohere Embeddings

PYTHON
from haystack import Pipeline
from haystack import Document
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.writers import DocumentWriter
from haystack_integrations.components.embedders.cohere import CohereDocumentEmbedder
from haystack.utils import Secret
import os

COHERE_API_KEY = os.environ.get("COHERE_API_KEY")
token = Secret.from_token(COHERE_API_KEY)

document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

documents = [Document(content="My name is Wolfgang and I live in Berlin"),
             Document(content="I saw a black horse running"),
             Document(content="Germany has many big cities")]

# Embed each document with Cohere, then write it (with its embedding)
# to the document store.
indexing_pipeline = Pipeline()
indexing_pipeline.add_component("embedder", CohereDocumentEmbedder(token))
indexing_pipeline.add_component("writer", DocumentWriter(document_store=document_store))
indexing_pipeline.connect("embedder", "writer")

indexing_pipeline.run({"embedder": {"documents": documents}})
print(document_store.filter_documents())
# [Document(id=..., content: 'My name is Wolfgang and I live in Berlin', embedding: vector of size 4096), Document(id=..., content: 'Germany has many big cities', embedding: vector of size 4096)]
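
CohereDocumentEmbedder uses a default embed model; to pin a specific one, you can pass the model parameter. A sketch: the model name here is illustrative, and the resulting embedding size must match what your document store and retriever expect:

PYTHON
embedder = CohereDocumentEmbedder(
    api_key=token,
    model="embed-english-v3.0",  # illustrative; check Cohere's model docs
)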

Retrieving Documents with Haystack and Cohere Embeddings

After the indexing pipeline has added the embeddings to the document store, you can build a retrieval pipeline that gets the most relevant documents from your database. This can also form the basis of RAG pipelines, where a generator component is added at the end; a sketch of that extension follows the example below.

PYTHON
from haystack import Pipeline
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack_integrations.components.embedders.cohere import CohereTextEmbedder

# Reuses `token` and the populated `document_store` from the indexing
# example above.
query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", CohereTextEmbedder(token))
query_pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "Who lives in Berlin?"

result = query_pipeline.run({"text_embedder": {"text": query}})

print(result['retriever']['documents'][0])
# Document(id=..., text: 'My name is Wolfgang and I live in Berlin')
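
Building on the note above, here is a minimal sketch of extending this retrieval pipeline into a RAG pipeline by appending a prompt builder and a Cohere generator (component names are illustrative; token, query, and query_pipeline come from the examples above):

PYTHON
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack_integrations.components.generators.cohere import CohereGenerator

template = """Given the following information, answer the question.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{ query }}
"""

query_pipeline.add_component("prompt_builder", PromptBuilder(template=template))
query_pipeline.add_component("llm", CohereGenerator(token))
query_pipeline.connect("retriever", "prompt_builder.documents")
query_pipeline.connect("prompt_builder", "llm")

result = query_pipeline.run({
    "text_embedder": {"text": query},
    "prompt_builder": {"query": query},
})
print(result["llm"]["replies"][0])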