Haystack and Cohere

Haystack is an open-source Python LLM framework by deepset for building customizable, production-ready LLM applications. You can use Cohere's /embed, /generate, /chat, and /rerank models with Haystack.

Cohere's Haystack integration provides four components that can be used in a variety of Haystack pipelines, including retrieval augmented generation, chat, and indexing:

  • CohereChatGenerator, for chat completion with Cohere's models
  • CohereGenerator, for text generation in RAG pipelines
  • CohereDocumentEmbedder, for embedding documents at indexing time
  • CohereTextEmbedder, for embedding queries at retrieval time

Prerequisites

To use Cohere and Haystack you will need:

  • The cohere-haystack integration installed. To install it, run pip install cohere-haystack. If you run into any issues or want more details, see these docs.
  • A Cohere API key. For details on pricing, see this page. When you create an account with Cohere, a trial API key is created for you automatically; you can copy it from the "API Keys" section of the dashboard. A quick check that the key is available is sketched after this list.
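
The examples below read the key from an environment variable. A minimal sanity check, assuming you have already exported COHERE_API_KEY:

import os

# Illustrative sanity check: fail fast if the key isn't set.
assert os.environ.get("COHERE_API_KEY"), "Set the COHERE_API_KEY environment variable first."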

Cohere Chat with Haystack

Haystack’s CohereChatGenerator component enables chat completion using Cohere's large language models (LLMs). For the latest information on Cohere Chat, see these docs.

In the example below, you will need to add your Cohere API key. We suggest using an environment variable, COHERE_API_KEY. Don’t commit API keys to source control!

from haystack import Pipeline
from haystack.components.builders import DynamicChatPromptBuilder
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.cohere import CohereChatGenerator
from haystack.utils import Secret
import os

COHERE_API_KEY = os.environ.get("COHERE_API_KEY")

# Build a two-step pipeline: render the chat prompt, then call the Cohere model.
pipe = Pipeline()
pipe.add_component("prompt_builder", DynamicChatPromptBuilder())
pipe.add_component("llm", CohereChatGenerator(Secret.from_token(COHERE_API_KEY)))
pipe.connect("prompt_builder", "llm")

location = "Berlin"
system_message = ChatMessage.from_system("You are an assistant giving out valuable information to language learners.")
messages = [system_message, ChatMessage.from_user("Tell me about {{location}}")]

# The {{location}} placeholder is filled in from template_variables at run time.
res = pipe.run(data={"prompt_builder": {"template_variables": {"location": location}, "prompt_source": messages}})
print(res)

You can pass additional dynamic variables to the LLM, like so:

messages = [system_message, ChatMessage.from_user("What's the weather forecast for {{location}} in the next {{day_count}} days?")]

res = pipe.run(data={"prompt_builder": {"template_variables": {"location": location, "day_count": "5"},
                                        "prompt_source": messages}})

print(res)

Cohere Chat with Retrieval Augmentation

This Haystack retrieval augmented generation (RAG) pipeline passes Cohere’s documentation to a Cohere model so that it can better explain Cohere’s capabilities. In the example below, a LinkContentFetcher replaces a classic retriever: it fetches the contents of a URL, which are then passed to the generator.

from haystack import Pipeline
from haystack.components.builders import DynamicChatPromptBuilder
from haystack.components.generators.utils import print_streaming_chunk
from haystack.components.fetchers import LinkContentFetcher
from haystack.components.converters import HTMLToDocument
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret
import os

from haystack_integrations.components.generators.cohere import CohereChatGenerator

COHERE_API_KEY = os.environ.get("COHERE_API_KEY")

fetcher = LinkContentFetcher()    # downloads the page at each given URL
converter = HTMLToDocument()      # converts the fetched HTML into Documents
prompt_builder = DynamicChatPromptBuilder(runtime_variables=["documents"])
llm = CohereChatGenerator(Secret.from_token(COHERE_API_KEY))

message_template = """Answer the following question based on the contents of the article: {{query}}
Article: {{documents[0].content}}
"""
messages = [ChatMessage.from_user(message_template)]

rag_pipeline = Pipeline()
rag_pipeline.add_component(name="fetcher", instance=fetcher)
rag_pipeline.add_component(name="converter", instance=converter)
rag_pipeline.add_component("prompt_builder", prompt_builder)
rag_pipeline.add_component("llm", llm)

rag_pipeline.connect("fetcher.streams", "converter.sources")
rag_pipeline.connect("converter.documents", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder.prompt", "llm.messages")

question = "What are the capabilities of Cohere?"

result = rag_pipeline.run(
    {
        "fetcher": {"urls": ["https://docs.cohere.com/reference/about"]},
        "prompt_builder": {"template_variables": {"query": question}, "prompt_source": messages},
      
        "llm": {"generation_kwargs": {"max_tokens": 165}},
    },
)
print(result)
# {'llm': {'replies': [ChatMessage(content='The Cohere platform builds natural language processing and generation into your product with a few lines of code... \nIs', role=<ChatRole.ASSISTANT: 'assistant'>, name=None, meta={'model': 'command', 'usage': {'prompt_tokens': 273, 'response_tokens': 165, 'total_tokens': 438, 'billed_tokens': 430}, 'index': 0, 'finish_reason': None, 'documents': None, 'citations': None})]}}
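
The example above also imports print_streaming_chunk without using it; you can pass it as a streaming callback to have tokens printed as they are generated. A minimal sketch, reusing COHERE_API_KEY and the imports from above:

streaming_llm = CohereChatGenerator(
    Secret.from_token(COHERE_API_KEY),
    streaming_callback=print_streaming_chunk,  # print each chunk as it arrives
)
streaming_llm.run(messages=[ChatMessage.from_user("What are the capabilities of Cohere?")])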

Use Cohere Models in Haystack RAG Pipelines

RAG provides an LLM with context, allowing it to generate better answers. You can use any of Cohere’s models in a Haystack RAG pipeline with the CohereGenerator.

The code sample below adds a set of documents to an InMemoryDocumentStore, then uses those documents to answer a question. You’ll need your Cohere API key to run it.

Although these examples use an InMemoryDocumentStore to keep things simple, Haystack supports a variety of vector database and document store options. You can use any of them in combination with Cohere models.

from haystack import Pipeline
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.generators.cohere import CohereGenerator
from haystack import Document
from haystack.utils import Secret

import os
COHERE_API_KEY = os.environ.get("COHERE_API_KEY")

docstore = InMemoryDocumentStore()
docstore.write_documents([
    Document(content="Rome is the capital of Italy"),
    Document(content="Paris is the capital of France")])

query = "What is the capital of France?"

template = """
Given the following information, answer the question.

Context: 
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{ query }}
"""
pipe = Pipeline()

pipe.add_component("retriever", InMemoryBM25Retriever(document_store=docstore))
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component("llm", CohereGenerator(Secret.from_token(COHERE_API_KEY)))
pipe.connect("retriever", "prompt_builder.documents")
pipe.connect("prompt_builder", "llm")

res = pipe.run({
    "prompt_builder": {
        "query": query
    },
    "retriever": {
        "query": query
    }
})

print(res)
# {'llm': {'replies': [' Paris is the capital of France. It is known for its history, culture, and many iconic landmarks, such as the Eiffel Tower and Notre-Dame Cathedral. '], 'meta': [{'finish_reason': 'COMPLETE'}]}}
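
The pipeline above uses CohereGenerator's default model. To pick a different Cohere model, pass a model name when constructing the component; a minimal sketch, with the model name as an illustrative choice:

# Illustrative: select a specific Cohere model by name.
llm = CohereGenerator(
    Secret.from_token(COHERE_API_KEY),
    model="command-light",  # example model name; see Cohere's model docs
)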

Cohere Embeddings with Haystack

You can use Cohere’s embedding models within your Haystack RAG pipelines. The list of all supported models can be found in Cohere’s model documentation. Set an environment variable for your COHERE_API_KEY before running the code samples below.

Although these examples use an InMemoryDocumentStore to keep things simple, Haystack supports a variety of vector database and document store options.
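
Both CohereDocumentEmbedder and CohereTextEmbedder accept a model argument if you don't want the default embedding model. A minimal sketch, with the model name as an illustrative choice:

from haystack_integrations.components.embedders.cohere import CohereDocumentEmbedder
from haystack.utils import Secret
import os

# Illustrative: choose a specific Cohere embedding model by name.
embedder = CohereDocumentEmbedder(
    Secret.from_token(os.environ.get("COHERE_API_KEY")),
    model="embed-english-v3.0",
)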

Index Documents with Haystack and Cohere Embeddings

from haystack import Pipeline
from haystack import Document
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.writers import DocumentWriter
from haystack_integrations.components.embedders.cohere import CohereDocumentEmbedder
from haystack.utils import Secret
import os

COHERE_API_KEY = os.environ.get("COHERE_API_KEY")
token = Secret.from_token(COHERE_API_KEY)

document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

documents = [Document(content="My name is Wolfgang and I live in Berlin"),
             Document(content="I saw a black horse running"),
             Document(content="Germany has many big cities")]

indexing_pipeline = Pipeline()
indexing_pipeline.add_component("embedder", CohereDocumentEmbedder(token))
indexing_pipeline.add_component("writer", DocumentWriter(document_store=document_store))
indexing_pipeline.connect("embedder", "writer")

indexing_pipeline.run({"embedder": {"documents": documents}})
print(document_store.filter_documents())
# [Document(id=..., content: 'My name is Wolfgang and I live in Berlin', embedding: vector of size 4096), Document(id=..., content: 'Germany has many big cities', embedding: vector of size 4096)]

Retrieving Documents with Haystack and Cohere Embeddings

After the indexing pipeline has added the embeddings to the document store, you can build a retrieval pipeline that gets the most relevant documents from your database. This can also form the basis of RAG pipelines, where a generator component can be added at the end.

from haystack import Pipeline
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack_integrations.components.embedders.cohere import CohereTextEmbedder

# Reuses `token` and `document_store` from the indexing example above.
query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", CohereTextEmbedder(token))
query_pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "Who lives in Berlin?"

result = query_pipeline.run({"text_embedder":{"text": query}})

print(result['retriever']['documents'][0])

# Document(id=..., content: 'My name is Wolfgang and I live in Berlin')
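
As noted above, this retrieval pipeline can form the basis of a RAG pipeline by adding a prompt builder and a generator at the end. A minimal sketch, reusing token, document_store, and query from the snippets above:

from haystack.components.builders.prompt_builder import PromptBuilder
from haystack_integrations.components.generators.cohere import CohereGenerator

# Reuses `token`, `document_store`, and `query` from the snippets above.
template = """
Given the following information, answer the question.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{ query }}
"""

rag_pipeline = Pipeline()
rag_pipeline.add_component("text_embedder", CohereTextEmbedder(token))
rag_pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store))
rag_pipeline.add_component("prompt_builder", PromptBuilder(template=template))
rag_pipeline.add_component("llm", CohereGenerator(token))
rag_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
rag_pipeline.connect("retriever", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder", "llm")

res = rag_pipeline.run({"text_embedder": {"text": query}, "prompt_builder": {"query": query}})
print(res["llm"]["replies"][0])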