Cohere Chat on LangChain (Integration Guide)

Cohere supports various integrations with LangChain, a large language model (LLM) framework that allows you to quickly create applications based on Cohere’s models. This guide shows how to use Cohere Chat with LangChain.

Prerequisites

Running Cohere Chat with LangChain doesn’t require many prerequisites; consult the top-level document for more information.

Cohere Chat with LangChain

To use Cohere Chat with LangChain, create a ChatCohere object and pass in the message or message history. In the example below, replace the COHERE_API_KEY placeholder with your Cohere API key.

PYTHON

from langchain_cohere import ChatCohere
from langchain_core.messages import AIMessage, HumanMessage

# Define the Cohere LLM
llm = ChatCohere(
    cohere_api_key="COHERE_API_KEY", model="command-a-03-2025"
)

# Send a chat message without chat history
current_message = [HumanMessage(content="knock knock")]
print(llm.invoke(current_message))

# Send a chat message with chat history; the last message is the current user message
current_message_and_history = [
    HumanMessage(content="knock knock"),
    AIMessage(content="Who's there?"),
    HumanMessage(content="Tank"),
]
print(llm.invoke(current_message_and_history))
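
Hard-coding the key is fine for a quick test, but a common alternative is to read it from the environment. The following is a minimal sketch, assuming the key is stored in an environment variable named COHERE_API_KEY. Both the variable name and the load_api_key helper are illustrative choices made here, not part of langchain_cohere.

```python
import os


def load_api_key(env_var: str = "COHERE_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise a clear error.

    `load_api_key` is a hypothetical helper; the default variable name
    is an assumption made for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running this script."
        )
    return key
```

The returned string can then be passed as the cohere_api_key argument when constructing ChatCohere, in place of the placeholder used above.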

Cohere Agents with LangChain

LangChain Agents use a language model to choose a sequence of actions to take.

To use Cohere’s multi-hop agent, call create_cohere_react_agent and pass in the LangChain tools you would like to use.

For example, the following uses an internet search tool to get essay-writing advice from Cohere with citations:

PYTHON

import os

from langchain_cohere import ChatCohere
from langchain_cohere.react_multi_hop.agent import (
    create_cohere_react_agent,
)
from langchain.agents import AgentExecutor
from langchain_community.tools.tavily_search import (
    TavilySearchResults,
)
from langchain_core.prompts import ChatPromptTemplate

# Internet search tool - you can use any tool, and there are lots of community tools in LangChain.
# To use the Tavily tool you will need to set an API key in the TAVILY_API_KEY environment variable.
os.environ["TAVILY_API_KEY"] = "TAVILY_API_KEY"
internet_search = TavilySearchResults()

# Define the Cohere LLM
llm = ChatCohere(
    cohere_api_key="COHERE_API_KEY", model="command-a-03-2025"
)

# Create an agent
agent = create_cohere_react_agent(
    llm=llm,
    tools=[internet_search],
    prompt=ChatPromptTemplate.from_template("{question}"),
)

# Create an agent executor
agent_executor = AgentExecutor(
    agent=agent, tools=[internet_search], verbose=True
)

# Generate a response
response = agent_executor.invoke(
    {
        "question": "I want to write an essay. Any tips?",
    }
)

# See Cohere's response
print(response.get("output"))
# Cohere provides exact citations for the sources it used
print(response.get("citations"))
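
To make the idea of a model choosing a sequence of actions more concrete, here is a toy, dependency-free sketch of an agent loop. It is a conceptual illustration only and does not reflect how create_cohere_react_agent or AgentExecutor are implemented. The names TOOLS, toy_policy, and run_agent are hypothetical; toy_policy stands in for the language model and simply follows a fixed rule.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in tools: plain functions keyed by name.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search": lambda query: f"results for '{query}'",
}


def toy_policy(question: str, observations: List[str]) -> Tuple[str, str]:
    """Stand-in for the language model: decide the next action.

    Returns (action_name, argument). A real agent would ask the model;
    this rule searches once and then finishes.
    """
    if not observations:
        return ("search", question)
    return ("finish", f"Answer based on {len(observations)} observation(s)")


def run_agent(question: str, max_steps: int = 5) -> str:
    """Repeatedly pick an action, run the tool, and record what it returned."""
    observations: List[str] = []
    for _ in range(max_steps):
        action, argument = toy_policy(question, observations)
        if action == "finish":
            return argument
        observations.append(TOOLS[action](argument))
    return "Stopped: step limit reached"


result = run_agent("I want to write an essay. Any tips?")
print(result)
```

The step limit is there so that a policy that never finishes cannot loop forever.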

Cohere Chat and RAG with LangChain

To use Cohere’s retrieval augmented generation (RAG) functionality with LangChain, create a CohereRagRetriever object. The following sections cover several ways to use it.

Using LangChain’s Retrievers

In this example, we use the Wikipedia retriever, but any retriever supported by LangChain can be used here. To set up the Wikipedia retriever, install the wikipedia Python package using %pip install --upgrade --quiet wikipedia. With that done, you can run this code to see how a retriever works:

PYTHON

from langchain_cohere import CohereRagRetriever
from langchain.retrievers import WikipediaRetriever
from langchain_cohere import ChatCohere

# User query we will use for the generation
user_query = "What is cohere?"
# Define the Cohere LLM
llm = ChatCohere(
    cohere_api_key="COHERE_API_KEY", model="command-a-03-2025"
)
# Create the Cohere rag retriever using the chat model
rag = CohereRagRetriever(llm=llm, connectors=[])
# Create the wikipedia retriever
wiki_retriever = WikipediaRetriever()
# Get the relevant documents from wikipedia
wiki_docs = wiki_retriever.invoke(user_query)
# Get the cohere generation from the cohere rag retriever
docs = rag.invoke(user_query, documents=wiki_docs)
# Print the documents
print("Documents:")
for doc in docs[:-1]:
    print(doc.metadata)
    print("\n\n" + doc.page_content)
    print("\n\n" + "-" * 30 + "\n\n")
# Print the final generation
answer = docs[-1].page_content
print("Answer:")
print(answer)
# Print the final citations
citations = docs[-1].metadata["citations"]
print("Citations:")
print(citations)

Using Documents

In this example, we take documents (which might be generated in other parts of your application) and pass them into the CohereRagRetriever object:

PYTHON

from langchain_cohere import CohereRagRetriever
from langchain_cohere import ChatCohere
from langchain_core.documents import Document

# Define the Cohere LLM
llm = ChatCohere(
    cohere_api_key="COHERE_API_KEY", model="command-a-03-2025"
)

# Create the Cohere rag retriever using the chat model
rag = CohereRagRetriever(llm=llm, connectors=[])
docs = rag.invoke(
    "Does LangChain support cohere RAG?",
    documents=[
        Document(
            page_content="LangChain supports cohere RAG!",
            metadata={"id": "id-1"},
        ),
        Document(
            page_content="The sky is blue!", metadata={"id": "id-2"}
        ),
    ],
)

# Print the documents
print("Documents:")
for doc in docs[:-1]:
    print(doc.metadata)
    print("\n\n" + doc.page_content)
    print("\n\n" + "-" * 30 + "\n\n")
# Print the final generation
answer = docs[-1].page_content
print("Answer:")
print(answer)
# Print the final citations
citations = docs[-1].metadata["citations"]
print("Citations:")
print(citations)
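
The RAG examples in this section index the result the same way: every element except the last is treated as a source document, and the last element carries the generated answer with its citations in the metadata. The sketch below wraps that indexing in a small helper. It assumes only the convention visible in the code above; Doc is a hypothetical stand-in for a LangChain Document, split_rag_output is an illustrative name, and the sample data is made up rather than real model output.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (page_content + metadata)."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def split_rag_output(docs: List[Doc]) -> Tuple[List[Doc], str, Any]:
    """Split retriever output into (sources, answer, citations).

    Assumes the final element is the generation and every element
    before it is a source document, as in the examples above.
    """
    if not docs:
        raise ValueError("expected at least one document")
    *sources, generation = docs
    return sources, generation.page_content, generation.metadata.get("citations")


# Made-up data for illustration, not real model output
sample = [
    Doc("LangChain supports cohere RAG!", {"id": "id-1"}),
    Doc("Yes, it does.", {"citations": []}),
]
sources, answer, citations = split_rag_output(sample)
print(len(sources), answer, citations)
```

Using metadata.get avoids a KeyError if a generation comes back without a "citations" entry; whether that can happen in practice is not stated in this guide.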

Using a Connector

In this example, we create a generation with a connector, which lets us get a generation with citations to the connector’s results. We use the “web-search” connector, which is available to everyone. If you have created your own connector in your org, you can pass in its id instead, like so: rag = CohereRagRetriever(llm=llm, connectors=[{"id": "example-connector-id"}])

Here’s a code sample illustrating how to use a connector:

PYTHON

from langchain_cohere import CohereRagRetriever
from langchain_cohere import ChatCohere

# Define the Cohere LLM
llm = ChatCohere(
    cohere_api_key="COHERE_API_KEY", model="command-a-03-2025"
)

# Create the Cohere rag retriever using the chat model with the web search connector
rag = CohereRagRetriever(llm=llm, connectors=[{"id": "web-search"}])
docs = rag.invoke("Who founded Cohere?")
# Print the documents
print("Documents:")
for doc in docs[:-1]:
    print(doc.metadata)
    print("\n\n" + doc.page_content)
    print("\n\n" + "-" * 30 + "\n\n")
# Print the final generation
answer = docs[-1].page_content
print("Answer:")
print(answer)
# Print the final citations
citations = docs[-1].metadata["citations"]
print("Citations:")
print(citations)

Using the `create_stuff_documents_chain` Chain

This chain takes a list of documents and formats them all into a prompt, then passes that prompt to an LLM. It passes ALL documents, so make sure they fit within the context window of the LLM you are using.

Note: this feature is currently in beta.

PYTHON

from langchain_cohere import ChatCohere
from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import (
    create_stuff_documents_chain,
)

prompt = ChatPromptTemplate.from_messages(
    [("human", "What are everyone's favorite colors:\n\n{context}")]
)

# Define the Cohere LLM
llm = ChatCohere(
    cohere_api_key="COHERE_API_KEY", model="command-a-03-2025"
)

chain = create_stuff_documents_chain(llm, prompt)

docs = [
    Document(page_content="Jesse loves red but not yellow"),
    Document(
        page_content="Jamal loves green but not as much as he loves orange"
    ),
]

chain.invoke({"context": docs})
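
Because every document is placed into the prompt, it can help to estimate the combined size before invoking the chain. The following is a rough, dependency-free sketch of that check. The helper names, the blank-line separator, the four-characters-per-token ratio, and the 1000-token budget are all assumptions for illustration; none of them is the model's tokenizer or its actual context window, and the joining step only approximates what the chain does internally.

```python
from typing import List


def stuff_documents(texts: List[str], separator: str = "\n\n") -> str:
    """Join all document texts into a single context string."""
    return separator.join(texts)


def roughly_fits(
    context: str, max_tokens: int, chars_per_token: float = 4.0
) -> bool:
    """Crude size check.

    chars_per_token is an assumed heuristic, not a real tokenizer, so
    treat the result as an estimate and leave generous headroom.
    """
    estimated_tokens = len(context) / chars_per_token
    return estimated_tokens <= max_tokens


texts = [
    "Jesse loves red but not yellow",
    "Jamal loves green but not as much as he loves orange",
]
context = stuff_documents(texts)
# 1000 is a placeholder budget, not the context window of any particular model
print(roughly_fits(context, max_tokens=1000))
```

For a precise count, use the tokenizer that matches the model you are calling rather than a character-based estimate.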

Structured Output Generation

Cohere supports generating JSON objects to structure and organize the model’s responses in a way that can be used in downstream applications.

You can specify the response_format parameter to indicate that you want the response in a JSON object format.

PYTHON

from langchain_cohere import ChatCohere

# Define the Cohere LLM
llm = ChatCohere(
    cohere_api_key="COHERE_API_KEY", model="command-a-03-2025"
)

res = llm.invoke(
    "John is five years old",
    response_format={
        "type": "json_object",
        "schema": {
            "title": "Person",
            "description": "Identifies the age and name of a person",
            "type": "object",
            "properties": {
                "name": {
                    "type": "string",
                    "description": "Name of the person",
                },
                "age": {
                    "type": "number",
                    "description": "Age of the person",
                },
            },
            "required": [
                "name",
                "age",
            ],
        },
    },
)

print(res)
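
Downstream code usually needs the JSON as a Python object rather than as text. The sketch below parses a JSON string and checks for the keys listed under "required" in the schema above. It assumes the response text is available as a plain string (for example, the message content, if it is a string); parse_person is a hypothetical helper, and sample_content is made-up text rather than actual model output.

```python
import json
from typing import Any, Dict, List


def parse_person(raw: str, required: List[str]) -> Dict[str, Any]:
    """Parse a JSON string and check that the required keys are present."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    return data


# Made-up response text for illustration, not actual model output
sample_content = '{"name": "John", "age": 5}'
person = parse_person(sample_content, required=["name", "age"])
print(person["name"], person["age"])
```

This only checks that keys are present; it does not validate value types against the schema.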

Text Summarization

You can use the load_summarize_chain chain to perform text summarization.

PYTHON

from langchain_cohere import ChatCohere
from langchain.chains.summarize import load_summarize_chain
from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader("https://docs.cohere.com/docs/cohere-toolkit")
docs = loader.load()

# Define the Cohere LLM
llm = ChatCohere(
    cohere_api_key="COHERE_API_KEY",
    model="command-a-03-2025",
    temperature=0,
)

chain = load_summarize_chain(llm, chain_type="stuff")

chain.invoke({"input_documents": docs})

Using LangChain on Private Deployments

You can use LangChain with privately deployed Cohere models. To use it, specify your model deployment URL in the base_url parameter.

PYTHON

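from langchain_cohere import ChatCohere

# Point the client at your private deployment instead of the default endpoint
llm = ChatCohere(
    base_url="<YOUR_DEPLOYMENT_URL>",
    cohere_api_key="COHERE_API_KEY",
    model="MODEL_NAME",
)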
