End-to-end example of RAG with Chat, Embed, and Rerank

This section expands on the basic RAG usage by demonstrating a more complete example that includes:

  • Retrieval and reranking of documents (via the Embed and Rerank endpoints).
  • Building RAG for chatbots (involving multi-turn conversations).

Setup

First, import the Cohere library and create a client.

PYTHON
# ! pip install -U cohere
import cohere
import json
import numpy as np

co = cohere.ClientV2(
    "COHERE_API_KEY"
)  # Get your free API key here: https://dashboard.cohere.com/api-keys

Step 1: Generating search queries

Next, we define a tool that the model can call to turn a user's message into one or more search queries.

In this example, the user asks how to get to know their teammates.

PYTHON
message = "How to get to know my teammates"

# Define the query generation tool
query_gen_tool = [
    {
        "type": "function",
        "function": {
            "name": "internet_search",
            "description": "Returns a list of relevant document snippets for a textual query retrieved from the internet",
            "parameters": {
                "type": "object",
                "properties": {
                    "queries": {
                        "type": "array",
                        "items": {"type": "string"},
                        "description": "a list of queries to search the internet with.",
                    }
                },
                "required": ["queries"],
            },
        },
    }
]

# Define a system message to optimize search query generation
instructions = "Write a search query that will find helpful information for answering the user's question accurately. If you need more than one search query, write a list of search queries. If you decide that a search is very unlikely to find information that would be useful in constructing a response to the user, you should instead directly answer."

# Generate search queries (if any)
search_queries = []

res = co.chat(
    model="command-a-03-2025",
    messages=[
        {"role": "system", "content": instructions},
        {"role": "user", "content": message},
    ],
    tools=query_gen_tool,
)

# Each tool call carries its arguments as a JSON string
if res.message.tool_calls:
    for tc in res.message.tool_calls:
        queries = json.loads(tc.function.arguments)["queries"]
        search_queries.extend(queries)

print(search_queries)

Example response:

['how to get to know your teammates']
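
The parsing logic above can be exercised without calling the API. The following is a minimal sketch that uses SimpleNamespace objects as hypothetical stand-ins for the Chat response (they are not Cohere SDK types) and a hypothetical helper named extract_search_queries. It assumes, as in the code above, that each tool call carries its arguments as a JSON string containing a queries list, and it returns an empty list when the model answers directly instead of calling the tool.

```python
import json
from types import SimpleNamespace


def extract_search_queries(message):
    """Collect the queries from every tool call on a chat message.

    `message` is assumed to expose `tool_calls` (a list or None), where each
    item has `function.arguments` holding a JSON string.
    """
    queries = []
    for tool_call in message.tool_calls or []:
        arguments = json.loads(tool_call.function.arguments)
        queries.extend(arguments.get("queries", []))
    return queries


# Hypothetical stand-in for a response message that contains one tool call
mock_message = SimpleNamespace(
    tool_calls=[
        SimpleNamespace(
            function=SimpleNamespace(
                name="internet_search",
                arguments='{"queries": ["how to get to know your teammates"]}',
            )
        )
    ]
)

# Hypothetical stand-in for a message where the model answered directly
mock_direct_answer = SimpleNamespace(tool_calls=None)

print(extract_search_queries(mock_message))
print(extract_search_queries(mock_direct_answer))
```

An empty list is a useful signal here: it means no search is needed and the retrieval steps that follow can be skipped.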

Step 2: Fetching relevant documents

Retrieval with Embed

Given the search query, we need a way to retrieve the most relevant documents from a large collection of documents.

This is where we can leverage text embeddings through the Embed endpoint.

The Embed endpoint enables semantic search, which lets us compare the semantic meaning of the documents and the query. This addresses a weakness of the more traditional lexical search, which is good at finding keyword matches but struggles to capture the context or meaning of a piece of text.
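
To make the limitation of lexical search concrete, here is a small sketch of a keyword-overlap scorer. The function keyword_overlap is a hypothetical and deliberately naive stand-in for lexical search (production systems typically use methods such as BM25), and the two snippets are shortened versions of FAQ documents used later in this section.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into a set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_overlap(query, document):
    """Count the distinct tokens shared by the query and the document."""
    return len(tokenize(query) & tokenize(document))


query = "How to get to know my teammates"
docs = [
    "Team-Building Activities: We foster team spirit with monthly outings and weekly game nights.",
    "Joining Slack Channels: Be sure to join relevant channels to stay informed and engaged.",
]

# Print the overlap count next to each document
for doc in docs:
    print(keyword_overlap(query, doc), doc)
```

The team-building snippet shares no tokens with the query ("team" and "teammates" are different tokens), while the Slack snippet scores 1 only because both texts contain the word "to". A semantic approach is meant to avoid this kind of mismatch.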

The Embed endpoint takes in texts as input and returns embeddings as output.

First, we need to embed the documents to search from. We call the Embed endpoint using co.embed() and pass the following arguments:

  • model: Here we choose embed-english-v3.0, which generates embeddings of size 1024
  • input_type: We choose search_document to ensure the model treats these as the documents (instead of the query) for search
  • texts: The list of texts (the FAQs)
  • embedding_types: We choose float as the embedding type.

PYTHON
# Define the documents
documents = [
    {
        "data": {
            "text": "Joining Slack Channels: You will receive an invite via email. Be sure to join relevant channels to stay informed and engaged."
        }
    },
    {
        "data": {
            "text": "Finding Coffee Spots: For your caffeine fix, head to the break room's coffee machine or cross the street to the café for artisan coffee."
        }
    },
    {
        "data": {
            "text": "Team-Building Activities: We foster team spirit with monthly outings and weekly game nights. Feel free to suggest new activity ideas anytime!"
        }
    },
    {
        "data": {
            "text": "Working Hours Flexibility: We prioritize work-life balance. While our core hours are 9 AM to 5 PM, we offer flexibility to adjust as needed."
        }
    },
    {
        "data": {
            "text": "Side Projects Policy: We encourage you to pursue your passions. Just be mindful of any potential conflicts of interest with our business."
        }
    },
    {
        "data": {
            "text": "Reimbursing Travel Expenses: Easily manage your travel expenses by submitting them through our finance tool. Approvals are prompt and straightforward."
        }
    },
    {
        "data": {
            "text": "Working from Abroad: Working remotely from another country is possible. Simply coordinate with your manager and ensure your availability during core hours."
        }
    },
    {
        "data": {
            "text": "Health and Wellness Benefits: We care about your well-being and offer gym memberships, on-site yoga classes, and comprehensive health insurance."
        }
    },
    {
        "data": {
            "text": "Performance Reviews Frequency: We conduct informal check-ins every quarter and formal performance reviews twice a year."
        }
    },
    {
        "data": {
            "text": "Proposing New Ideas: Innovation is welcomed! Share your brilliant ideas at our weekly team meetings or directly with your team lead."
        }
    },
]

# Embed the documents
doc_emb = co.embed(
    model="embed-english-v3.0",
    input_type="search_document",
    texts=[doc["data"]["text"] for doc in documents],
    embedding_types=["float"],
).embeddings.float

Next, we embed the search query. This time we choose search_query as the input_type in the Embed endpoint call, which ensures the model treats the text as a query (rather than a document) for search.

PYTHON
# Embed the search query
query_emb = co.embed(
    model="embed-english-v3.0",
    input_type="search_query",
    texts=search_queries,
    embedding_types=["float"],
).embeddings.float

Now we search for the documents most relevant to the query. For this, we use the numpy library to compute the similarity between each query-document pair as a dot product.

Each query-document pair yields a score that represents how similar the two are. We then sort these scores in descending order and keep the top n documents. Here we set n to 5, which is an arbitrary choice; you can choose any number.
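
The ranking step can be illustrated with toy data before applying it to real embeddings. The sketch below uses made-up 3-dimensional vectors (the real embeddings in this example have 1024 dimensions) and a hypothetical helper named top_n_by_dot_product.

```python
import numpy as np


def top_n_by_dot_product(query_vec, doc_matrix, n):
    """Return the indices and scores of the n highest dot products."""
    scores = doc_matrix @ query_vec  # one score per document
    order = np.argsort(-scores)[:n]  # negate to sort in descending order
    return order, scores[order]


# Made-up embeddings, for illustration only
toy_query = np.array([1.0, 0.0, 1.0])
toy_docs = np.array(
    [
        [0.0, 1.0, 0.0],  # document 0: score 0.0
        [0.5, 0.0, 0.5],  # document 1: score 1.0
        [1.0, 0.0, 1.0],  # document 2: score 2.0
        [0.25, 0.9, 0.25],  # document 3: score 0.5
    ]
)

indices, scores = top_n_by_dot_product(toy_query, toy_docs, n=2)
print(indices.tolist())
print(scores.tolist())
```

Documents 2 and 1 come out on top because their vectors point in the same direction as the query vector.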

Here, we show the most relevant documents with their similarity scores.

PYTHON
# Compute dot product similarity and display results
n = 5
scores = np.dot(query_emb, np.transpose(doc_emb))[0]
max_idx = np.argsort(-scores)[:n]

retrieved_documents = [documents[item] for item in max_idx]

for rank, idx in enumerate(max_idx):
    print(f"Rank: {rank+1}")
    print(f"Score: {scores[idx]}")
    print(f"Document: {retrieved_documents[rank]}\n")

Example response:

Rank: 1
Score: 0.32653470360872655
Document: {'data': {'text': 'Team-Building Activities: We foster team spirit with monthly outings and weekly game nights. Feel free to suggest new activity ideas anytime!'}}

Rank: 2
Score: 0.26851855352264786
Document: {'data': {'text': 'Proposing New Ideas: Innovation is welcomed! Share your brilliant ideas at our weekly team meetings or directly with your team lead.'}}

Rank: 3
Score: 0.2581341975304149
Document: {'data': {'text': 'Joining Slack Channels: You will receive an invite via email. Be sure to join relevant channels to stay informed and engaged.'}}

Rank: 4
Score: 0.18633336738178463
Document: {'data': {'text': "Finding Coffee Spots: For your caffeine fix, head to the break room's coffee machine or cross the street to the café for artisan coffee."}}

Rank: 5
Score: 0.13022396595682814
Document: {'data': {'text': 'Health and Wellness Benefits: We care about your well-being and offer gym memberships, on-site yoga classes, and comprehensive health insurance.'}}

For simplicity, this example assumes that only one query is generated, which is why the code above reads only the first row of the score matrix. In practice, multiple queries may be generated, in which case you will need to perform retrieval for each query.
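
One possible way to handle several queries is sketched below: score every document against every query, keep each document's best score across the queries, and rank on that. The max-over-queries merge and the helper name retrieve_for_queries are assumptions made for illustration, not a method prescribed by this guide, and the embeddings are made-up toy vectors.

```python
import numpy as np


def retrieve_for_queries(query_embs, doc_embs, n):
    """Rank documents by their best dot-product score across all queries."""
    # scores[i, j] is the similarity between query i and document j
    scores = np.asarray(query_embs) @ np.asarray(doc_embs).T
    best_per_doc = scores.max(axis=0)
    top_idx = np.argsort(-best_per_doc)[:n]
    return top_idx.tolist(), best_per_doc[top_idx].tolist()


# Made-up embeddings: two queries and three documents
toy_query_embs = [[1.0, 0.0], [0.0, 1.0]]
toy_doc_embs = [[0.9, 0.0], [0.0, 0.5], [0.25, 0.25]]

top_idx, top_scores = retrieve_for_queries(toy_query_embs, toy_doc_embs, n=2)
print(top_idx)
print(top_scores)
```

Taking the maximum keeps a document that matches any one query strongly. Other merge strategies, such as retrieving separately per query and deduplicating the union, are equally plausible.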

Reranking with Rerank

Reranking can further improve the results from semantic or lexical search. The Rerank endpoint takes a query and a list of search results, and reorders the results by their relevance to the query. It takes just a single call to implement.

We call the endpoint using co.rerank() and pass the following arguments:

  • query: The search query (here, the first generated search query)
  • documents: The list of documents we get from the semantic search results
  • top_n: The number of top reranked documents to select
  • model: We choose rerank-v3.5

Looking at the results below, we see that since the query is about getting to know the team, the document about joining Slack channels is now ranked higher (2nd) than before (3rd), while the team-building document remains in first place.

Here we set top_n to 2. These two documents are the ones we pass on for response generation.

PYTHON
# Rerank the documents
results = co.rerank(
    model="rerank-v3.5",
    query=search_queries[0],
    documents=[doc["data"]["text"] for doc in retrieved_documents],
    top_n=2,
)

# Display the reranking results
for idx, result in enumerate(results.results):
    print(f"Rank: {idx+1}")
    print(f"Score: {result.relevance_score}")
    print(f"Document: {retrieved_documents[result.index]}\n")

reranked_documents = [
    retrieved_documents[result.index] for result in results.results
]

Example response:

Rank: 1
Score: 0.07272241
Document: {'data': {'text': 'Team-Building Activities: We foster team spirit with monthly outings and weekly game nights. Feel free to suggest new activity ideas anytime!'}}

Rank: 2
Score: 0.058674112
Document: {'data': {'text': 'Joining Slack Channels: You will receive an invite via email. Be sure to join relevant channels to stay informed and engaged.'}}
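
Note that result.index refers to a position in the list passed to co.rerank() (here, retrieved_documents), not to a position in the original documents list. The sketch below shows this mapping in isolation. The SimpleNamespace objects are hypothetical stand-ins for the entries of results.results (they are not Cohere SDK types), the document texts are shortened to their titles, and select_documents is a hypothetical helper.

```python
from types import SimpleNamespace

# Shortened stand-ins for the first three documents retrieved in the previous step
retrieved = [
    {"data": {"text": "Team-Building Activities"}},
    {"data": {"text": "Proposing New Ideas"}},
    {"data": {"text": "Joining Slack Channels"}},
]

# Hypothetical stand-in for results.results; scores copied from the output above
mock_results = [
    SimpleNamespace(index=0, relevance_score=0.07272241),
    SimpleNamespace(index=2, relevance_score=0.058674112),
]


def select_documents(results, candidates):
    """Map each rerank result back to the candidate document it refers to."""
    return [candidates[result.index] for result in results]


reranked = select_documents(mock_results, retrieved)
for doc in reranked:
    print(doc["data"]["text"])
```

Indexing into the wrong list is an easy mistake here, since both lists hold documents of the same shape.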

Step 3: Generating the response

Finally, we call the Chat endpoint and pass the reranked documents through the documents parameter. This tells the model to run in RAG mode and use these documents in its response.

The response and citations are then generated based on the query and the documents provided.

PYTHON
messages = [{"role": "user", "content": message}]

# Generate the response
response = co.chat(
    model="command-a-03-2025",
    messages=messages,
    documents=reranked_documents,
)

# Display the response
print(response.message.content[0].text)

# Display the citations and source documents
if response.message.citations:
    print("\nCITATIONS:")
    for citation in response.message.citations:
        print(citation, "\n")

Example response:

To get to know your teammates, you can join relevant Slack channels to stay informed and engaged. You will receive an invite via email. You can also participate in team-building activities such as monthly outings and weekly game nights.

CITATIONS:
start=39 end=67 text='join relevant Slack channels' sources=[DocumentSource(type='document', id='doc:1', document={'id': 'doc:1', 'text': 'Joining Slack Channels: You will receive an invite via email. Be sure to join relevant channels to stay informed and engaged.'})] type='TEXT_CONTENT'

start=71 end=97 text='stay informed and engaged.' sources=[DocumentSource(type='document', id='doc:1', document={'id': 'doc:1', 'text': 'Joining Slack Channels: You will receive an invite via email. Be sure to join relevant channels to stay informed and engaged.'})] type='TEXT_CONTENT'

start=107 end=135 text='receive an invite via email.' sources=[DocumentSource(type='document', id='doc:1', document={'id': 'doc:1', 'text': 'Joining Slack Channels: You will receive an invite via email. Be sure to join relevant channels to stay informed and engaged.'})] type='TEXT_CONTENT'

start=164 end=188 text='team-building activities' sources=[DocumentSource(type='document', id='doc:0', document={'id': 'doc:0', 'text': 'Team-Building Activities: We foster team spirit with monthly outings and weekly game nights. Feel free to suggest new activity ideas anytime!'})] type='TEXT_CONTENT'

start=197 end=236 text='monthly outings and weekly game nights.' sources=[DocumentSource(type='document', id='doc:0', document={'id': 'doc:0', 'text': 'Team-Building Activities: We foster team spirit with monthly outings and weekly game nights. Feel free to suggest new activity ideas anytime!'})] type='TEXT_CONTENT'
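
Each citation carries start and end character offsets into the response text along with its source documents, so citations can be rendered inline. The following sketch uses hypothetical stand-in objects (SimpleNamespace, not Cohere SDK types) that mirror the first and fourth citations shown above, and a hypothetical helper named annotate that appends the source ids after each cited span.

```python
from types import SimpleNamespace

response_text = (
    "To get to know your teammates, you can join relevant Slack channels "
    "to stay informed and engaged. You will receive an invite via email. "
    "You can also participate in team-building activities such as monthly "
    "outings and weekly game nights."
)

# Hypothetical stand-ins mirroring two of the citations in the output above
mock_citations = [
    SimpleNamespace(
        start=39,
        end=67,
        text="join relevant Slack channels",
        sources=[SimpleNamespace(id="doc:1")],
    ),
    SimpleNamespace(
        start=164,
        end=188,
        text="team-building activities",
        sources=[SimpleNamespace(id="doc:0")],
    ),
]


def annotate(text, citations):
    """Append "[source ids]" after each cited span in the text."""
    # Work from the last citation backwards so earlier offsets stay valid
    for citation in sorted(citations, key=lambda c: c.start, reverse=True):
        ids = ",".join(source.id for source in citation.sources)
        text = text[: citation.end] + f" [{ids}]" + text[citation.end :]
    return text


annotated = annotate(response_text, mock_citations)
print(annotated)
```

Processing the citations from the end of the text towards the start avoids having to shift the offsets after each insertion.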