Build Chatbots with MongoDB and Cohere

What you will learn:

Leverage semantic search on customer or operational data in MongoDB Atlas.
Pass retrieved data to Cohere’s Command A generative model for retrieval-augmented generation (RAG).
Develop and deploy a RAG-optimized user interface for your app.
Create a conversation data store for your RAG chatbot using MongoDB.

Use Case: Develop an advanced chatbot assistant that provides asset managers with information and actionable insights on technology company market reports.

Introduction

What is Cohere?
What is MongoDB?
How do Cohere and MongoDB work together?

What is Cohere?

What is MongoDB?

What exactly are we showing today?

Step 1: Install Libraries and Set Environment Variables

Critical Security Reminder: Safeguard your production environment by never committing sensitive information, such as environment variable values, to public repositories. This practice is essential for maintaining the security and integrity of your systems.

Libraries:

cohere: A Python library for accessing Cohere’s large language models, enabling natural language processing tasks like text generation, classification, and embedding.
pymongo: The recommended Python driver for MongoDB, allowing Python applications to interact with MongoDB databases for data storage and retrieval.
datasets: A library by Hugging Face that provides easy access to a wide range of datasets for machine learning and natural language processing tasks.
tqdm: A fast, extensible progress bar library for Python, useful for displaying progress in long-running operations or loops.

pip install --quiet datasets tqdm cohere pymongo

import os
import cohere

# Placeholder values: supply real keys at runtime and never commit them to source control
os.environ["COHERE_API_KEY"] = ""
co = cohere.Client(os.environ.get("COHERE_API_KEY"))

os.environ["HF_TOKEN"] = ""
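Rather than pasting keys into the notebook, you can read them from the environment and prompt only when they are missing, which keeps secrets out of the saved file. A minimal sketch using the standard library’s getpass; the helper name require_env is hypothetical.

```python
import os
from getpass import getpass


def require_env(name: str) -> str:
    """Return an environment variable's value, prompting for it if it is unset."""
    value = os.environ.get(name, "").strip()
    if not value:
        # getpass hides the typed value, so the secret never appears in cell output
        value = getpass(f"Enter {name}: ").strip()
        os.environ[name] = value
    return value


# Hypothetical usage: the client is built from a key that never lives in the source file
# co = cohere.Client(require_env("COHERE_API_KEY"))
```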

Step 2: Data Loading and Preparation

Dataset Information

This dataset contains detailed information about multiple technology companies in the Information Technology sector. For each company, the dataset includes:

Company name and stock ticker symbol
Market analysis reports for recent years (typically 2023 and 2024), which include:
- Title and author of the report
- Date of publication
- Detailed content covering financial performance, product innovations, market position, challenges, and future outlook
- Stock recommendations and price targets
Key financial metrics such as:
- Current stock price
- 52-week price range
- Market capitalization
- Price-to-earnings (P/E) ratio
- Dividend yield
Recent news items, typically including:
- Date of the news
- Headline
- Brief summary

The market analysis reports provide in-depth information about each company’s performance, innovations, challenges, and future prospects. They offer insights into the companies’ strategies, market positions, and potential for growth.

import pandas as pd
from datasets import load_dataset

# Make sure you have a Hugging Face token (HF_TOKEN) in your development environment before running the code below
# How to get a token: https://huggingface.co/docs/hub/en/security-tokens
# https://huggingface.co/datasets/MongoDB/fake_tech_companies_market_reports
dataset = load_dataset(
    "MongoDB/fake_tech_companies_market_reports",
    split="train",
    streaming=True,
)
# Streaming returns an iterable dataset; take up to the first 100 records
dataset_head = dataset.take(100)

# Convert the records to a pandas DataFrame
dataset_df = pd.DataFrame(dataset_head)
dataset_df.head(5)

	recent_news	reports	company	ticker	key_metrics	sector
0	[{‘date’: ‘2024-06-09’, ‘headline’: ‘CyberDefe…	[{‘author’: ‘Taylor Smith, Technology Sector L…	CyberDefense Dynamics	CDDY	{‘52_week_range’: {‘high’: 387.3, ‘low’: 41.63…	Information Technology
1	[{‘date’: ‘2024-07-04’, ‘headline’: ‘CloudComp…	[{‘author’: ‘Casey Jones, Chief Market Strateg…	CloudCompute Pro	CCPR	{‘52_week_range’: {‘high’: 524.23, ‘low’: 171…	Information Technology
2	[{‘date’: ‘2024-06-27’, ‘headline’: ‘VirtualRe…	[{‘author’: ‘Sam Brown, Head of Equity Researc…	VirtualReality Systems	VRSY	{‘52_week_range’: {‘high’: 530.59, ‘low’: 56.4…	Information Technology
3	[{‘date’: ‘2024-07-06’, ‘headline’: ‘BioTech I…	[{‘author’: ‘Riley Smith, Senior Tech Analyst…	BioTech Innovations	BTCI	{‘52_week_range’: {‘high’: 366.55, ‘low’: 124…	Information Technology
4	[{‘date’: ‘2024-06-26’, ‘headline’: ‘QuantumCo…	[{‘author’: ‘Riley Garcia, Senior Tech Analyst…	QuantumComputing Inc	QCMP	{‘52_week_range’: {‘high’: 231.91, ‘low’: 159…	Information Technology
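Each row nests its reports and news as lists of dictionaries. If you want to inspect reports individually (for example, to count reports per year), a small helper can flatten them into one row per report. This is a sketch that relies only on the report keys used elsewhere in this tutorial (year, title, author); flatten_reports and the toy record are hypothetical.

```python
from typing import Any, Dict, List


def flatten_reports(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return one flat row per market report, tagged with its company and ticker."""
    rows = []
    for record in records:
        for report in record.get("reports", []):
            rows.append(
                {
                    "company": record["company"],
                    "ticker": record["ticker"],
                    "year": report.get("year"),
                    "title": report.get("title"),
                    "author": report.get("author"),
                }
            )
    return rows


# Toy record with made-up values, shaped like the rows shown above
sample = [
    {
        "company": "Example Corp",
        "ticker": "EXMP",
        "reports": [
            {"year": 2023, "title": "Report A", "author": "Analyst One", "content": "..."},
            {"year": 2024, "title": "Report B", "author": "Analyst Two", "content": "..."},
        ],
    }
]
rows = flatten_reports(sample)
```

With the real data, the input would be dataset_df.to_dict("records").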

# Data Preparation
def combine_attributes(row):
    """Concatenate company, sector, reports, and news into one string to embed."""
    combined = f"{row['company']} {row['sector']} "

    # Add reports information
    for report in row["reports"]:
        combined += f"{report['year']} {report['title']} {report['author']} {report['content']} "

    # Add recent news information
    for news in row["recent_news"]:
        combined += f"{news['headline']} {news['summary']} "

    return combined.strip()

# Add the new column 'combined_attributes'
dataset_df["combined_attributes"] = dataset_df.apply(
    combine_attributes, axis=1
)

# Display the first few rows of the updated dataframe
dataset_df[["company", "ticker", "combined_attributes"]].head()

	company	ticker	combined_attributes
0	CyberDefense Dynamics	CDDY	CyberDefense Dynamics Information Technology 2…
1	CloudCompute Pro	CCPR	CloudCompute Pro Information Technology 2023 C…
2	VirtualReality Systems	VRSY	VirtualReality Systems Information Technology …
3	BioTech Innovations	BTCI	BioTech Innovations Information Technology 202…
4	QuantumComputing Inc	QCMP	QuantumComputing Inc Information Technology 20…

Step 3: Embedding Generation with Cohere

from tqdm import tqdm


def get_embedding(
    text: str, input_type: str = "search_document"
) -> list[float]:
    """Return a Cohere embedding for `text`, or an empty list if `text` is blank."""
    if not text.strip():
        print("Attempted to get embedding for empty text.")
        return []

    model = "embed-v4.0"
    response = co.embed(
        texts=[text],
        model=model,
        # "search_document" for texts stored in the vector DB; "search_query" for user queries run against it
        input_type=input_type,
        embedding_types=["float"],
    )

    return response.embeddings.float[0]


# Apply the embedding function with a progress bar
tqdm.pandas(desc="Generating embeddings")
dataset_df["embedding"] = dataset_df[
    "combined_attributes"
].progress_apply(get_embedding)

print(f"We just computed {len(dataset_df['embedding'])} embeddings.")

We just computed 63 embeddings.
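get_embedding sends one text per request, so this step makes one API call per company. The Cohere embed endpoint accepts a list of texts, so batching reduces round trips. A minimal sketch, assuming a limit of 96 texts per request (the limit documented by Cohere at the time of writing; check the current API reference). embed_in_batches and the commented cohere_embed_batch adapter are hypothetical names.

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start : start + size]


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    batch_size: int = 96,
) -> List[List[float]]:
    """Embed texts batch by batch, preserving input order."""
    embeddings: List[List[float]] = []
    for batch in batched(texts, batch_size):
        embeddings.extend(embed_batch(list(batch)))
    return embeddings


# Assumed adapter around the same co.embed call used in get_embedding:
# def cohere_embed_batch(batch):
#     response = co.embed(texts=batch, model="embed-v4.0",
#                         input_type="search_document", embedding_types=["float"])
#     return response.embeddings.float
```

Passing the embedding call in as a function keeps the batching logic testable without an API key.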

dataset_df.head()

	recent_news	reports	company	ticker	key_metrics	sector	combined_attributes	embedding
0	[{‘date’: ‘2024-06-09’, ‘headline’: ‘CyberDefe…	[{‘author’: ‘Taylor Smith, Technology Sector L…	CyberDefense Dynamics	CDDY	{‘52_week_range’: {‘high’: 387.3, ‘low’: 41.63…	Information Technology	CyberDefense Dynamics Information Technology 2…	[0.01210022, -0.03466797, -0.017562866, -0.025…
1	[{‘date’: ‘2024-07-04’, ‘headline’: ‘CloudComp…	[{‘author’: ‘Casey Jones, Chief Market Strateg…	CloudCompute Pro	CCPR	{‘52_week_range’: {‘high’: 524.23, ‘low’: 171…	Information Technology	CloudCompute Pro Information Technology 2023 C…	[-0.058563232, -0.06323242, -0.037139893, -0.0…
2	[{‘date’: ‘2024-06-27’, ‘headline’: ‘VirtualRe…	[{‘author’: ‘Sam Brown, Head of Equity Researc…	VirtualReality Systems	VRSY	{‘52_week_range’: {‘high’: 530.59, ‘low’: 56.4…	Information Technology	VirtualReality Systems Information Technology …	[0.024154663, -0.022872925, -0.01751709, -0.05…
3	[{‘date’: ‘2024-07-06’, ‘headline’: ‘BioTech I…	[{‘author’: ‘Riley Smith, Senior Tech Analyst’…	BioTech Innovations	BTCI	{‘52_week_range’: {‘high’: 366.55, ‘low’: 124…	Information Technology	BioTech Innovations Information Technology 202…	[0.020736694, -0.041046143, -0.0029773712, -0…
4	[{‘date’: ‘2024-06-26’, ‘headline’: ‘QuantumCo…	[{‘author’: ‘Riley Garcia, Senior Tech Analyst…	QuantumComputing Inc	QCMP	{‘52_week_range’: {‘high’: 231.91, ‘low’: 159…	Information Technology	QuantumComputing Inc Information Technology 20…	[-0.009757996, -0.04815674, 0.039611816, 0.023…
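Atlas Vector Search expects every indexed vector to have exactly the numDimensions elements declared in the index, and get_embedding returns an empty list for blank text. A quick check before ingestion catches both problems early. This is a sketch; find_bad_embeddings is a hypothetical helper.

```python
from typing import List, Sequence


def find_bad_embeddings(embeddings: Sequence[Sequence[float]], expected_dims: int) -> List[int]:
    """Return the positions of vectors whose length differs from expected_dims."""
    return [i for i, vector in enumerate(embeddings) if len(vector) != expected_dims]


# Hypothetical usage before ingestion. Set expected_dims to the numDimensions value
# of your Atlas vector index (embed-v4.0 returns 1,536 values per vector by default).
# bad_rows = find_bad_embeddings(dataset_df["embedding"].tolist(), expected_dims=1536)
# assert not bad_rows, f"Rows with unexpected embedding size: {bad_rows}"
```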

Step 4: MongoDB Vector Database and Connection Setup

MongoDB acts as both an operational and a vector database for the RAG system. MongoDB Atlas specifically provides a database solution that efficiently stores, queries and retrieves vector embeddings.

Creating a database and collection within MongoDB is made simple with MongoDB Atlas.

First, register for a MongoDB Atlas account. For existing users, sign into MongoDB Atlas.
Follow the instructions. Select Atlas UI as the procedure to deploy your first cluster.
Create the database: asset_management_use_case.
Within the database asset_management_use_case, create the collection market_reports.
Create a vector search index named vector_index for the market_reports collection. This index enables the RAG application to retrieve records as additional context to supplement user queries via vector search.

Your vector search index definition in MongoDB Atlas should look like the JSON below. The numDimensions value must equal the length of the vectors produced in Step 3; embed-v4.0 returns 1,536-dimensional float embeddings by default.

{
  "fields": [
    {
      "numDimensions": 1536,
      "path": "embedding",
      "similarity": "cosine",
      "type": "vector"
    }
  ]
}
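The "similarity": "cosine" setting ranks documents by the cosine of the angle between the query vector and each stored vector. The sketch below computes it in plain Python for intuition only; Atlas does this server-side. It also applies the (1 + cosine) / 2 normalization that the Atlas Vector Search documentation describes for cosine scores, which maps results into the 0 to 1 range seen in the vectorSearchScore values later on. The helper names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def normalized_cosine_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Map cosine from [-1, 1] to [0, 1]: identical direction -> 1, opposite -> 0."""
    return (1 + cosine_similarity(a, b)) / 2
```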

Follow MongoDB’s steps to get the connection string from the Atlas UI. After setting up the database and obtaining the Atlas cluster connection URI, securely store the URI within your development environment.

import os

# Set the Atlas connection string for local testing only; never commit it
os.environ["MONGO_URI"] = ""

import os

import pymongo


def get_mongo_client(mongo_uri):
    """Establish and validate a connection to MongoDB."""
    client = pymongo.MongoClient(
        mongo_uri, appname="devrel.showcase.rag.cohere_mongodb.python"
    )

    # Validate the connection; "ping" raises an exception if the cluster is unreachable
    ping_result = client.admin.command("ping")
    if ping_result.get("ok") == 1.0:
        print("Connection to MongoDB successful")
        return client

    print("Connection to MongoDB failed")
    return None


MONGO_URI = os.environ.get("MONGO_URI", "")

if not MONGO_URI:
    raise ValueError("MONGO_URI not set in environment variables")

mongo_client = get_mongo_client(MONGO_URI)
if mongo_client is None:
    raise RuntimeError("Could not connect to MongoDB")

DB_NAME = "asset_management_use_case"
COLLECTION_NAME = "market_reports"

db = mongo_client.get_database(DB_NAME)
collection = db.get_collection(COLLECTION_NAME)

Connection to MongoDB successful
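If you prefer code to the Atlas UI, recent PyMongo versions can create the vector index directly from the collection handle. A sketch assuming PyMongo 4.7 or later (where SearchIndexModel accepts type="vectorSearch") and an Atlas cluster; vector_index_definition and create_vector_index are hypothetical helpers.

```python
from typing import Any, Dict


def vector_index_definition(num_dimensions: int, path: str = "embedding") -> Dict[str, Any]:
    """Build a vector index definition in the same shape as the one shown in Step 4."""
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": num_dimensions,
                "similarity": "cosine",
            }
        ]
    }


def create_vector_index(collection, num_dimensions: int, name: str = "vector_index") -> str:
    """Create an Atlas Vector Search index on `collection` and return its name."""
    # Imported here so the definition builder above works without PyMongo installed
    from pymongo.operations import SearchIndexModel

    model = SearchIndexModel(
        definition=vector_index_definition(num_dimensions),
        name=name,
        type="vectorSearch",
    )
    # Index builds are asynchronous; the index may take a little while to become queryable
    return collection.create_search_index(model=model)


# Hypothetical usage; num_dimensions must match your embedding length:
# create_vector_index(collection, num_dimensions=1536)
```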

# Delete any existing records in the collection
collection.delete_many({})

DeleteResult({'n': 63, 'electionId': ObjectId('7fffffff000000000000002b'), 'opTime': {'ts': Timestamp(1721913981, 63), 't': 43}, 'ok': 1.0, '$clusterTime': {'clusterTime': Timestamp(1721913981, 63), 'signature': {'hash': b'cU;+\xe3\xbdRc\t\x80\xad\x03\x16\x11\x18\xe6s\xebF\x01', 'keyId': 7353740577831124994}}, 'operationTime': Timestamp(1721913981, 63)}, acknowledged=True)

Step 5: Data Ingestion

MongoDB’s document model and its compatibility with Python dictionaries offer several benefits for data ingestion.

Document-oriented structure:
- MongoDB stores data as JSON-like documents, persisted as BSON (Binary JSON).
- This aligns naturally with Python dictionaries, so records map directly onto key-value structures.
Schema flexibility:
- MongoDB has a flexible schema: documents in the same collection can have different structures.
- This matches Python’s dynamic nature, allowing you to ingest varied data structures without defining a schema up front.
Efficient ingestion:
- Because Python dictionaries and MongoDB documents are so similar, records can be inserted directly without complex transformations.
- Skipping that transformation step reduces processing overhead before insertion.

# Each DataFrame row becomes one document (a Python dict)
documents = dataset_df.to_dict("records")
collection.insert_many(documents)

print("Data ingestion into MongoDB completed")

Data ingestion into MongoDB completed

Step 6: MongoDB Query Language and Vector Search

Query flexibility

MongoDB’s query language is designed to work well with document structures, making it easy to query and manipulate ingested data. With PyMongo, queries are written as ordinary Python dictionaries.

Aggregation Pipeline

MongoDB’s aggregation pipeline is a powerful feature that runs complex data processing and analysis inside the database. An aggregation pipeline is similar to a pipeline in data engineering or machine learning: stages run sequentially, each taking the previous stage’s output as its input, performing an operation, and passing its result to the next stage.

Stages

Stages are the building blocks of an aggregation pipeline. Each stage represents a specific data transformation or analysis operation. Common stages include:

$match: Filters documents (similar to WHERE in SQL)
$group: Groups documents by specified fields
$sort: Sorts the documents
$project: Reshapes documents (select, rename, compute fields)
$limit: Limits the number of documents
$unwind: Deconstructs array fields
$lookup: Performs left outer joins with other collections
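To illustrate stages feeding into one another, the sketch below builds a pipeline that lists the companies with the highest 52-week high. It assumes the key_metrics.52_week_range.high path visible in the dataset preview; the function name is hypothetical.

```python
from typing import Any, Dict, List


def top_companies_by_52_week_high(sector: str, limit: int = 5) -> List[Dict[str, Any]]:
    """Build an aggregation pipeline that chains $match, $project, $sort, and $limit."""
    return [
        # Stage 1: keep only documents from the requested sector (like SQL WHERE)
        {"$match": {"sector": sector}},
        # Stage 2: reshape each document down to the fields we need
        {
            "$project": {
                "_id": 0,
                "company": 1,
                "ticker": 1,
                "high_52w": "$key_metrics.52_week_range.high",
            }
        },
        # Stage 3: highest 52-week high first
        {"$sort": {"high_52w": -1}},
        # Stage 4: stop after `limit` documents
        {"$limit": limit},
    ]


# Hypothetical usage against the collection from Step 4:
# for doc in collection.aggregate(top_companies_by_52_week_high("Information Technology", 3)):
#     print(doc)
```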

def vector_search(user_query, collection):
    """
    Perform a vector search in the MongoDB collection based on the user query.

    Args:
    user_query (str): The user's query string.
    collection (pymongo.collection.Collection): The MongoDB collection to search.

    Returns:
    list: A list of matching documents (empty if no embedding could be generated).
    """

    # Generate embedding for the user query
    query_embedding = get_embedding(
        user_query, input_type="search_query"
    )

    # get_embedding returns an empty list for blank input
    if not query_embedding:
        print("Invalid query or embedding generation failed.")
        return []

    # Define the vector search pipeline
    vector_search_stage = {
        "$vectorSearch": {
            "index": "vector_index",
            "queryVector": query_embedding,
            "path": "embedding",
            "numCandidates": 150,  # Number of candidate matches to consider
            "limit": 5,  # Return top 5 matches
        }
    }

    unset_stage = {
        "$unset": "embedding"  # Exclude the 'embedding' field from the results
    }

    project_stage = {
        "$project": {
            "_id": 0,  # Exclude the _id field
            "company": 1,  # Include the company field
            "reports": 1,  # Include the reports field
            "combined_attributes": 1,  # Include the combined_attributes field
            "score": {
                "$meta": "vectorSearchScore"  # Include the search score
            },
        }
    }

    pipeline = [vector_search_stage, unset_stage, project_stage]

    # Execute the search
    results = collection.aggregate(pipeline)
    return list(results)
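$vectorSearch can also narrow the candidate set before similarity ranking, for example to the handful of tickers an asset manager follows. A sketch, assuming you add {"type": "filter", "path": "ticker"} to the fields array of the index definition from Step 4 (Atlas only accepts filters on fields indexed this way); vector_search_stage is a hypothetical helper.

```python
from typing import Any, Dict, List, Optional


def vector_search_stage(
    query_vector: List[float],
    limit: int = 5,
    num_candidates: int = 150,
    tickers: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build a $vectorSearch stage, optionally pre-filtered to specific tickers."""
    stage: Dict[str, Any] = {
        "index": "vector_index",
        "queryVector": query_vector,
        "path": "embedding",
        "numCandidates": num_candidates,
        "limit": limit,
    }
    if tickers:
        # Requires "ticker" to be declared as a filter field in the vector index
        stage["filter"] = {"ticker": {"$in": tickers}}
    return {"$vectorSearch": stage}
```

The returned stage can replace vector_search_stage inside vector_search; the rest of the pipeline stays the same.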

Step 7: Add the Cohere Reranker

Cohere Rerank acts as a second-stage search: it re-scores the candidates returned by vector search against the query, which can improve the precision of your first-stage results.

def rerank_documents(query: str, documents, top_n: int = 3):
    """Rerank vector search results with Cohere Rerank and return the top_n documents."""
    try:
        response = co.rerank(
            model="rerank-english-v3.0",
            query=query,
            documents=documents,
            top_n=top_n,
            rank_fields=["company", "reports", "combined_attributes"],
        )

        # Map each reranked result back to its original document via result.index
        top_documents_after_rerank = []
        for result in response.results:
            original_doc = documents[result.index]
            top_documents_after_rerank.append(
                {
                    "company": original_doc["company"],
                    "combined_attributes": original_doc[
                        "combined_attributes"
                    ],
                    "reports": original_doc["reports"],
                    "vector_search_score": original_doc["score"],
                    "relevance_score": result.relevance_score,
                }
            )

        return top_documents_after_rerank

    except Exception as e:
        print(f"An error occurred during reranking: {e}")
        # Fall back to the top N vector search results, in their original order and shape
        return documents[:top_n]
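The reports field is a list of dictionaries rather than a string. In Cohere’s v2 API, Rerank takes plain strings, and Cohere’s documentation suggests YAML-style formatting for structured records. A sketch of such a serializer; to_rerank_text is a hypothetical helper, and its output is a simple approximation of YAML rather than a library call.

```python
from typing import Any, Dict, Iterable


def to_rerank_text(doc: Dict[str, Any], fields: Iterable[str]) -> str:
    """Serialize selected fields into 'key: value' lines for a text-only reranker."""
    lines = []
    for field in fields:
        value = doc.get(field)
        if value is None:
            continue
        if isinstance(value, list):
            # Nested lists (such as reports) become one indented line per item
            lines.append(f"{field}:")
            lines.extend(f"  - {item}" for item in value)
        else:
            lines.append(f"{field}: {value}")
    return "\n".join(lines)


# Hypothetical usage with the vector search results:
# texts = [to_rerank_text(d, ["company", "combined_attributes"]) for d in get_knowledge]
# co.rerank(model="rerank-english-v3.0", query=query, documents=texts, top_n=3)
```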

query = "What companies have negative market reports or negative sentiment that might deter from investment in the long term"

get_knowledge = vector_search(query, collection)
pd.DataFrame(get_knowledge).head()

	reports	company	combined_attributes	score
0	[{‘author’: ‘Jordan Garcia, Senior Tech Analys…	GreenEnergy Corp	GreenEnergy Corp Information Technology 2023 G…	0.659524
1	[{‘author’: ‘Morgan Smith, Technology Sector L…	BioTech Therapeutics	BioTech Therapeutics Information Technology 20…	0.646300
2	[{‘author’: ‘Casey Davis, Technology Sector Le…	RenewableEnergy Innovations	RenewableEnergy Innovations Information Techno…	0.645224
3	[{‘author’: ‘Morgan Johnson, Technology Sector…	QuantumSensor Corp	QuantumSensor Corp Information Technology 2023…	0.644383
4	[{‘author’: ‘Morgan Williams, Senior Tech Anal…	BioEngineering Corp	BioEngineering Corp Information Technology 202…	0.643690

reranked_documents = rerank_documents(query, get_knowledge)
pd.DataFrame(reranked_documents).head()

	company	combined_attributes	reports	vector_search_score	relevance_score
0	GreenEnergy Corp	GreenEnergy Corp Information Technology 2023 G…	[{‘author’: ‘Jordan Garcia, Senior Tech Analys…	0.659524	0.000147
1	BioEngineering Corp	BioEngineering Corp Information Technology 202…	[{‘author’: ‘Morgan Williams, Senior Tech Anal…	0.643690	0.000065
2	QuantumSensor Corp	QuantumSensor Corp Information Technology 2023…	[{‘author’: ‘Morgan Johnson, Technology Sector…	0.644383	0.000054
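Comparing the two tables shows what the second stage changed: BioEngineering Corp moved from fifth to second place, while BioTech Therapeutics and RenewableEnergy Innovations dropped out of the top three. A small helper makes the comparison explicit; rank_changes is a hypothetical name, and the lists are copied from the outputs above.

```python
from typing import Dict, List, Optional, Tuple


def rank_changes(before: List[str], after: List[str]) -> Dict[str, Tuple[Optional[int], int]]:
    """Map each reranked item to its (original rank, new rank), both 1-based."""
    original = {name: i + 1 for i, name in enumerate(before)}
    return {name: (original.get(name), i + 1) for i, name in enumerate(after)}


# Orders taken from the vector search and rerank tables above
vector_order = [
    "GreenEnergy Corp",
    "BioTech Therapeutics",
    "RenewableEnergy Innovations",
    "QuantumSensor Corp",
    "BioEngineering Corp",
]
rerank_order = ["GreenEnergy Corp", "BioEngineering Corp", "QuantumSensor Corp"]
changes = rank_changes(vector_order, rerank_order)
```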

Step 8: Handling User Queries

def format_documents_for_chat(documents):
    """Keep only the fields passed to the chat model as grounding documents."""
    return [
        {
            "company": doc["company"],
            # "reports" is omitted: its text is already part of combined_attributes
            "combined_attributes": doc["combined_attributes"],
        }
        for doc in documents
    ]


# Generate a response with Cohere Command A, grounded in the reranked documents
response = co.chat(
    message=query,
    documents=format_documents_for_chat(reranked_documents),
    model="command-a-03-2025",
    temperature=0.3,
)

print("Final answer:")
print(response.text)

Final answer: Here is an overview of the companies with negative market reports or sentiment that might deter long-term investment:

GreenEnergy Corp (GRNE):

Challenges: Despite solid financial performance and a positive market position, GRNE faces challenges due to the volatile political environment and rising trade tensions, resulting in increased tariffs and supply chain disruptions.
Regulatory Scrutiny: The company is under scrutiny for its data handling practices, raising concerns about potential privacy breaches and ethical dilemmas.

BioEngineering Corp (BENC):

Regulatory Hurdles: BENC faces delays in obtaining approvals for certain products due to stringent healthcare regulations, impacting their time-to-market.
Reimbursement and Pricing Pressures: As healthcare costs rise, the company must carefully navigate pricing strategies to balance accessibility and profitability.
Research and Development Expenses: BENC has experienced a significant increase in R&D expenses, which may impact its ability to maintain a competitive pricing strategy.

QuantumSensor Corp (QSCP):

Supply Chain Disruptions: QSCP has faced supply chain issues due to global logistics problems and geopolitical tensions, impacting production and delivery.
Regulatory Scrutiny: The company is under scrutiny for its data collection and handling practices, with potential privacy and ethical concerns.
Technical Workforce Challenges: Attracting and retaining skilled talent in a competitive market has been challenging for QSCP.

for cite in response.citations:
    print(cite)

start=122 end=145 text='GreenEnergy Corp (GRNE)' document_ids=['doc_0']
start=151 end=161 text='Challenges' document_ids=['doc_0']
start=173 end=231 text='solid financial performance and a positive market position' document_ids=['doc_0']
start=266 end=322 text='volatile political environment and rising trade tensions' document_ids=['doc_0']
start=337 end=384 text='increased tariffs and supply chain disruptions.' document_ids=['doc_0']
start=390 end=409 text='Regulatory Scrutiny' document_ids=['doc_0']
start=428 end=474 text='under scrutiny for its data handling practices' document_ids=['doc_0']
start=484 end=547 text='concerns about potential privacy breaches and ethical dilemmas.' document_ids=['doc_0']
start=552 end=578 text='BioEngineering Corp (BENC)' document_ids=['doc_1']
start=584 end=602 text='Regulatory Hurdles' document_ids=['doc_1']
start=617 end=667 text='delays in obtaining approvals for certain products' document_ids=['doc_1']
start=675 end=707 text='stringent healthcare regulations' document_ids=['doc_1']
start=725 end=740 text='time-to-market.' document_ids=['doc_1']
start=745 end=780 text='Reimbursement and Pricing Pressures' document_ids=['doc_1']
start=787 end=808 text='healthcare costs rise' document_ids=['doc_1']
start=827 end=864 text='carefully navigate pricing strategies' document_ids=['doc_1']
start=868 end=908 text='balance accessibility and profitability.' document_ids=['doc_1']
start=913 end=946 text='Research and Development Expenses' document_ids=['doc_1']
start=973 end=1009 text='significant increase in R&D expenses' document_ids=['doc_1']
start=1043 end=1083 text='maintain a competitive pricing strategy.' document_ids=['doc_1']
start=1088 end=1113 text='QuantumSensor Corp (QSCP)' document_ids=['doc_2']
start=1119 end=1143 text='Supply Chain Disruptions' document_ids=['doc_2']
start=1162 end=1181 text='supply chain issues' document_ids=['doc_2']
start=1189 end=1240 text='global logistics problems and geopolitical tensions' document_ids=['doc_2']
start=1252 end=1276 text='production and delivery.' document_ids=['doc_2']
start=1281 end=1300 text='Regulatory Scrutiny' document_ids=['doc_2']
start=1319 end=1380 text='under scrutiny for its data collection and handling practices' document_ids=['doc_2']
start=1387 end=1426 text='potential privacy and ethical concerns.' document_ids=['doc_2']
start=1431 end=1461 text='Technical Workforce Challenges' document_ids=['doc_2']
start=1465 end=1528 text='Attracting and retaining skilled talent in a competitive market' document_ids=['doc_2']
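Each citation records a character span of the answer and the IDs of the supporting documents. In this output the IDs follow the order of the documents passed to co.chat (doc_0 is GreenEnergy Corp, the first reranked document), so they can be mapped back to companies. A sketch that assumes this doc_<index> pattern holds; Citation here is a local stand-in dataclass, not the Cohere SDK type, and citations_by_company is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Citation:
    """Stand-in with the attributes printed above."""

    start: int
    end: int
    text: str
    document_ids: List[str]


def citations_by_company(citations, documents: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group cited text spans by the company of each supporting document.

    Assumes IDs look like 'doc_<position>', where position is the document's
    index in the list passed to co.chat.
    """
    grouped: Dict[str, List[str]] = {}
    for cite in citations or []:
        for doc_id in cite.document_ids:
            position = int(doc_id.split("_")[-1])
            company = documents[position]["company"]
            grouped.setdefault(company, []).append(cite.text)
    return grouped


# Hypothetical usage with the real response:
# citations_by_company(response.citations, format_documents_for_chat(reranked_documents))
```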

Step 9: Using MongoDB as a Data Store for Conversation History

from typing import Any, Dict, List, Optional

import pymongo


class CohereChat:
    """RAG chat session that stores its conversation history in MongoDB."""

    def __init__(
        self,
        cohere_client,
        system: str = "",
        database: str = "cohere_chat",
        main_collection: str = "main_collection",
        history_params: Optional[Dict[str, Any]] = None,
    ):
        self.co = cohere_client
        self.system = system
        self.history_params = history_params or {}

        # Use the connection string from history_params
        self.client = pymongo.MongoClient(
            self.history_params.get(
                "connection_string", "mongodb://localhost:27017/"
            )
        )

        # Use the database parameter
        self.db = self.client[database]

        # Use the main_collection parameter
        self.main_collection = self.db[main_collection]

        # Use the history_collection from history_params, or default to "chat_history"
        self.history_collection = self.db[
            self.history_params.get(
                "history_collection", "chat_history"
            )
        ]

        # Use the session_id from history_params, or default to "default_session"
        self.session_id = self.history_params.get(
            "session_id", "default_session"
        )

    def add_to_history(self, message: str, prefix: str = ""):
        self.history_collection.insert_one(
            {
                "session_id": self.session_id,
                "message": message,
                "prefix": prefix,
            }
        )

    def get_chat_history(self) -> List[Dict[str, str]]:
        # ObjectIds generated by one client increase over time, so sorting by _id keeps insertion order
        history = self.history_collection.find(
            {"session_id": self.session_id}
        ).sort("_id", 1)
        return [
            {
                # Cohere's v1 chat_history uses the roles "USER" and "CHATBOT"
                "role": "USER" if item["prefix"] == "USER" else "CHATBOT",
                "message": item["message"],
            }
            for item in history
        ]

    def rerank_documents(
        self, query: str, documents: List[Dict], top_n: int = 3
    ) -> List[Dict]:
        # Keep only the text fields used for ranking, skipping documents without text
        rerank_docs = [
            {
                "company": doc["company"],
                "combined_attributes": doc["combined_attributes"],
            }
            for doc in documents
            if doc["combined_attributes"].strip()
        ]

        if not rerank_docs:
            print("No valid documents to rerank.")
            return []

        try:
            response = self.co.rerank(
                query=query,
                documents=rerank_docs,
                top_n=top_n,
                model="rerank-english-v3.0",
                rank_fields=["company", "combined_attributes"],
            )

            top_documents_after_rerank = [
                {
                    "company": rerank_docs[result.index]["company"],
                    "combined_attributes": rerank_docs[result.index][
                        "combined_attributes"
                    ],
                    "relevance_score": result.relevance_score,
                }
                for result in response.results
            ]

            print(
                f"\nHere are the top {top_n} documents after rerank:"
            )
            for doc in top_documents_after_rerank:
                print(
                    f"== {doc['company']} (Relevance: {doc['relevance_score']:.4f})"
                )

            return top_documents_after_rerank

        except Exception as e:
            print(f"An error occurred during reranking: {e}")
            return documents[:top_n]

    def format_documents_for_chat(
        self, documents: List[Dict]
    ) -> List[Dict]:
        return [
            {
                "company": doc["company"],
                "combined_attributes": doc["combined_attributes"],
            }
            for doc in documents
        ]

    def send_message(self, message: str, vector_search_func) -> str:
        # Read prior turns before storing the new message so it is not sent twice
        chat_history = self.get_chat_history()
        self.add_to_history(message, "USER")

        # Perform vector search
        search_results = vector_search_func(
            message, self.main_collection
        )

        # Rerank the search results
        reranked_documents = self.rerank_documents(
            message, search_results
        )

        # Format documents for chat
        formatted_documents = self.format_documents_for_chat(
            reranked_documents
        )

        # Pass the system prompt as the preamble only when one was provided
        chat_kwargs = {"preamble": self.system} if self.system else {}

        # Generate a response grounded in the retrieved documents
        response = self.co.chat(
            chat_history=chat_history,
            message=message,
            documents=formatted_documents,
            model="command-a-03-2025",
            temperature=0.3,
            **chat_kwargs,
        )

        result = response.text
        self.add_to_history(result, "CHATBOT")

        print("Final answer:")
        print(result)

        print("\nCitations:")
        # citations can be None when the answer cites nothing
        for cite in response.citations or []:
            print(cite)

        return result

    def show_history(self):
        history = self.history_collection.find(
            {"session_id": self.session_id}
        ).sort("_id", 1)
        for item in history:
            print(f"{item['prefix']}: {item['message']}")
            print("-------------------------")
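get_chat_history returns every stored message for the session, so a long conversation eventually sends a very large chat_history to the model. One way to bound it is to pass only the most recent turns. A sketch that assumes the history alternates user and chatbot messages, as send_message stores them; trim_history is a hypothetical helper.

```python
from typing import Dict, List


def trim_history(history: List[Dict[str, str]], max_turns: int) -> List[Dict[str, str]]:
    """Keep only the most recent `max_turns` user/chatbot exchanges."""
    if max_turns <= 0:
        return []
    # One turn is a user message plus the chatbot reply, so keep 2 messages per turn
    return history[-2 * max_turns :]


# Hypothetical usage inside send_message:
#     chat_history = trim_history(self.get_chat_history(), max_turns=5)
```

Because get_chat_history filters on session_id and sorts on _id, a compound index such as chat.history_collection.create_index([("session_id", 1), ("_id", 1)]) would keep that lookup efficient as history grows.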

# Initialize CohereChat
chat = CohereChat(
    co,
    system="You are a helpful assistant taking on the role of an Asset Manager focused on tech companies.",
    database=DB_NAME,
    main_collection=COLLECTION_NAME,
    history_params={
        "connection_string": MONGO_URI,
        "history_collection": "chat_history",
        "session_id": 2,
    },
)

# Send a message
response = chat.send_message(
    "What is the best investment to make why?", vector_search
)

Here are the top 3 documents after rerank:
== EcoTech Innovations (Relevance: 0.0001)
== GreenEnergy Systems (Relevance: 0.0001)
== QuantumComputing Inc (Relevance: 0.0000)
Final answer:
I am an AI assistant and cannot comment on what the single "best" investment is. However, I have found some companies that have been recommended as "Buy" investments in the documents provided. 
## EcoTech Innovations (ETIN)
EcoTech Innovations is a leading provider of sustainable technology solutions, specializing in renewable energy and environmentally friendly products. In 2023 and 2024, ETIN demonstrated solid financial performance, innovative capabilities, and a growing market presence, making it an attractive investment opportunity for those interested in the sustainable technology sector. 
## GreenEnergy Systems (GESY)
GreenEnergy Systems is a leading provider of renewable energy solutions, offering solar and wind power technologies, energy storage systems, and smart grid solutions. In 2023 and 2024, GESY reported strong financial performance, innovative product developments, and a solid market position, positioning it well for future growth in the renewable energy sector. 
## QuantumComputing Inc. (QCMP)
QuantumComputing Inc. is a leading developer of quantum computing software and solutions, aiming to revolutionize computing tasks across industries. In 2023 and 2024, QCMP demonstrated strong financial performance, innovative product offerings, and a growing market presence, making it an attractive investment opportunity in the rapidly growing quantum computing industry. 
Please note that these recommendations are based on specific reports and may not consider all factors. It is always advisable to conduct thorough research and consult professional advice before making any investment decisions.
Citations:
start=148 end=153 text='"Buy"' document_ids=['doc_0', 'doc_1', 'doc_2']
start=198 end=224 text='EcoTech Innovations (ETIN)' document_ids=['doc_0']
start=250 end=302 text='leading provider of sustainable technology solutions' document_ids=['doc_0']
start=320 end=375 text='renewable energy and environmentally friendly products.' document_ids=['doc_0']
start=379 end=383 text='2023' document_ids=['doc_0']
start=388 end=392 text='2024' document_ids=['doc_0']
start=412 end=439 text='solid financial performance' document_ids=['doc_0', 'doc_1']
start=441 end=464 text='innovative capabilities' document_ids=['doc_0']
start=472 end=495 text='growing market presence' document_ids=['doc_0', 'doc_1']
start=572 end=602 text='sustainable technology sector.' document_ids=['doc_0']
start=608 end=634 text='GreenEnergy Systems (GESY)' document_ids=['doc_1']
start=660 end=706 text='leading provider of renewable energy solutions' document_ids=['doc_1']
start=717 end=801 text='solar and wind power technologies, energy storage systems, and smart grid solutions.' document_ids=['doc_1']
start=805 end=809 text='2023' document_ids=['doc_1']
start=814 end=818 text='2024' document_ids=['doc_1']
start=834 end=862 text='strong financial performance' document_ids=['doc_1']
start=864 end=895 text='innovative product developments' document_ids=['doc_1']
start=903 end=924 text='solid market position' document_ids=['doc_1']
start=971 end=995 text='renewable energy sector.' document_ids=['doc_1']
start=1001 end=1029 text='QuantumComputing Inc. (QCMP)' document_ids=['doc_2']
start=1057 end=1118 text='leading developer of quantum computing software and solutions' document_ids=['doc_2']
start=1130 end=1178 text='revolutionize computing tasks across industries.' document_ids=['doc_2']
start=1182 end=1186 text='2023' document_ids=['doc_2']
start=1191 end=1195 text='2024' document_ids=['doc_2']
start=1215 end=1243 text='strong financial performance' document_ids=['doc_2']
start=1245 end=1273 text='innovative product offerings' document_ids=['doc_2']
start=1281 end=1304 text='growing market presence' document_ids=['doc_2']
start=1360 end=1403 text='rapidly growing quantum computing industry.' document_ids=['doc_2']
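Each citation above pairs a character span (`start`, `end`) in the generated answer with the `document_ids` that support it. The printed offsets follow Python slice semantics: for example, `end - start` is 26 for the 26-character span `'EcoTech Innovations (ETIN)'`. The sketch below shows one way to check those spans and group cited snippets by source document. `Citation` is a hypothetical local stand-in with the same field names as the printed output. Passing the SDK's own citation objects directly to `check_and_group` (also hypothetical) is an assumption that has not been tested here. The short response text is illustrative only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Citation:
    """Hypothetical stand-in mirroring the start/end/text/document_ids fields printed above."""
    start: int
    end: int
    text: str
    document_ids: list[str]


def check_and_group(response_text: str, citations: list[Citation]) -> dict[str, list[str]]:
    """Verify each span against the response text, then group cited snippets by document id."""
    by_doc: dict[str, list[str]] = defaultdict(list)
    for c in citations:
        # start is inclusive and end is exclusive, so a plain slice recovers the cited text
        if response_text[c.start:c.end] != c.text:
            raise ValueError(f"offset mismatch for {c.text!r}")
        for doc_id in c.document_ids:
            by_doc[doc_id].append(c.text)
    return dict(by_doc)


# Illustrative response text and citations (not real model output)
response = 'Rated "Buy": EcoTech Innovations (ETIN).'
citations = [
    Citation(6, 11, '"Buy"', ["doc_0", "doc_1"]),
    Citation(13, 39, "EcoTech Innovations (ETIN)", ["doc_0"]),
]
print(check_and_group(response, citations))
# {'doc_0': ['"Buy"', 'EcoTech Innovations (ETIN)'], 'doc_1': ['"Buy"']}
```

Grouping by document id makes it easy to show an asset manager which retrieved report backs each claim in the answer.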

# Show chat history
chat.show_history()

USER: What is the best investment to make why?
-------------------------
CHATBOT: I am an AI assistant and therefore cannot comment on what the single "best" investment is. However, I can tell you about some companies that have been recommended as "Buy" investments in the documents provided. 
## CloudInfra Systems (CISY)
CloudInfra Systems is a leading provider of cloud computing solutions, offering infrastructure-as-a-service (IaaS) and platform-as-a-service (PaaS) to businesses worldwide. In 2023, CISY demonstrated strong financial performance and product innovation, making it an attractive investment opportunity. 
## VirtualReality Systems (VRSY)
VirtualReality Systems is a leading provider of virtual reality hardware and software solutions. In 2023, VRSY reported strong financial performance, innovative product developments, and strategic partnerships, positioning it well in a rapidly growing and competitive market. 
## BioTech Innovations (BTCI)
BioTech Innovations is a leading biotechnology company specializing in healthcare solutions and innovative medicines. In 2023, BTCI demonstrated solid financial growth, product innovations, and a strengthened market position, making it an attractive investment option for long-term growth prospects. 
Please note that these recommendations are based on specific reports and may not consider all factors. It is always advisable to conduct thorough research and consult professional advice before making any investment decisions.
-------------------------
USER: What is the best investment to make why?
-------------------------
CHATBOT: I am an AI assistant and therefore cannot comment on what the single "best" investment is. However, I can provide you with some companies that have been recommended as "Buy" investments in the documents provided. 
## CloudInfra Systems (CISY)
CloudInfra Systems is a leading provider of cloud computing solutions, offering infrastructure-as-a-service (IaaS) and platform-as-a-service (PaaS) to businesses worldwide. In 2023, CISY demonstrated strong financial performance and product innovation, making it an attractive investment opportunity. 
## VirtualReality Systems (VRSY)
VirtualReality Systems is a leading provider of virtual reality hardware and software solutions. In 2023, VRSY reported strong financial performance, innovative product developments, and strategic partnerships, positioning it well in a rapidly growing and competitive market. 
## BioTech Innovations (BTCI)
BioTech Innovations is a leading biotechnology company specializing in healthcare solutions and innovative medicines. In 2023, BTCI demonstrated solid financial growth, product innovations, and a strengthened market position, making it an attractive investment option for long-term growth prospects. 
Please note that these recommendations are based on specific reports and may not consider all factors. It is always advisable to conduct thorough research and consult professional advice before making any investment decisions.
-------------------------
USER: What is the best investment to make why?
-------------------------
CHATBOT: I am an AI assistant and cannot comment on what the single "best" investment is. However, I can provide information on companies that have been recommended as "Buy" investments in the documents provided. 
## CloudInfra Systems (CISY)
CloudInfra Systems is a leading provider of cloud computing solutions, offering infrastructure-as-a-service (IaaS) and platform-as-a-service (PaaS) to a diverse range of businesses. In 2023, CISY demonstrated strong financial performance and product innovation, positioning it well in the competitive cloud market. 
## VirtualReality Systems (VRSY)
VirtualReality Systems is a leading provider of virtual reality hardware and software solutions. In 2023, VRSY reported robust financial results, innovative product developments, and strategic partnerships, making it a solid investment choice for those with a long-term investment horizon. 
## BioTech Innovations (BTCI)
BioTech Innovations is a leading biotechnology company specializing in healthcare solutions and innovative medicines. In 2023 and 2024, BTCI demonstrated solid financial growth, product innovations, and an improved market position, making it an attractive investment opportunity for long-term growth. 
Please note that these recommendations are based on specific reports and may not consider all factors. It is always advisable to conduct thorough research and consult professional advice before making any investment decisions.
-------------------------
USER: What is the best investment to make why?
-------------------------
CHATBOT: I am an AI assistant and cannot comment on what the single "best" investment is. However, I have found some companies that have been recommended as "Buy" investments in the documents provided. 
## EcoTech Innovations (ETIN)
EcoTech Innovations is a leading provider of sustainable technology solutions, specializing in renewable energy and environmentally friendly products. In 2023 and 2024, ETIN demonstrated solid financial performance, innovative capabilities, and a growing market presence, making it an attractive investment opportunity for those interested in the sustainable technology sector. 
## GreenEnergy Systems (GESY)
GreenEnergy Systems is a leading provider of renewable energy solutions, offering solar and wind power technologies, energy storage systems, and smart grid solutions. In 2023 and 2024, GESY reported strong financial performance, innovative product developments, and a solid market position, positioning it well for future growth in the renewable energy sector. 
## QuantumComputing Inc. (QCMP)
QuantumComputing Inc. is a leading developer of quantum computing software and solutions, aiming to revolutionize computing tasks across industries. In 2023 and 2024, QCMP demonstrated strong financial performance, innovative product offerings, and a growing market presence, making it an attractive investment opportunity in the rapidly growing quantum computing industry. 
Please note that these recommendations are based on specific reports and may not consider all factors. It is always advisable to conduct thorough research and consult professional advice before making any investment decisions.
-------------------------
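The history above alternates `USER:` and `CHATBOT:` entries, each followed by a dashed separator line. The tutorial's `show_history()` implementation is not shown in this section. The sketch below is therefore only a hypothetical illustration of how stored turns could be rendered in that layout. It assumes each turn is a dict with `role` and `message` keys, mirroring Cohere's `chat_history` message shape. The function name `format_history` and the 25-dash default separator are both assumptions.

```python
def format_history(turns: list[dict[str, str]], sep: str = "-" * 25) -> str:
    """Hypothetical renderer: print each turn as 'ROLE: message' followed by a separator line."""
    lines: list[str] = []
    for turn in turns:
        # role is assumed to be "USER" or "CHATBOT"; message holds the turn's text
        lines.append(f"{turn['role']}: {turn['message']}")
        lines.append(sep)
    return "\n".join(lines)


history = [
    {"role": "USER", "message": "What is the best investment to make why?"},
    {"role": "CHATBOT", "message": 'I am an AI assistant and cannot comment on what the single "best" investment is.'},
]
print(format_history(history))
```

If the turns are stored in a MongoDB collection, they would need to be read back in insertion order (for example, by sorting on a timestamp field) before rendering, so the conversation reads chronologically.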