Build Chatbots That Know Your Business with MongoDB and Cohere

What you will learn:

  • How to leverage semantic search on customer or operational data in MongoDB Atlas.
  • Pass retrieved data to Cohere’s Command R+ generative model for retrieval-augmented generation (RAG).
  • Develop and deploy a RAG-optimized user interface for your app.
  • Create a conversation data store for your RAG chatbot using MongoDB.

Use Case: Develop an advanced chatbot assistant that provides asset managers with information and actionable insights on technology company market reports.

Introduction

  • What is Cohere?
  • What is MongoDB?
  • How do Cohere and MongoDB work together?

What is Cohere?

What is MongoDB?

What exactly are we showing today?

Step 1: Install Libraries and Set Environment Variables

Critical Security Reminder: Safeguard your production environment by never committing sensitive information, such as environment variable values, to public repositories. This practice is essential for maintaining the security and integrity of your systems.

Libraries:

  • cohere: A Python library for accessing Cohere’s large language models, enabling natural language processing tasks like text generation, classification, and embedding.
  • pymongo: The recommended Python driver for MongoDB, allowing Python applications to interact with MongoDB databases for data storage and retrieval.
  • datasets: A library by Hugging Face that provides easy access to a wide range of datasets for machine learning and natural language processing tasks.
  • tqdm: A fast, extensible progress bar library for Python, useful for displaying progress in long-running operations or loops.

pip install --quiet datasets tqdm cohere pymongo

import os
import cohere

# Fill in your keys for local experimentation only; never commit real values
os.environ["COHERE_API_KEY"] = ""
co = cohere.Client(os.environ.get("COHERE_API_KEY"))

os.environ["HF_TOKEN"] = ""
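The assignments above leave the keys as empty strings to be filled in. One way to keep secrets out of the notebook entirely is to read them from variables that are already exported in the shell, and to fail early when one is missing. The sketch below assumes that approach; `require_env` is a hypothetical helper, not part of any library used in this guide.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; export it in your shell instead of hard-coding it"
        )
    return value


# Example usage (assumes the variables were exported before starting Python):
# cohere_api_key = require_env("COHERE_API_KEY")
# hf_token = require_env("HF_TOKEN")
```

Failing at startup makes a missing key obvious, rather than surfacing later as an authentication error from an API call.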

Step 2: Data Loading and Preparation

Dataset Information

This dataset contains detailed information about multiple technology companies in the Information Technology sector. For each company, the dataset includes:

  1. Company name and stock ticker symbol
  2. Market analysis reports for recent years (typically 2023 and 2024), which include:
    • Title and author of the report
    • Date of publication
    • Detailed content covering financial performance, product innovations, market position, challenges, and future outlook
    • Stock recommendations and price targets
  3. Key financial metrics such as:
    • Current stock price
    • 52-week price range
    • Market capitalization
    • Price-to-earnings (P/E) ratio
    • Dividend yield
  4. Recent news items, typically including:
    • Date of the news
    • Headline
    • Brief summary

The market analysis reports provide in-depth information about each company’s performance, innovations, challenges, and future prospects. They offer insights into the companies’ strategies, market positions, and potential for growth.

import pandas as pd
from datasets import load_dataset

# Make sure you have a Hugging Face token (HF_TOKEN) in your development environment before running the code below
# How to get a token: https://huggingface.co/docs/hub/en/security-tokens
# Dataset: https://huggingface.co/datasets/MongoDB/fake_tech_companies_market_reports
dataset = load_dataset(
    "MongoDB/fake_tech_companies_market_reports",
    split="train",
    streaming=True,
)
# Take up to the first 100 records from the streamed dataset
dataset_df = dataset.take(100)

# Convert the dataset to a pandas dataframe
dataset_df = pd.DataFrame(dataset_df)
dataset_df.head(5)

  | recent_news | reports | company | ticker | key_metrics | sector
0 | [{'date': '2024-06-09', 'headline': 'CyberDefe… | [{'author': 'Taylor Smith, Technology Sector L… | CyberDefense Dynamics | CDDY | {'52_week_range': {'high': 387.3, 'low': 41.63… | Information Technology
1 | [{'date': '2024-07-04', 'headline': 'CloudComp… | [{'author': 'Casey Jones, Chief Market Strateg… | CloudCompute Pro | CCPR | {'52_week_range': {'high': 524.23, 'low': 171… | Information Technology
2 | [{'date': '2024-06-27', 'headline': 'VirtualRe… | [{'author': 'Sam Brown, Head of Equity Researc… | VirtualReality Systems | VRSY | {'52_week_range': {'high': 530.59, 'low': 56.4… | Information Technology
3 | [{'date': '2024-07-06', 'headline': 'BioTech I… | [{'author': 'Riley Smith, Senior Tech Analyst… | BioTech Innovations | BTCI | {'52_week_range': {'high': 366.55, 'low': 124… | Information Technology
4 | [{'date': '2024-06-26', 'headline': 'QuantumCo… | [{'author': 'Riley Garcia, Senior Tech Analyst… | QuantumComputing Inc | QCMP | {'52_week_range': {'high': 231.91, 'low': 159… | Information Technology

# Data Preparation
def combine_attributes(row):
    """Concatenate the text fields of one row into a single string for embedding."""
    combined = f"{row['company']} {row['sector']} "

    # Add reports information
    for report in row["reports"]:
        combined += f"{report['year']} {report['title']} {report['author']} {report['content']} "

    # Add recent news information
    for news in row["recent_news"]:
        combined += f"{news['headline']} {news['summary']} "

    return combined.strip()


# Add the new column 'combined_attributes'
dataset_df["combined_attributes"] = dataset_df.apply(
    combine_attributes, axis=1
)

# Display the first few rows of the updated dataframe
dataset_df[["company", "ticker", "combined_attributes"]].head()

  | company | ticker | combined_attributes
0 | CyberDefense Dynamics | CDDY | CyberDefense Dynamics Information Technology 2…
1 | CloudCompute Pro | CCPR | CloudCompute Pro Information Technology 2023 C…
2 | VirtualReality Systems | VRSY | VirtualReality Systems Information Technology …
3 | BioTech Innovations | BTCI | BioTech Innovations Information Technology 202…
4 | QuantumComputing Inc | QCMP | QuantumComputing Inc Information Technology 20…
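To make the shape of combined_attributes concrete, here is a small self-contained sketch that applies the same concatenation to a single hand-written record. The record is hypothetical (it is not taken from the dataset) and only mirrors the field names used above.

```python
def combine_attributes(row: dict) -> str:
    """Concatenate company, sector, report, and news text into one string."""
    combined = f"{row['company']} {row['sector']} "
    for report in row["reports"]:
        combined += f"{report['year']} {report['title']} {report['author']} {report['content']} "
    for news in row["recent_news"]:
        combined += f"{news['headline']} {news['summary']} "
    return combined.strip()


# Hypothetical record with the same field names as the dataset
example_row = {
    "company": "ExampleCo",
    "sector": "Information Technology",
    "reports": [
        {
            "year": 2023,
            "title": "Annual Review",
            "author": "A. Analyst",
            "content": "Revenue grew.",
        }
    ],
    "recent_news": [
        {"headline": "ExampleCo ships v2", "summary": "New release."}
    ],
}

combined_text = combine_attributes(example_row)
print(combined_text)
```

Note that fields such as ticker and key_metrics are not part of the combined text, so they do not influence the embedding.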

Step 3: Embedding Generation with Cohere

from tqdm import tqdm


def get_embedding(
    text: str, input_type: str = "search_document"
) -> list[float]:
    """Embed a single text with Cohere; returns an empty list for empty input."""
    if not text.strip():
        print("Attempted to get embedding for empty text.")
        return []

    model = "embed-english-v3.0"
    response = co.embed(
        texts=[text],
        model=model,
        # "search_document" for texts stored in the vector DB,
        # "search_query" for queries run against those documents
        input_type=input_type,
        embedding_types=["float"],
    )

    return response.embeddings.float[0]


# Apply the embedding function with a progress bar
tqdm.pandas(desc="Generating embeddings")
dataset_df["embedding"] = dataset_df[
    "combined_attributes"
].progress_apply(get_embedding)

print(f"We just computed {len(dataset_df['embedding'])} embeddings.")

We just computed 63 embeddings.

dataset_df.head()

  | recent_news | reports | company | ticker | key_metrics | sector | combined_attributes | embedding
0 | [{'date': '2024-06-09', 'headline': 'CyberDefe… | [{'author': 'Taylor Smith, Technology Sector L… | CyberDefense Dynamics | CDDY | {'52_week_range': {'high': 387.3, 'low': 41.63… | Information Technology | CyberDefense Dynamics Information Technology 2… | [0.01210022, -0.03466797, -0.017562866, -0.025…
1 | [{'date': '2024-07-04', 'headline': 'CloudComp… | [{'author': 'Casey Jones, Chief Market Strateg… | CloudCompute Pro | CCPR | {'52_week_range': {'high': 524.23, 'low': 171… | Information Technology | CloudCompute Pro Information Technology 2023 C… | [-0.058563232, -0.06323242, -0.037139893, -0.0…
2 | [{'date': '2024-06-27', 'headline': 'VirtualRe… | [{'author': 'Sam Brown, Head of Equity Researc… | VirtualReality Systems | VRSY | {'52_week_range': {'high': 530.59, 'low': 56.4… | Information Technology | VirtualReality Systems Information Technology … | [0.024154663, -0.022872925, -0.01751709, -0.05…
3 | [{'date': '2024-07-06', 'headline': 'BioTech I… | [{'author': 'Riley Smith, Senior Tech Analyst'… | BioTech Innovations | BTCI | {'52_week_range': {'high': 366.55, 'low': 124… | Information Technology | BioTech Innovations Information Technology 202… | [0.020736694, -0.041046143, -0.0029773712, -0…
4 | [{'date': '2024-06-26', 'headline': 'QuantumCo… | [{'author': 'Riley Garcia, Senior Tech Analyst… | QuantumComputing Inc | QCMP | {'52_week_range': {'high': 231.91, 'low': 159… | Information Technology | QuantumComputing Inc Information Technology 20… | [-0.009757996, -0.04815674, 0.039611816, 0.023…
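The code above sends one embedding request per row, which is simple but can be slow for larger datasets. Since the embed call takes a list of texts, a batched variant is a natural alternative. The sketch below is an assumption-laden illustration: `embed_in_batches` and `fake_embed_batch` are hypothetical helpers, and the default batch size of 96 reflects the per-call limit as I understand it from Cohere's documentation, so verify it before relying on it.

```python
from typing import Callable, List, Sequence


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    batch_size: int = 96,  # assumed per-call limit; check the current API docs
) -> List[List[float]]:
    """Embed texts in fixed-size batches, preserving input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    embeddings: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start : start + batch_size])
        embeddings.extend(embed_batch(batch))
    return embeddings


# Stand-in for a real embedding call so the sketch runs offline:
# it returns a one-dimensional "embedding" holding the text length.
def fake_embed_batch(batch: List[str]) -> List[List[float]]:
    return [[float(len(text))] for text in batch]


vectors = embed_in_batches(["a", "bb", "ccc"], fake_embed_batch, batch_size=2)
```

With the client from Step 1, `embed_batch` could wrap `co.embed(texts=batch, model="embed-english-v3.0", input_type="search_document", embedding_types=["float"]).embeddings.float`. Empty strings would need to be filtered out beforehand, as get_embedding does for single texts.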

Step 4: MongoDB Vector Database and Connection Setup

MongoDB acts as both an operational and a vector database for the RAG system. MongoDB Atlas specifically provides a database solution that efficiently stores, queries and retrieves vector embeddings.

Creating a database and collection within MongoDB is made simple with MongoDB Atlas.

  1. First, register for a MongoDB Atlas account. For existing users, sign in to MongoDB Atlas.
  2. Follow the instructions to deploy your first cluster, selecting Atlas UI as the procedure.
  3. Create the database: asset_management_use_case.
  4. Within the database asset_management_use_case, create the collection market_reports.
  5. Create a vector search index named vector_index for the market_reports collection. This index enables the RAG application to retrieve records as additional context to supplement user queries via vector search.

The JSON definition of the vector search index created on MongoDB Atlas should look like this:

{
  "fields": [
    {
      "numDimensions": 1024,
      "path": "embedding",
      "similarity": "cosine",
      "type": "vector"
    }
  ]
}
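The index definition specifies 1024 dimensions, which matches the vector size produced by embed-english-v3.0, and cosine as the similarity function. As a rough illustration of what that means, the sketch below computes cosine similarity in plain Python. The `normalized_score` helper reflects my understanding of how Atlas reports cosine scores, namely (1 + cosine) / 2 so that values fall between 0 and 1; treat that formula as an assumption and confirm it against the Atlas documentation.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def normalized_score(cosine: float) -> float:
    """Map a cosine value in [-1, 1] to [0, 1] (assumed Atlas convention)."""
    return (1 + cosine) / 2


same_direction = cosine_similarity([1.0, 0.0], [2.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])
```

Under that assumption, a reported score around 0.65 would correspond to a raw cosine similarity of roughly 0.3.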

Follow MongoDB’s steps to get the connection string from the Atlas UI. After setting up the database and obtaining the Atlas cluster connection URI, securely store the URI within your development environment.

import os

os.environ["MONGO_URI"] = ""

import pymongo


def get_mongo_client(mongo_uri):
    """Establish and validate connection to the MongoDB."""

    client = pymongo.MongoClient(
        mongo_uri, appname="devrel.showcase.rag.cohere_mongodb.python"
    )

    # Validate the connection
    ping_result = client.admin.command("ping")
    if ping_result.get("ok") == 1.0:
        # Connection successful
        print("Connection to MongoDB successful")
        return client
    else:
        print("Connection to MongoDB failed")
        return None


MONGO_URI = os.environ["MONGO_URI"]

if not MONGO_URI:
    print("MONGO_URI not set in environment variables")

mongo_client = get_mongo_client(MONGO_URI)

DB_NAME = "asset_management_use_case"
COLLECTION_NAME = "market_reports"

db = mongo_client.get_database(DB_NAME)
collection = db.get_collection(COLLECTION_NAME)

Connection to MongoDB successful

# Delete any existing records in the collection
collection.delete_many({})

DeleteResult({'n': 63, 'electionId': ObjectId('7fffffff000000000000002b'), 'opTime': {'ts': Timestamp(1721913981, 63), 't': 43}, 'ok': 1.0, '$clusterTime': {'clusterTime': Timestamp(1721913981, 63), 'signature': {'hash': b'cU;+\xe3\xbdRc\t\x80\xad\x03\x16\x11\x18\xe6s\xebF\x01', 'keyId': 7353740577831124994}}, 'operationTime': Timestamp(1721913981, 63)}, acknowledged=True)

Step 5: Data Ingestion

MongoDB’s Document model and its compatibility with Python dictionaries offer several benefits for data ingestion.

  • Document-oriented structure:
    • MongoDB stores data in JSON-like documents: BSON (Binary JSON).
    • This aligns naturally with Python dictionaries, allowing for seamless data representation using key-value pair data structures.
  • Schema flexibility:
    • MongoDB does not enforce a schema by default, meaning each document in a collection can have a different structure.
    • This flexibility matches Python’s dynamic nature, allowing you to ingest varied data structures without predefined schemas.
  • Efficient ingestion:
    • The similarity between Python dictionaries and MongoDB documents allows for direct ingestion without complex transformations.
    • This leads to faster data insertion and reduced processing overhead.

# Convert each dataframe row to a dictionary and insert all rows in one call
documents = dataset_df.to_dict("records")
collection.insert_many(documents)

print("Data ingestion into MongoDB completed")

Data ingestion into MongoDB completed
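Because get_embedding returns an empty list for empty text, a record can reach MongoDB without a usable vector. A quick check before calling insert_many can catch that. The following is a minimal sketch; `find_invalid_embeddings` is a hypothetical helper, and the default of 1024 simply mirrors numDimensions in the index definition.

```python
from typing import Dict, List


def find_invalid_embeddings(
    documents: List[Dict], expected_dim: int = 1024
) -> List[int]:
    """Return positions of documents whose 'embedding' is missing or has the wrong length."""
    invalid = []
    for position, doc in enumerate(documents):
        embedding = doc.get("embedding")
        if not isinstance(embedding, list) or len(embedding) != expected_dim:
            invalid.append(position)
    return invalid


# Tiny hypothetical documents with 3-dimensional vectors for illustration
sample_docs = [
    {"company": "A", "embedding": [0.1, 0.2, 0.3]},
    {"company": "B", "embedding": []},  # empty text produced no vector
    {"company": "C"},  # embedding field missing entirely
]
bad_positions = find_invalid_embeddings(sample_docs, expected_dim=3)
```

Records flagged this way could be dropped or re-embedded before ingestion, since a vector of the wrong length cannot match the index definition.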

Query flexibility

MongoDB’s query language is designed to work well with document structures. With PyMongo, queries are written as Python dictionaries, making it easy to query and manipulate ingested data using familiar syntax.

Aggregation Pipeline

MongoDB’s aggregation pipeline is a powerful feature that allows for complex data processing and analysis within the database. It can be thought of like pipelines in data engineering or machine learning, where processes operate sequentially: each stage takes an input, performs an operation, and provides an output for the next stage.

Stages

Stages are the building blocks of an aggregation pipeline. Each stage represents a specific data transformation or analysis operation. Common stages include:

  • $match: Filters documents (similar to WHERE in SQL)
  • $group: Groups documents by specified fields
  • $sort: Sorts the documents
  • $project: Reshapes documents (select, rename, compute fields)
  • $limit: Limits the number of documents
  • $unwind: Deconstructs array fields
  • $lookup: Performs left outer joins with other collections
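To show how these stages chain together, here is a small sketch that builds a pipeline counting the reports per company within one sector. The function name `build_report_count_pipeline` and the output field report_count are hypothetical; the field names sector, reports, and company come from the dataset used in this guide.

```python
from typing import Any, Dict, List


def build_report_count_pipeline(
    sector: str, top_n: int = 5
) -> List[Dict[str, Any]]:
    """Build an aggregation pipeline that counts reports per company in a sector."""
    return [
        # Keep only documents from the requested sector
        {"$match": {"sector": sector}},
        # Emit one document per element of the 'reports' array
        {"$unwind": "$reports"},
        # Count the unwound reports for each company
        {"$group": {"_id": "$company", "report_count": {"$sum": 1}}},
        # Companies with the most reports first
        {"$sort": {"report_count": -1}},
        # Keep only the first top_n companies
        {"$limit": top_n},
    ]


pipeline = build_report_count_pipeline("Information Technology", top_n=3)
# With a live collection: results = list(collection.aggregate(pipeline))
```

Each stage receives the documents produced by the stage before it, so the order of the list is the order of execution.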

def vector_search(user_query, collection):
    """
    Perform a vector search in the MongoDB collection based on the user query.

    Args:
    user_query (str): The user's query string.
    collection (MongoCollection): The MongoDB collection to search.

    Returns:
    list: A list of matching documents (empty if no embedding could be generated).
    """

    # Generate embedding for the user query
    query_embedding = get_embedding(
        user_query, input_type="search_query"
    )

    # get_embedding returns an empty list for empty input
    if not query_embedding:
        print("Invalid query or embedding generation failed.")
        return []

    # Define the vector search pipeline
    vector_search_stage = {
        "$vectorSearch": {
            "index": "vector_index",
            "queryVector": query_embedding,
            "path": "embedding",
            "numCandidates": 150,  # Number of candidate matches to consider
            "limit": 5,  # Return top 5 matches
        }
    }

    unset_stage = {
        "$unset": "embedding"  # Exclude the 'embedding' field from the results
    }

    project_stage = {
        "$project": {
            "_id": 0,  # Exclude the _id field
            "company": 1,  # Include the company field
            "reports": 1,  # Include the reports field
            "combined_attributes": 1,  # Include the combined_attributes field
            "score": {
                "$meta": "vectorSearchScore"  # Include the search score
            },
        }
    }

    pipeline = [vector_search_stage, unset_stage, project_stage]

    # Execute the search
    results = collection.aggregate(pipeline)
    return list(results)
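The $vectorSearch stage in the function above hard-codes its parameters. A small builder makes the relationship between limit and numCandidates explicit and leaves room for an optional pre-filter. This is a sketch under stated assumptions: `build_vector_search_stage` is a hypothetical helper, and as far as I know the filter option only works on fields that are also declared in the index definition with type "filter", which the index shown earlier does not include.

```python
from typing import Any, Dict, Optional, Sequence


def build_vector_search_stage(
    query_vector: Sequence[float],
    limit: int = 5,
    num_candidates: int = 150,
    index: str = "vector_index",
    path: str = "embedding",
    filter_expr: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a $vectorSearch stage, checking that limit does not exceed numCandidates."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    if num_candidates < limit:
        raise ValueError("num_candidates must be greater than or equal to limit")

    stage: Dict[str, Any] = {
        "index": index,
        "path": path,
        "queryVector": list(query_vector),
        "numCandidates": num_candidates,
        "limit": limit,
    }
    if filter_expr is not None:
        # Assumes the filtered field is indexed with type "filter"
        stage["filter"] = filter_expr
    return {"$vectorSearch": stage}


stage = build_vector_search_stage([0.1, 0.2, 0.3], limit=5, num_candidates=150)
```

A larger numCandidates relative to limit generally trades latency for recall, since more approximate neighbours are considered before the top results are returned.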

Step 7: Add the Cohere Reranker

Cohere Rerank functions as a second-stage search that can improve the precision of your first-stage search results.

def rerank_documents(query: str, documents, top_n: int = 3):
    # Perform reranking with Cohere ReRank Model
    try:
        response = co.rerank(
            model="rerank-english-v3.0",
            query=query,
            documents=documents,
            top_n=top_n,
            rank_fields=["company", "reports", "combined_attributes"],
        )

        # Extract the top reranked documents
        top_documents_after_rerank = []
        for result in response.results:
            # result.index is the document's position in the input list
            original_doc = documents[result.index]
            top_documents_after_rerank.append(
                {
                    "company": original_doc["company"],
                    "combined_attributes": original_doc[
                        "combined_attributes"
                    ],
                    "reports": original_doc["reports"],
                    "vector_search_score": original_doc["score"],
                    "relevance_score": result.relevance_score,
                }
            )

        return top_documents_after_rerank

    except Exception as e:
        print(f"An error occurred during reranking: {e}")
        # Return top N documents without reranking
        return documents[:top_n]


query = "What companies have negative market reports or negative sentiment that might deter from investment in the long term"

get_knowledge = vector_search(query, collection)
pd.DataFrame(get_knowledge).head()

  | reports | company | combined_attributes | score
0 | [{'author': 'Jordan Garcia, Senior Tech Analys… | GreenEnergy Corp | GreenEnergy Corp Information Technology 2023 G… | 0.659524
1 | [{'author': 'Morgan Smith, Technology Sector L… | BioTech Therapeutics | BioTech Therapeutics Information Technology 20… | 0.646300
2 | [{'author': 'Casey Davis, Technology Sector Le… | RenewableEnergy Innovations | RenewableEnergy Innovations Information Techno… | 0.645224
3 | [{'author': 'Morgan Johnson, Technology Sector… | QuantumSensor Corp | QuantumSensor Corp Information Technology 2023… | 0.644383
4 | [{'author': 'Morgan Williams, Senior Tech Anal… | BioEngineering Corp | BioEngineering Corp Information Technology 202… | 0.643690

reranked_documents = rerank_documents(query, get_knowledge)
pd.DataFrame(reranked_documents).head()

  | company | combined_attributes | reports | vector_search_score | relevance_score
0 | GreenEnergy Corp | GreenEnergy Corp Information Technology 2023 G… | [{'author': 'Jordan Garcia, Senior Tech Analys… | 0.659524 | 0.000147
1 | BioEngineering Corp | BioEngineering Corp Information Technology 202… | [{'author': 'Morgan Williams, Senior Tech Anal… | 0.643690 | 0.000065
2 | QuantumSensor Corp | QuantumSensor Corp Information Technology 2023… | [{'author': 'Morgan Johnson, Technology Sector… | 0.644383 | 0.000054
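The two result tables above show the reranker changing the order: BioEngineering Corp was fifth by vector search score but second after reranking. The rerank response refers to documents by their position in the input list, so merging the two stages is an index lookup. The offline sketch below reproduces that merge using the company names and scores from the tables; `RankedResult` is a hypothetical stand-in for the objects in the rerank response, and `attach_relevance` is an illustrative helper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RankedResult:
    """Stand-in for one rerank result: input position plus relevance score."""
    index: int
    relevance_score: float


def attach_relevance(
    documents: List[Dict], results: List[RankedResult]
) -> List[Dict]:
    """Return documents in reranked order with both scores attached."""
    merged = []
    for result in results:
        doc = dict(documents[result.index])  # copy so the input is not mutated
        doc["vector_search_score"] = doc.pop("score", None)
        doc["relevance_score"] = result.relevance_score
        merged.append(doc)
    return merged


first_stage = [
    {"company": "GreenEnergy Corp", "score": 0.659524},
    {"company": "BioTech Therapeutics", "score": 0.646300},
    {"company": "RenewableEnergy Innovations", "score": 0.645224},
    {"company": "QuantumSensor Corp", "score": 0.644383},
    {"company": "BioEngineering Corp", "score": 0.643690},
]
second_stage = [
    RankedResult(index=0, relevance_score=0.000147),
    RankedResult(index=4, relevance_score=0.000065),
    RankedResult(index=3, relevance_score=0.000054),
]
reranked = attach_relevance(first_stage, second_stage)
```

Keeping both scores side by side makes it easier to see where the reranker disagrees with the first-stage retrieval.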

Step 8: Handling User Queries

def format_documents_for_chat(documents):
    # Keep only the fields the chat model needs as grounding context
    return [
        {
            "company": doc["company"],
            # "reports": doc['reports'],
            "combined_attributes": doc["combined_attributes"],
        }
        for doc in documents
    ]


# Generating response with Cohere Command R+
response = co.chat(
    message=query,
    documents=format_documents_for_chat(reranked_documents),
    model="command-r-plus",
    temperature=0.3,
)

print("Final answer:")
print(response.text)

Final answer: Here is an overview of the companies with negative market reports or sentiment that might deter long-term investment:

GreenEnergy Corp (GRNE):

  • Challenges: Despite solid financial performance and a positive market position, GRNE faces challenges due to the volatile political environment and rising trade tensions, resulting in increased tariffs and supply chain disruptions.
  • Regulatory Scrutiny: The company is under scrutiny for its data handling practices, raising concerns about potential privacy breaches and ethical dilemmas.

BioEngineering Corp (BENC):

  • Regulatory Hurdles: BENC faces delays in obtaining approvals for certain products due to stringent healthcare regulations, impacting their time-to-market.
  • Reimbursement and Pricing Pressures: As healthcare costs rise, the company must carefully navigate pricing strategies to balance accessibility and profitability.
  • Research and Development Expenses: BENC has experienced a significant increase in R&D expenses, which may impact its ability to maintain a competitive pricing strategy.

QuantumSensor Corp (QSCP):

  • Supply Chain Disruptions: QSCP has faced supply chain issues due to global logistics problems and geopolitical tensions, impacting production and delivery.
  • Regulatory Scrutiny: The company is under scrutiny for its data collection and handling practices, with potential privacy and ethical concerns.
  • Technical Workforce Challenges: Attracting and retaining skilled talent in a competitive market has been challenging for QSCP.
for cite in response.citations:
    print(cite)

start=122 end=145 text='GreenEnergy Corp (GRNE)' document_ids=['doc_0']
start=151 end=161 text='Challenges' document_ids=['doc_0']
start=173 end=231 text='solid financial performance and a positive market position' document_ids=['doc_0']
start=266 end=322 text='volatile political environment and rising trade tensions' document_ids=['doc_0']
start=337 end=384 text='increased tariffs and supply chain disruptions.' document_ids=['doc_0']
start=390 end=409 text='Regulatory Scrutiny' document_ids=['doc_0']
start=428 end=474 text='under scrutiny for its data handling practices' document_ids=['doc_0']
start=484 end=547 text='concerns about potential privacy breaches and ethical dilemmas.' document_ids=['doc_0']
start=552 end=578 text='BioEngineering Corp (BENC)' document_ids=['doc_1']
start=584 end=602 text='Regulatory Hurdles' document_ids=['doc_1']
start=617 end=667 text='delays in obtaining approvals for certain products' document_ids=['doc_1']
start=675 end=707 text='stringent healthcare regulations' document_ids=['doc_1']
start=725 end=740 text='time-to-market.' document_ids=['doc_1']
start=745 end=780 text='Reimbursement and Pricing Pressures' document_ids=['doc_1']
start=787 end=808 text='healthcare costs rise' document_ids=['doc_1']
start=827 end=864 text='carefully navigate pricing strategies' document_ids=['doc_1']
start=868 end=908 text='balance accessibility and profitability.' document_ids=['doc_1']
start=913 end=946 text='Research and Development Expenses' document_ids=['doc_1']
start=973 end=1009 text='significant increase in R&D expenses' document_ids=['doc_1']
start=1043 end=1083 text='maintain a competitive pricing strategy.' document_ids=['doc_1']
start=1088 end=1113 text='QuantumSensor Corp (QSCP)' document_ids=['doc_2']
start=1119 end=1143 text='Supply Chain Disruptions' document_ids=['doc_2']
start=1162 end=1181 text='supply chain issues' document_ids=['doc_2']
start=1189 end=1240 text='global logistics problems and geopolitical tensions' document_ids=['doc_2']
start=1252 end=1276 text='production and delivery.' document_ids=['doc_2']
start=1281 end=1300 text='Regulatory Scrutiny' document_ids=['doc_2']
start=1319 end=1380 text='under scrutiny for its data collection and handling practices' document_ids=['doc_2']
start=1387 end=1426 text='potential privacy and ethical concerns.' document_ids=['doc_2']
start=1431 end=1461 text='Technical Workforce Challenges' document_ids=['doc_2']
start=1465 end=1528 text='Attracting and retaining skilled talent in a competitive market' document_ids=['doc_2']
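Each citation printed above carries a start and end offset into the response text plus the IDs of the supporting documents; in the output, end minus start equals the length of the cited text, so end is exclusive. One way to surface that in a user interface is to insert document markers after each cited span. The sketch below assumes that approach; `Citation` is a hypothetical stand-in mirroring the printed fields, and `annotate_with_citations` is an illustrative helper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Citation:
    """Stand-in mirroring the fields printed for each citation."""
    start: int
    end: int
    text: str
    document_ids: List[str]


def annotate_with_citations(text: str, citations: List[Citation]) -> str:
    """Insert ' [doc_id, ...]' after each cited span."""
    annotated = text
    # Work from the last span backwards so earlier offsets stay valid
    for cite in sorted(citations, key=lambda c: c.end, reverse=True):
        marker = " [" + ", ".join(cite.document_ids) + "]"
        annotated = annotated[: cite.end] + marker + annotated[cite.end :]
    return annotated


# Hypothetical response text and citations for illustration
answer = "Alpha is good. Beta is bad."
cites = [
    Citation(start=0, end=5, text="Alpha", document_ids=["doc_0"]),
    Citation(start=15, end=19, text="Beta", document_ids=["doc_1"]),
]
annotated_answer = annotate_with_citations(answer, cites)
print(annotated_answer)
```

The document IDs (doc_0, doc_1, and so on) correspond to the order of the documents passed to the chat call.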

Step 9: Using MongoDB as a Data Store for Conversation History

from typing import Dict, Optional, List

import pymongo


class CohereChat:

    def __init__(
        self,
        cohere_client,
        system: str = "",
        database: str = "cohere_chat",
        main_collection: str = "main_collection",
        history_params: Optional[Dict[str, str]] = None,
    ):
        self.co = cohere_client
        self.system = system
        self.history_params = history_params or {}

        # Use the connection string from history_params
        self.client = pymongo.MongoClient(
            self.history_params.get(
                "connection_string", "mongodb://localhost:27017/"
            )
        )

        # Use the database parameter
        self.db = self.client[database]

        # Use the main_collection parameter
        self.main_collection = self.db[main_collection]

        # Use the history_collection from history_params, or default to "chat_history"
        self.history_collection = self.db[
            self.history_params.get(
                "history_collection", "chat_history"
            )
        ]

        # Use the session_id from history_params, or default to "default_session"
        self.session_id = self.history_params.get(
            "session_id", "default_session"
        )

    def add_to_history(self, message: str, prefix: str = ""):
        # Store one turn, tagged with the session it belongs to
        self.history_collection.insert_one(
            {
                "session_id": self.session_id,
                "message": message,
                "prefix": prefix,
            }
        )

    def get_chat_history(self) -> List[Dict[str, str]]:
        # Sorting by _id returns the turns in insertion order
        history = self.history_collection.find(
            {"session_id": self.session_id}
        ).sort("_id", 1)
        return [
            {
                "role": (
                    "user" if item["prefix"] == "USER" else "chatbot"
                ),
                "message": item["message"],
            }
            for item in history
        ]

    def rerank_documents(
        self, query: str, documents: List[Dict], top_n: int = 3
    ) -> List[Dict]:
        # Skip documents with no text to rank
        rerank_docs = [
            {
                "company": doc["company"],
                "combined_attributes": doc["combined_attributes"],
            }
            for doc in documents
            if doc["combined_attributes"].strip()
        ]

        if not rerank_docs:
            print("No valid documents to rerank.")
            return []

        try:
            response = self.co.rerank(
                query=query,
                documents=rerank_docs,
                top_n=top_n,
                model="rerank-english-v3.0",
                rank_fields=["company", "combined_attributes"],
            )

            top_documents_after_rerank = [
                {
                    "company": rerank_docs[result.index]["company"],
                    "combined_attributes": rerank_docs[result.index][
                        "combined_attributes"
                    ],
                    "relevance_score": result.relevance_score,
                }
                for result in response.results
            ]

            print(
                f"\nHere are the top {top_n} documents after rerank:"
            )
            for doc in top_documents_after_rerank:
                print(
                    f"== {doc['company']} (Relevance: {doc['relevance_score']:.4f})"
                )

            return top_documents_after_rerank

        except Exception as e:
            print(f"An error occurred during reranking: {e}")
            return documents[:top_n]

    def format_documents_for_chat(
        self, documents: List[Dict]
    ) -> List[Dict]:
        return [
            {
                "company": doc["company"],
                "combined_attributes": doc["combined_attributes"],
            }
            for doc in documents
        ]

    def send_message(self, message: str, vector_search_func) -> str:
        # Read prior turns first so the new message is not sent twice
        # (once inside chat_history and once as `message`)
        chat_history = self.get_chat_history()
        self.add_to_history(message, "USER")

        # Perform vector search
        search_results = vector_search_func(
            message, self.main_collection
        )

        # Rerank the search results
        reranked_documents = self.rerank_documents(
            message, search_results
        )

        # Format documents for chat
        formatted_documents = self.format_documents_for_chat(
            reranked_documents
        )

        # Pass the system prompt as the preamble only when one was provided
        chat_kwargs = {}
        if self.system:
            chat_kwargs["preamble"] = self.system

        # Generate response using Cohere chat
        response = self.co.chat(
            chat_history=chat_history,
            message=message,
            documents=formatted_documents,
            model="command-r-plus",
            temperature=0.3,
            **chat_kwargs,
        )

        result = response.text
        self.add_to_history(result, "CHATBOT")

        print("Final answer:")
        print(result)

        print("\nCitations:")
        for cite in response.citations:
            print(cite)

        return result

    def show_history(self):
        history = self.history_collection.find(
            {"session_id": self.session_id}
        ).sort("_id", 1)
        for item in history:
            print(f"{item['prefix']}: {item['message']}")
            print("-------------------------")

# Initialize CohereChat
chat = CohereChat(
    co,
    system="You are a helpful assistant taking on the role of an Asset Manager focused on tech companies.",
    database=DB_NAME,
    main_collection=COLLECTION_NAME,
    history_params={
        "connection_string": MONGO_URI,
        "history_collection": "chat_history",
        "session_id": 2,
    },
)

# Send a message
response = chat.send_message(
    "What is the best investment to make why?", vector_search
)

Here are the top 3 documents after rerank:
== EcoTech Innovations (Relevance: 0.0001)
== GreenEnergy Systems (Relevance: 0.0001)
== QuantumComputing Inc (Relevance: 0.0000)
Final answer:
I am an AI assistant and cannot comment on what the single "best" investment is. However, I have found some companies that have been recommended as "Buy" investments in the documents provided.
## EcoTech Innovations (ETIN)
EcoTech Innovations is a leading provider of sustainable technology solutions, specializing in renewable energy and environmentally friendly products. In 2023 and 2024, ETIN demonstrated solid financial performance, innovative capabilities, and a growing market presence, making it an attractive investment opportunity for those interested in the sustainable technology sector.
## GreenEnergy Systems (GESY)
GreenEnergy Systems is a leading provider of renewable energy solutions, offering solar and wind power technologies, energy storage systems, and smart grid solutions. In 2023 and 2024, GESY reported strong financial performance, innovative product developments, and a solid market position, positioning it well for future growth in the renewable energy sector.
## QuantumComputing Inc. (QCMP)
QuantumComputing Inc. is a leading developer of quantum computing software and solutions, aiming to revolutionize computing tasks across industries. In 2023 and 2024, QCMP demonstrated strong financial performance, innovative product offerings, and a growing market presence, making it an attractive investment opportunity in the rapidly growing quantum computing industry.
Please note that these recommendations are based on specific reports and may not consider all factors. It is always advisable to conduct thorough research and consult professional advice before making any investment decisions.
Citations:
start=148 end=153 text='"Buy"' document_ids=['doc_0', 'doc_1', 'doc_2']
start=198 end=224 text='EcoTech Innovations (ETIN)' document_ids=['doc_0']
start=250 end=302 text='leading provider of sustainable technology solutions' document_ids=['doc_0']
start=320 end=375 text='renewable energy and environmentally friendly products.' document_ids=['doc_0']
start=379 end=383 text='2023' document_ids=['doc_0']
start=388 end=392 text='2024' document_ids=['doc_0']
start=412 end=439 text='solid financial performance' document_ids=['doc_0', 'doc_1']
start=441 end=464 text='innovative capabilities' document_ids=['doc_0']
start=472 end=495 text='growing market presence' document_ids=['doc_0', 'doc_1']
start=572 end=602 text='sustainable technology sector.' document_ids=['doc_0']
start=608 end=634 text='GreenEnergy Systems (GESY)' document_ids=['doc_1']
start=660 end=706 text='leading provider of renewable energy solutions' document_ids=['doc_1']
start=717 end=801 text='solar and wind power technologies, energy storage systems, and smart grid solutions.' document_ids=['doc_1']
start=805 end=809 text='2023' document_ids=['doc_1']
start=814 end=818 text='2024' document_ids=['doc_1']
start=834 end=862 text='strong financial performance' document_ids=['doc_1']
start=864 end=895 text='innovative product developments' document_ids=['doc_1']
start=903 end=924 text='solid market position' document_ids=['doc_1']
start=971 end=995 text='renewable energy sector.' document_ids=['doc_1']
start=1001 end=1029 text='QuantumComputing Inc. (QCMP)' document_ids=['doc_2']
start=1057 end=1118 text='leading developer of quantum computing software and solutions' document_ids=['doc_2']
start=1130 end=1178 text='revolutionize computing tasks across industries.' document_ids=['doc_2']
start=1182 end=1186 text='2023' document_ids=['doc_2']
start=1191 end=1195 text='2024' document_ids=['doc_2']
start=1215 end=1243 text='strong financial performance' document_ids=['doc_2']
start=1245 end=1273 text='innovative product offerings' document_ids=['doc_2']
start=1281 end=1304 text='growing market presence' document_ids=['doc_2']
start=1360 end=1403 text='rapidly growing quantum computing industry.' document_ids=['doc_2']

# Show chat history
chat.show_history()

USER: What is the best investment to make why?
-------------------------
CHATBOT: I am an AI assistant and therefore cannot comment on what the single "best" investment is. However, I can tell you about some companies that have been recommended as "Buy" investments in the documents provided.
## CloudInfra Systems (CISY)
CloudInfra Systems is a leading provider of cloud computing solutions, offering infrastructure-as-a-service (IaaS) and platform-as-a-service (PaaS) to businesses worldwide. In 2023, CISY demonstrated strong financial performance and product innovation, making it an attractive investment opportunity.
## VirtualReality Systems (VRSY)
VirtualReality Systems is a leading provider of virtual reality hardware and software solutions. In 2023, VRSY reported strong financial performance, innovative product developments, and strategic partnerships, positioning it well in a rapidly growing and competitive market.
## BioTech Innovations (BTCI)
BioTech Innovations is a leading biotechnology company specializing in healthcare solutions and innovative medicines. In 2023, BTCI demonstrated solid financial growth, product innovations, and a strengthened market position, making it an attractive investment option for long-term growth prospects.
Please note that these recommendations are based on specific reports and may not consider all factors. It is always advisable to conduct thorough research and consult professional advice before making any investment decisions.
-------------------------
USER: What is the best investment to make why?
-------------------------
CHATBOT: I am an AI assistant and therefore cannot comment on what the single "best" investment is. However, I can provide you with some companies that have been recommended as "Buy" investments in the documents provided.
## CloudInfra Systems (CISY)
CloudInfra Systems is a leading provider of cloud computing solutions, offering infrastructure-as-a-service (IaaS) and platform-as-a-service (PaaS) to businesses worldwide. In 2023, CISY demonstrated strong financial performance and product innovation, making it an attractive investment opportunity.
## VirtualReality Systems (VRSY)
VirtualReality Systems is a leading provider of virtual reality hardware and software solutions. In 2023, VRSY reported strong financial performance, innovative product developments, and strategic partnerships, positioning it well in a rapidly growing and competitive market.
## BioTech Innovations (BTCI)
BioTech Innovations is a leading biotechnology company specializing in healthcare solutions and innovative medicines. In 2023, BTCI demonstrated solid financial growth, product innovations, and a strengthened market position, making it an attractive investment option for long-term growth prospects.
Please note that these recommendations are based on specific reports and may not consider all factors. It is always advisable to conduct thorough research and consult professional advice before making any investment decisions.
-------------------------
USER: What is the best investment to make why?
-------------------------
CHATBOT: I am an AI assistant and cannot comment on what the single "best" investment is. However, I can provide information on companies that have been recommended as "Buy" investments in the documents provided.
## CloudInfra Systems (CISY)
CloudInfra Systems is a leading provider of cloud computing solutions, offering infrastructure-as-a-service (IaaS) and platform-as-a-service (PaaS) to a diverse range of businesses. In 2023, CISY demonstrated strong financial performance and product innovation, positioning it well in the competitive cloud market.
## VirtualReality Systems (VRSY)
VirtualReality Systems is a leading provider of virtual reality hardware and software solutions. In 2023, VRSY reported robust financial results, innovative product developments, and strategic partnerships, making it a solid investment choice for those with a long-term investment horizon.
## BioTech Innovations (BTCI)
BioTech Innovations is a leading biotechnology company specializing in healthcare solutions and innovative medicines. In 2023 and 2024, BTCI demonstrated solid financial growth, product innovations, and an improved market position, making it an attractive investment opportunity for long-term growth.
Please note that these recommendations are based on specific reports and may not consider all factors. It is always advisable to conduct thorough research and consult professional advice before making any investment decisions.
-------------------------
USER: What is the best investment to make why?
-------------------------
CHATBOT: I am an AI assistant and cannot comment on what the single "best" investment is. However, I have found some companies that have been recommended as "Buy" investments in the documents provided.
## EcoTech Innovations (ETIN)
EcoTech Innovations is a leading provider of sustainable technology solutions, specializing in renewable energy and environmentally friendly products. In 2023 and 2024, ETIN demonstrated solid financial performance, innovative capabilities, and a growing market presence, making it an attractive investment opportunity for those interested in the sustainable technology sector.
## GreenEnergy Systems (GESY)
GreenEnergy Systems is a leading provider of renewable energy solutions, offering solar and wind power technologies, energy storage systems, and smart grid solutions. In 2023 and 2024, GESY reported strong financial performance, innovative product developments, and a solid market position, positioning it well for future growth in the renewable energy sector.
## QuantumComputing Inc. (QCMP)
QuantumComputing Inc. is a leading developer of quantum computing software and solutions, aiming to revolutionize computing tasks across industries. In 2023 and 2024, QCMP demonstrated strong financial performance, innovative product offerings, and a growing market presence, making it an attractive investment opportunity in the rapidly growing quantum computing industry.
Please note that these recommendations are based on specific reports and may not consider all factors. It is always advisable to conduct thorough research and consult professional advice before making any investment decisions.
-------------------------