How to leverage semantic search on customer or operational data in MongoDB Atlas.
Pass retrieved data to Cohere’s Command R+ generative model for retrieval-augmented generation (RAG).
Develop and deploy a RAG-optimized user interface for your app.
Create a conversation data store for your RAG chatbot using MongoDB.
Use Case: Develop an advanced chatbot assistant that provides asset managers with information and actionable insights on technology company market reports.
Introduction
What is Cohere?
What is MongoDB?
How do Cohere and MongoDB work together?
What exactly are we showing today?
Step 1: Install Libraries and Set Environment Variables
Critical Security Reminder: Safeguard your production environment by never committing sensitive information, such as environment variable values, to public repositories. This practice is essential for maintaining the security and integrity of your systems.
Libraries:
cohere: A Python library for accessing Cohere’s large language models, enabling natural language processing tasks like text generation, classification, and embedding.
pymongo: The recommended Python driver for MongoDB, allowing Python applications to interact with MongoDB databases for data storage and retrieval.
datasets: A library by Hugging Face that provides easy access to a wide range of datasets for machine learning and natural language processing tasks.
tqdm: A fast, extensible progress bar library for Python, useful for displaying progress in long-running operations or loops.
pip install --quiet datasets tqdm cohere pymongo
import os
import cohere

os.environ["COHERE_API_KEY"] = ""
co = cohere.Client(os.environ.get("COHERE_API_KEY"))

os.environ["HF_TOKEN"] = ""
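Assigning keys inline makes them easy to leak when a notebook is shared or committed. One alternative, sketched below, reads each value from the environment and fails early if it is missing. The helper name `require_env` is hypothetical and not part of the original tutorial.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Hypothetical usage with the client from this step (not executed here):
# co = cohere.Client(require_env("COHERE_API_KEY"))
```

With this approach the keys are set once in the shell or a secrets manager, and the notebook itself never contains them.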
Step 2: Data Loading and Preparation
Dataset Information
This dataset contains detailed information about multiple technology companies in the Information Technology sector. For each company, the dataset includes:
Company name and stock ticker symbol
Market analysis reports for recent years (typically 2023 and 2024), which include:
Title and author of the report
Date of publication
Detailed content covering financial performance, product innovations, market position, challenges, and future outlook
Stock recommendations and price targets
Key financial metrics such as:
Current stock price
52-week price range
Market capitalization
Price-to-earnings (P/E) ratio
Dividend yield
Recent news items, typically including:
Date of the news
Headline
Brief summary
The market analysis reports provide in-depth information about each company’s performance, innovations, challenges, and future prospects. They offer insights into the companies’ strategies, market positions, and potential for growth.
import pandas as pd
from datasets import load_dataset

# Make sure you have a Hugging Face token (HF_TOKEN) in your development environment before running the code below
# How to get a token: https://huggingface.co/docs/hub/en/security-tokens
        print("Attempted to get embedding for empty text.")
        return []

    model = "embed-v4.0"
    response = co.embed(
        texts=[text],
        model=model,
        input_type=input_type,  # e.g. "search_document" for stored text, "search_query" for queries run against a vector DB
        embedding_types=["float"],
    )

    return response.embeddings.float[0]


# Apply the embedding function with a progress bar
tqdm.pandas(desc="Generating embeddings")
dataset_df["embedding"] = dataset_df[
    "combined_attributes"
].progress_apply(get_embedding)

print(f"We just computed {len(dataset_df['embedding'])} embeddings.")
We just computed 63 embeddings.
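The code above makes one API request per row. As a rough sketch of an alternative, texts can be grouped into batches and embedded several at a time. The helper `batch_texts` and the batch size of 96 are assumptions (the embed endpoint caps how many texts one request may carry, so check the current limit), and the commented call reuses the `co.embed` arguments shown above with `input_type="search_document"` assumed for stored documents.

```python
from typing import Iterator, List


def batch_texts(texts: List[str], batch_size: int = 96) -> Iterator[List[str]]:
    """Yield consecutive slices of `texts`, each with at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


# Hypothetical usage with the client from Step 1 (not executed here):
# embeddings = []
# for batch in batch_texts(dataset_df["combined_attributes"].tolist()):
#     response = co.embed(texts=batch, model="embed-v4.0",
#                         input_type="search_document", embedding_types=["float"])
#     embeddings.extend(response.embeddings.float)
```

Empty strings would need to be filtered out beforehand, since the per-row function returns an empty list for them instead of calling the API.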
dataset_df.head()
| | recent_news | reports | company | ticker | key_metrics | sector | combined_attributes | embedding |
|---|---|---|---|---|---|---|---|---|
| 0 | [{'date': '2024-06-09', 'headline': 'CyberDefe… | [{'author': 'Taylor Smith, Technology Sector L… | CyberDefense Dynamics | CDDY | {'52_week_range': {'high': 387.3, 'low': 41.63… | Information Technology | CyberDefense Dynamics Information Technology 2… | [0.01210022, -0.03466797, -0.017562866, -0.025… |
| 1 | [{'date': '2024-07-04', 'headline': 'CloudComp… | [{'author': 'Casey Jones, Chief Market Strateg… | CloudCompute Pro | CCPR | {'52_week_range': {'high': 524.23, 'low': 171… | Information Technology | CloudCompute Pro Information Technology 2023 C… | [-0.058563232, -0.06323242, -0.037139893, -0.0… |
| 2 | [{'date': '2024-06-27', 'headline': 'VirtualRe… | [{'author': 'Sam Brown, Head of Equity Researc… | VirtualReality Systems | VRSY | {'52_week_range': {'high': 530.59, 'low': 56.4… | Information Technology | VirtualReality Systems Information Technology … | [0.024154663, -0.022872925, -0.01751709, -0.05… |
| 3 | [{'date': '2024-07-06', 'headline': 'BioTech I… | [{'author': 'Riley Smith, Senior Tech Analyst'… | BioTech Innovations | BTCI | {'52_week_range': {'high': 366.55, 'low': 124… | Information Technology | BioTech Innovations Information Technology 202… | [0.020736694, -0.041046143, -0.0029773712, -0… |
| 4 | [{'date': '2024-06-26', 'headline': 'QuantumCo… | [{'author': 'Riley Garcia, Senior Tech Analyst… | QuantumComputing Inc | QCMP | {'52_week_range': {'high': 231.91, 'low': 159… | Information Technology | QuantumComputing Inc Information Technology 20… | [-0.009757996, -0.04815674, 0.039611816, 0.023… |
Step 4: MongoDB Vector Database and Connection Setup
MongoDB acts as both an operational and a vector database for the RAG system.
MongoDB Atlas specifically provides a database solution that efficiently stores, queries and retrieves vector embeddings.
Creating a database and collection within MongoDB is made simple with MongoDB Atlas.
First, register for a MongoDB Atlas account. For existing users, sign into MongoDB Atlas.
Within the database asset_management_use_case, create the collection market_reports.
Create a vector search index named vector_index for the market_reports collection. This index enables the RAG application to retrieve records as additional context to supplement user queries via vector search. Below is the JSON definition of the collection's vector search index.
Your vector search index created on MongoDB Atlas should look like the following:
{
  "fields": [
    {
      "numDimensions": 1024,
      "path": "embedding",
      "similarity": "cosine",
      "type": "vector"
    }
  ]
}
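The `numDimensions` value should equal the length of the vectors stored under `embedding`. A small check like the following sketch can catch a mismatch before ingestion. The function name `check_embedding_dimensions` is hypothetical; the value 1024 is taken from the index definition above.

```python
from typing import List, Sequence

INDEX_DIMENSIONS = 1024  # numDimensions from the index definition above


def check_embedding_dimensions(
    embeddings: Sequence[Sequence[float]], expected: int = INDEX_DIMENSIONS
) -> List[int]:
    """Return the positions of embeddings whose length differs from `expected`."""
    return [i for i, vector in enumerate(embeddings) if len(vector) != expected]


# Hypothetical usage with the DataFrame from Step 3 (not executed here):
# mismatched = check_embedding_dimensions(dataset_df["embedding"].tolist())
```

An empty result means every vector matches the index; any returned positions point at rows to re-embed or to an index definition that needs a different `numDimensions`.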
Follow MongoDB’s steps to get the connection string from the Atlas UI. After setting up the database and obtaining the Atlas cluster connection URI, securely store the URI within your development environment.
import os

os.environ["MONGO_URI"] = ""
import pymongo


def get_mongo_client(mongo_uri):
    """Establish and validate connection to the MongoDB."""
MongoDB’s Document model and its compatibility with Python dictionaries offer several benefits for data ingestion.
Document-oriented structure:
MongoDB stores data in JSON-like documents encoded as BSON (Binary JSON).
This aligns naturally with Python dictionaries, allowing for seamless data representation using key-value pair data structures.
Schema flexibility:
MongoDB does not enforce a fixed schema by default, meaning each document in a collection can have a different structure.
This flexibility matches Python’s dynamic nature, allowing you to ingest varied data structures without predefined schemas.
Efficient ingestion:
The similarity between Python dictionaries and MongoDB documents allows for direct ingestion without complex transformations.
This leads to faster data insertion and reduced processing overhead.
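To make the mapping concrete, here is a minimal sketch of two records with different shapes, both plain Python dictionaries. The field names follow the dataset columns shown earlier; the values are placeholders, not real data, and the JSON round-trip only stands in for the BSON encoding that the driver performs.

```python
import json

# Placeholder records: field names mirror the dataset, values are illustrative only.
full_record = {
    "company": "Example Co",
    "ticker": "EXMP",
    "sector": "Information Technology",
    "key_metrics": {"52_week_range": {"high": 100.0, "low": 50.0}},
    "recent_news": [{"date": "2024-01-01", "headline": "Placeholder headline"}],
}
# A second record may omit fields or add new ones; no schema change is needed.
sparse_record = {"company": "Another Co", "ticker": "ANTH", "notes": "no metrics yet"}

documents = [full_record, sparse_record]

# Nested dictionaries and lists survive serialization unchanged.
round_tripped = json.loads(json.dumps(documents))
nested_high = round_tripped[0]["key_metrics"]["52_week_range"]["high"]
```

Both records could go into the same collection in a single insert, which is what the ingestion code relies on when it converts the DataFrame to a list of dictionaries.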
1
documents = dataset_df.to_dict("records")
2
collection.insert_many(documents)
3
4
print("Data ingestion into MongoDB completed")
Data ingestion into MongoDB completed
Step 6: MongoDB Query Language and Vector Search
Query flexibility
MongoDB’s query language is designed to work well with document structures, making it easy to query and manipulate ingested data using familiar Python-like syntax.
Aggregation Pipeline
MongoDB’s aggregation pipeline is a powerful feature that allows for complex data processing and analysis within the database.
An aggregation pipeline can be thought of as similar to pipelines in data engineering or machine learning, where processes operate sequentially: each stage takes an input, performs an operation, and provides an output for the next stage.
Stages
Stages are the building blocks of an aggregation pipeline.
Each stage represents a specific data transformation or analysis operation.
Common stages include:
$match: Filters documents (similar to WHERE in SQL)
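As an illustration of stages chained in sequence, the sketch below builds a two-stage pipeline: a `$vectorSearch` stage followed by a `$project` stage. The index name `vector_index` and the `embedding` path come from Step 4; the function name, the `numCandidates` default, and the projected fields are assumptions chosen for the example.

```python
from typing import Any, Dict, List


def build_vector_search_pipeline(
    query_embedding: List[float], limit: int = 5, num_candidates: int = 150
) -> List[Dict[str, Any]]:
    """Build an aggregation pipeline that runs a vector search, then trims the output."""
    if num_candidates < limit:
        raise ValueError("num_candidates should be at least as large as limit")
    return [
        {
            # Stage 1: approximate nearest-neighbour search over the indexed field.
            "$vectorSearch": {
                "index": "vector_index",
                "path": "embedding",
                "queryVector": query_embedding,
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        {
            # Stage 2: keep a few fields and expose the similarity score.
            "$project": {
                "_id": 0,
                "company": 1,
                "ticker": 1,
                "combined_attributes": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


# Hypothetical usage with the collection from Step 4 (not executed here):
# results = list(collection.aggregate(build_vector_search_pipeline(query_embedding)))
```

Because the pipeline is an ordinary list of dictionaries, further stages can be appended to it without changing the search stage.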
Final answer:
Here is an overview of the companies with negative market reports or sentiment that might deter long-term investment:
GreenEnergy Corp (GRNE):
Challenges: Despite solid financial performance and a positive market position, GRNE faces challenges due to the volatile political environment and rising trade tensions, resulting in increased tariffs and supply chain disruptions.
Regulatory Scrutiny: The company is under scrutiny for its data handling practices, raising concerns about potential privacy breaches and ethical dilemmas.
BioEngineering Corp (BENC):
Regulatory Hurdles: BENC faces delays in obtaining approvals for certain products due to stringent healthcare regulations, impacting their time-to-market.
Reimbursement and Pricing Pressures: As healthcare costs rise, the company must carefully navigate pricing strategies to balance accessibility and profitability.
Research and Development Expenses: BENC has experienced a significant increase in R&D expenses, which may impact its ability to maintain a competitive pricing strategy.
QuantumSensor Corp (QSCP):
Supply Chain Disruptions: QSCP has faced supply chain issues due to global logistics problems and geopolitical tensions, impacting production and delivery.
Regulatory Scrutiny: The company is under scrutiny for its data collection and handling practices, with potential privacy and ethical concerns.
Technical Workforce Challenges: Attracting and retaining skilled talent in a competitive market has been challenging for QSCP.
    system="You are a helpful assistant taking on the role of an Asset Manager focused on tech companies.",
    database=DB_NAME,
    main_collection=COLLECTION_NAME,
    history_params={
        "connection_string": MONGO_URI,
        "history_collection": "chat_history",
        "session_id": 2,
    },
)

# Send a message
response = chat.send_message(
    "What is the best investment to make why?", vector_search
)
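The `history_params` above point at a `chat_history` collection keyed by `session_id`. How the chat class stores messages is not shown in this section, so the following is only a sketch of one possible document shape. The function names and every field other than `session_id` are hypothetical, and an in-memory list stands in for the collection so the example runs without a database.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List


def make_history_entry(session_id: int, role: str, message: str) -> Dict[str, Any]:
    """Build one chat-history document for the given session."""
    if role not in {"USER", "CHATBOT"}:
        raise ValueError("role must be 'USER' or 'CHATBOT'")
    return {
        "session_id": session_id,
        "role": role,
        "message": message,
        "timestamp": datetime.now(timezone.utc),
    }


def session_history(entries: List[Dict[str, Any]], session_id: int) -> List[Dict[str, Any]]:
    """Return the entries of one session in insertion order."""
    return [entry for entry in entries if entry["session_id"] == session_id]


# In-memory stand-in for the chat_history collection.
store: List[Dict[str, Any]] = []
store.append(make_history_entry(2, "USER", "What is the best investment to make why?"))
store.append(make_history_entry(2, "CHATBOT", "(model reply)"))
store.append(make_history_entry(3, "USER", "A message from another session"))
history = session_history(store, 2)
```

With a real collection, the same dictionaries could be written with `insert_one` and read back with a `find` filter on `session_id`, which keeps separate conversations from mixing.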
Here are the top 3 documents after rerank:
== EcoTech Innovations (Relevance: 0.0001)
== GreenEnergy Systems (Relevance: 0.0001)
== QuantumComputing Inc (Relevance: 0.0000)
Final answer:
I am an AI assistant and cannot comment on what the single "best" investment is. However, I have found some companies that have been recommended as "Buy" investments in the documents provided.
## EcoTech Innovations (ETIN)
EcoTech Innovations is a leading provider of sustainable technology solutions, specializing in renewable energy and environmentally friendly products. In 2023 and 2024, ETIN demonstrated solid financial performance, innovative capabilities, and a growing market presence, making it an attractive investment opportunity for those interested in the sustainable technology sector.
## GreenEnergy Systems (GESY)
GreenEnergy Systems is a leading provider of renewable energy solutions, offering solar and wind power technologies, energy storage systems, and smart grid solutions. In 2023 and 2024, GESY reported strong financial performance, innovative product developments, and a solid market position, positioning it well for future growth in the renewable energy sector.
## QuantumComputing Inc. (QCMP)
QuantumComputing Inc. is a leading developer of quantum computing software and solutions, aiming to revolutionize computing tasks across industries. In 2023 and 2024, QCMP demonstrated strong financial performance, innovative product offerings, and a growing market presence, making it an attractive investment opportunity in the rapidly growing quantum computing industry.
Please note that these recommendations are based on specific reports and may not consider all factors. It is always advisable to conduct thorough research and consult professional advice before making any investment decisions.
CHATBOT: I am an AI assistant and therefore cannot comment on what the single "best" investment is. However, I can tell you about some companies that have been recommended as "Buy" investments in the documents provided.
## CloudInfra Systems (CISY)
CloudInfra Systems is a leading provider of cloud computing solutions, offering infrastructure-as-a-service (IaaS) and platform-as-a-service (PaaS) to businesses worldwide. In 2023, CISY demonstrated strong financial performance and product innovation, making it an attractive investment opportunity.
## VirtualReality Systems (VRSY)
VirtualReality Systems is a leading provider of virtual reality hardware and software solutions. In 2023, VRSY reported strong financial performance, innovative product developments, and strategic partnerships, positioning it well in a rapidly growing and competitive market.
## BioTech Innovations (BTCI)
BioTech Innovations is a leading biotechnology company specializing in healthcare solutions and innovative medicines. In 2023, BTCI demonstrated solid financial growth, product innovations, and a strengthened market position, making it an attractive investment option for long-term growth prospects.
Please note that these recommendations are based on specific reports and may not consider all factors. It is always advisable to conduct thorough research and consult professional advice before making any investment decisions.
-------------------------
USER: What is the best investment to make why?
-------------------------
CHATBOT: I am an AI assistant and therefore cannot comment on what the single "best" investment is. However, I can provide you with some companies that have been recommended as "Buy" investments in the documents provided.
## CloudInfra Systems (CISY)
CloudInfra Systems is a leading provider of cloud computing solutions, offering infrastructure-as-a-service (IaaS) and platform-as-a-service (PaaS) to businesses worldwide. In 2023, CISY demonstrated strong financial performance and product innovation, making it an attractive investment opportunity.
## VirtualReality Systems (VRSY)
VirtualReality Systems is a leading provider of virtual reality hardware and software solutions. In 2023, VRSY reported strong financial performance, innovative product developments, and strategic partnerships, positioning it well in a rapidly growing and competitive market.
## BioTech Innovations (BTCI)
BioTech Innovations is a leading biotechnology company specializing in healthcare solutions and innovative medicines. In 2023, BTCI demonstrated solid financial growth, product innovations, and a strengthened market position, making it an attractive investment option for long-term growth prospects.
Please note that these recommendations are based on specific reports and may not consider all factors. It is always advisable to conduct thorough research and consult professional advice before making any investment decisions.
-------------------------
USER: What is the best investment to make why?
-------------------------
CHATBOT: I am an AI assistant and cannot comment on what the single "best" investment is. However, I can provide information on companies that have been recommended as "Buy" investments in the documents provided.
## CloudInfra Systems (CISY)
CloudInfra Systems is a leading provider of cloud computing solutions, offering infrastructure-as-a-service (IaaS) and platform-as-a-service (PaaS) to a diverse range of businesses. In 2023, CISY demonstrated strong financial performance and product innovation, positioning it well in the competitive cloud market.
## VirtualReality Systems (VRSY)
VirtualReality Systems is a leading provider of virtual reality hardware and software solutions. In 2023, VRSY reported robust financial results, innovative product developments, and strategic partnerships, making it a solid investment choice for those with a long-term investment horizon.
## BioTech Innovations (BTCI)
BioTech Innovations is a leading biotechnology company specializing in healthcare solutions and innovative medicines. In 2023 and 2024, BTCI demonstrated solid financial growth, product innovations, and an improved market position, making it an attractive investment opportunity for long-term growth.
Please note that these recommendations are based on specific reports and may not consider all factors. It is always advisable to conduct thorough research and consult professional advice before making any investment decisions.
-------------------------
USER: What is the best investment to make why?
-------------------------
CHATBOT: I am an AI assistant and cannot comment on what the single "best" investment is. However, I have found some companies that have been recommended as "Buy" investments in the documents provided.
## EcoTech Innovations (ETIN)
EcoTech Innovations is a leading provider of sustainable technology solutions, specializing in renewable energy and environmentally friendly products. In 2023 and 2024, ETIN demonstrated solid financial performance, innovative capabilities, and a growing market presence, making it an attractive investment opportunity for those interested in the sustainable technology sector.
## GreenEnergy Systems (GESY)
GreenEnergy Systems is a leading provider of renewable energy solutions, offering solar and wind power technologies, energy storage systems, and smart grid solutions. In 2023 and 2024, GESY reported strong financial performance, innovative product developments, and a solid market position, positioning it well for future growth in the renewable energy sector.
## QuantumComputing Inc. (QCMP)
QuantumComputing Inc. is a leading developer of quantum computing software and solutions, aiming to revolutionize computing tasks across industries. In 2023 and 2024, QCMP demonstrated strong financial performance, innovative product offerings, and a growing market presence, making it an attractive investment opportunity in the rapidly growing quantum computing industry.
Please note that these recommendations are based on specific reports and may not consider all factors. It is always advisable to conduct thorough research and consult professional advice before making any investment decisions.