Weaviate and Cohere (Integration Guide)

Weaviate is an open-source vector search engine that stores both objects and vectors, so you can combine vector search with structured filtering. Here, we’ll create a Weaviate cluster, index your data with Cohere Embed, and process it with Rerank and Command.

Here are the steps involved:

  • Create the Weaviate cluster (see this post for more detail).
  • Once the cluster is created, you will receive the cluster URL and API key.
  • Use the provided URL and API key to connect to your Weaviate cluster.
  • Use the Weaviate Python client to create a collection to store your data.

Getting Set Up

First, let’s handle the credentials, the pip install, and the imports.

PYTHON
from google.colab import userdata

weaviate_url = userdata.get("WEAVIATE_ENDPOINT")
weaviate_key = userdata.get("WEAVIATE_API_KEY")
cohere_key = userdata.get("COHERE_API_KEY")
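
The snippet above only works inside Google Colab. As a minimal sketch for running the same steps elsewhere, the helper below falls back to environment variables when `google.colab` is not importable. The name `get_secret` is hypothetical and not part of any library; it assumes the same three setting names are exported in your shell.

```python
import os


def get_secret(name: str) -> str:
    """Return a setting from Colab's userdata if available, else from the environment."""
    try:
        # This import only succeeds inside Google Colab
        from google.colab import userdata
    except ImportError:
        userdata = None

    if userdata is not None:
        return userdata.get(name)

    # Outside Colab: read the value from an environment variable of the same name
    value = os.environ.get(name)
    if not value:
        raise KeyError(f"Missing required setting: {name}")
    return value
```

With this in place, `weaviate_url = get_secret("WEAVIATE_ENDPOINT")` (and likewise for the two API keys) could stand in for the `userdata.get` calls.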
PYTHON
!pip install -U weaviate-client -q
PYTHON
# Import the weaviate modules to interact with the Weaviate vector database
import weaviate
from weaviate.classes.init import Auth

# Define headers for the API requests, including the Cohere API key
headers = {
    "X-Cohere-Api-Key": cohere_key,
}

# Connect to the Weaviate cloud instance
client = weaviate.connect_to_weaviate_cloud(
    cluster_url=weaviate_url,  # `weaviate_url`: your Weaviate URL
    auth_credentials=Auth.api_key(
        weaviate_key
    ),  # `weaviate_key`: your Weaviate API key
    headers=headers,
)

Embed

Now, we’ll create a new collection named "Healthcare_Compliance" in the Weaviate database.

PYTHON
from weaviate.classes.config import Configure

# This is where the "Healthcare_Compliance" collection is created in Weaviate.
client.collections.create(
    "Healthcare_Compliance",
    vectorizer_config=[
        # Configure a named vectorizer using Cohere's model
        Configure.NamedVectors.text2vec_cohere(
            name="title_vector",  # Name of the vectorizer
            source_properties=[
                "title"
            ],  # Property to vectorize (in this case, the "title" field)
            model="embed-english-v3.0",  # Cohere model to use for vectorization
        )
    ],
)

You’ll see something like this:

PYTHON
<weaviate.collections.collection.sync.Collection at 0x7f48a5604590>
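
If you rerun the notebook, creating a collection that already exists will typically fail. A minimal sketch of an idempotent wrapper follows, assuming a connected Weaviate v4 Python client whose `client.collections` object provides `exists`, `get`, and `create`. The function name `ensure_collection` is hypothetical.

```python
def ensure_collection(client, name, **create_kwargs):
    """Return the named collection, creating it only if it does not exist yet.

    `client` is assumed to be a connected Weaviate v4 client; `create_kwargs`
    are passed through unchanged to `client.collections.create`.
    """
    if client.collections.exists(name):
        # Reuse the existing collection instead of failing on a second create
        return client.collections.get(name)
    return client.collections.create(name, **create_kwargs)
```

For the collection above, this could be called as `ensure_collection(client, "Healthcare_Compliance", vectorizer_config=[...])` with the same vectorizer configuration. Note that an existing collection is returned as-is, so a changed configuration would not be applied to it.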

Next, we’ll define the list of healthcare compliance documents, retrieve the "Healthcare_Compliance" collection from the Weaviate client, and use a dynamic batch process to add multiple documents to the collection efficiently.

PYTHON
# Define the list of healthcare compliance documents

hl_compliance_docs = [
    {
        "title": "HIPAA Compliance Guide",
        "description": "Comprehensive overview of HIPAA regulations, including patient privacy rules, data security standards, and breach notification requirements.",
    },
    {
        "title": "FDA Drug Approval Process",
        "description": "Detailed explanation of the FDA's drug approval process, covering clinical trials, safety reviews, and post-market surveillance.",
    },
    {
        "title": "Telemedicine Regulations",
        "description": "Analysis of state and federal regulations governing telemedicine practices, including licensing, reimbursement, and patient consent.",
    },
    {
        "title": "Healthcare Data Security",
        "description": "Best practices for securing healthcare data, including encryption, access controls, and incident response planning.",
    },
    {
        "title": "Medicare and Medicaid Billing",
        "description": "Guide to billing and reimbursement processes for Medicare and Medicaid, including coding, claims submission, and audit compliance.",
    },
    {
        "title": "Patient Rights and Consent",
        "description": "Overview of patient rights under federal and state laws, including informed consent, access to medical records, and end-of-life decisions.",
    },
    {
        "title": "Healthcare Fraud and Abuse",
        "description": "Explanation of laws and regulations related to healthcare fraud, including the False Claims Act, Anti-Kickback Statute, and Stark Law.",
    },
    {
        "title": "Occupational Safety in Healthcare",
        "description": "Guidelines for ensuring workplace safety in healthcare settings, including infection control, hazard communication, and emergency preparedness.",
    },
    {
        "title": "Health Insurance Portability",
        "description": "Discussion of COBRA and other laws ensuring continuity of health insurance coverage during job transitions or life events.",
    },
    {
        "title": "Medical Device Regulations",
        "description": "Overview of FDA regulations for medical devices, including classification, premarket approval, and post-market surveillance.",
    },
    {
        "title": "Electronic Health Records (EHR) Standards",
        "description": "Explanation of standards and regulations for EHR systems, including interoperability, data exchange, and patient privacy.",
    },
    {
        "title": "Pharmacy Regulations",
        "description": "Overview of state and federal regulations governing pharmacy practices, including prescription drug monitoring, compounding, and controlled substances.",
    },
    {
        "title": "Mental Health Parity Act",
        "description": "Analysis of the Mental Health Parity and Addiction Equity Act, ensuring equal coverage for mental health and substance use disorder treatment.",
    },
    {
        "title": "Healthcare Quality Reporting",
        "description": "Guide to quality reporting requirements for healthcare providers, including measures, submission processes, and performance benchmarks.",
    },
    {
        "title": "Advance Directives and End-of-Life Care",
        "description": "Overview of laws and regulations governing advance directives, living wills, and end-of-life care decisions.",
    },
]

# Retrieve the "Healthcare_Compliance" collection from the Weaviate client
collection = client.collections.get("Healthcare_Compliance")

# Use a dynamic batch process to add multiple documents to the collection efficiently
with collection.batch.dynamic() as batch:
    for src_obj in hl_compliance_docs:
        # Add each document to the batch, specifying the "title" and "description" properties
        batch.add_object(
            properties={
                "title": src_obj["title"],
                "description": src_obj["description"],
            },
        )
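
Because the batch loop reads `src_obj["title"]` and `src_obj["description"]` directly, a document missing either key would raise a `KeyError` partway through the import, and an empty title would leave the vectorizer nothing to embed. The sketch below is one way to check the list beforehand. `find_incomplete_docs` is a hypothetical helper, not a Weaviate API, and the sample documents are abbreviated stand-ins for the list above.

```python
def find_incomplete_docs(docs, required=("title", "description")):
    """Return (index, missing_keys) pairs for documents lacking a required field."""
    problems = []
    for index, doc in enumerate(docs):
        # A field counts as missing if it is absent, None, or only whitespace
        missing = [
            key for key in required
            if not str(doc.get(key) or "").strip()
        ]
        if missing:
            problems.append((index, missing))
    return problems


# Abbreviated sample data for illustration only
sample_docs = [
    {"title": "HIPAA Compliance Guide", "description": "Comprehensive overview of HIPAA regulations."},
    {"title": "Telemedicine Regulations", "description": ""},
    {"description": "Guide to billing and reimbursement processes."},
]
print(find_incomplete_docs(sample_docs))
# → [(1, ['description']), (2, ['title'])]
```

An empty result means every document has both fields and the batch loop can run as written.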

Now, we’ll run a near_text semantic search against the collection, then iterate over the retrieved objects and print their details:

PYTHON
# Import the MetadataQuery class from weaviate.classes.query to handle metadata in queries
from weaviate.classes.query import MetadataQuery

# Retrieve the "Healthcare_Compliance" collection from the Weaviate client
collection = client.collections.get("Healthcare_Compliance")

# Perform a near_text search for documents related to "policies related to drug compounding"
response = collection.query.near_text(
    query="policies related to drug compounding",  # Search query
    limit=2,  # Limit the number of results to 2
    return_metadata=MetadataQuery(
        distance=True
    ),  # Include distance metadata in the results
)

# Iterate over the retrieved objects and print their details
for obj in response.objects:
    title = obj.properties.get("title")
    description = obj.properties.get("description")
    distance = (
        obj.metadata.distance
    )  # Get the distance metadata (a lower distance means the two vectors are closer together)
    print(f"Title: {title}")
    print(f"Description: {description}")
    print(f"Distance: {distance}")
    print("-" * 50)

The output will look something like this (note that a lower distance means the query vector and the document vector are closer together):

PYTHON
Title: Pharmacy Regulations
Description: Overview of state and federal regulations governing pharmacy practices, including prescription drug monitoring, compounding, and controlled substances.
Distance: 0.5904817581176758
--------------------------------------------------
Title: FDA Drug Approval Process
Description: Detailed explanation of the FDA's drug approval process, covering clinical trials, safety reviews, and post-market surveillance.
Distance: 0.6262975931167603
--------------------------------------------------
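
To make the distance values more concrete, here is a small sketch of cosine distance, computed as 1 minus cosine similarity. It assumes the collection uses cosine as its distance metric (Weaviate's default when none is configured). The two-dimensional vectors are toy values, not real embeddings.

```python
import math


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity, for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


query_vec = [1.0, 0.0]
print(cosine_distance(query_vec, [1.0, 0.0]))   # same direction → 0.0
print(cosine_distance(query_vec, [0.0, 1.0]))   # orthogonal → 1.0
print(cosine_distance(query_vec, [-1.0, 0.0]))  # opposite direction → 2.0
```

Under that assumption, distances range from 0 (identical direction) to 2 (opposite direction), so the values of roughly 0.59 and 0.63 above mean the first result is only slightly closer to the query than the second.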

Embed + Rerank

Now, we’ll add in Cohere Rerank to surface more relevant results. This will involve some more setup:

PYTHON
# Import the weaviate module to interact with the Weaviate vector database
import weaviate
from weaviate.classes.init import Auth

# Define headers for the API requests, including the Cohere API key
headers = {
    "X-Cohere-Api-Key": cohere_key,
}

# Connect to the Weaviate cloud instance
client = weaviate.connect_to_weaviate_cloud(
    cluster_url=weaviate_url,  # `weaviate_url`: your Weaviate URL
    auth_credentials=Auth.api_key(
        weaviate_key
    ),  # `weaviate_key`: your Weaviate API key
    headers=headers,  # Include the Cohere API key in the headers
)

And here we’ll create a "Legal_Docs" collection in the Weaviate database:

PYTHON
from weaviate.classes.config import Configure, Property, DataType

# Create a new collection named "Legal_Docs" in the Weaviate database
client.collections.create(
    name="Legal_Docs",
    properties=[
        # Define a property named "title" with data type TEXT
        Property(name="title", data_type=DataType.TEXT),
    ],
    # Configure the vectorizer to use Cohere's text2vec model
    vectorizer_config=Configure.Vectorizer.text2vec_cohere(
        model="embed-english-v3.0"  # Specify the Cohere model to use for vectorization
    ),
    # Configure the reranker to use Cohere's rerank model
    reranker_config=Configure.Reranker.cohere(
        model="rerank-english-v3.0"  # Specify the Cohere model to use for reranking
    ),
)
PYTHON
legal_documents = [
    {
        "title": "Contract Law Basics",
        "description": "An in-depth introduction to contract law, covering essential elements such as offer, acceptance, consideration, and mutual assent. Explores types of contracts, including express, implied, and unilateral contracts, as well as remedies for breach of contract, such as damages, specific performance, and rescission.",
    },
    {
        "title": "Intellectual Property Rights",
        "description": "Comprehensive overview of intellectual property laws, including patents, trademarks, copyrights, and trade secrets. Discusses the process of obtaining patents, trademark registration, and copyright protection, as well as strategies for enforcing intellectual property rights and defending against infringement claims.",
    },
    {
        "title": "Employment Law Guide",
        "description": "Detailed guide to employment laws, covering hiring practices, termination procedures, anti-discrimination laws, and workplace safety regulations. Includes information on employee rights, such as minimum wage, overtime pay, and family and medical leave, as well as employer obligations under federal and state laws.",
    },
    {
        "title": "Criminal Law Procedures",
        "description": "Step-by-step explanation of criminal law procedures, from arrest and booking to trial and sentencing. Covers the rights of the accused, including the right to counsel, the right to remain silent, and the right to a fair trial, as well as rules of evidence and burden of proof in criminal cases.",
    },
    {
        "title": "Real Estate Transactions",
        "description": "Comprehensive guide to real estate transactions, including purchase agreements, title searches, property inspections, and closing processes. Discusses common issues such as title defects, financing contingencies, and property disclosures, as well as the role of real estate agents and attorneys in the transaction process.",
    },
    {
        "title": "Corporate Governance",
        "description": "In-depth overview of corporate governance principles, including the roles and responsibilities of boards of directors, shareholder rights, and compliance with securities laws. Explores best practices for board composition, executive compensation, and risk management, as well as strategies for maintaining transparency and accountability in corporate decision-making.",
    },
    {
        "title": "Family Law Overview",
        "description": "Comprehensive introduction to family law, covering marriage, divorce, child custody, child support, and adoption processes. Discusses the legal requirements for marriage and divorce, factors considered in child custody determinations, and the rights and obligations of adoptive parents under state and federal laws.",
    },
    {
        "title": "Tax Law for Businesses",
        "description": "Detailed guide to tax laws affecting businesses, including corporate income tax, payroll taxes, sales and use taxes, and tax deductions. Explores tax planning strategies, such as deferring income and accelerating expenses, as well as compliance requirements and penalties for non-compliance with tax laws.",
    },
    {
        "title": "Immigration Law Basics",
        "description": "Comprehensive overview of immigration laws, including visa categories, citizenship requirements, and deportation processes. Discusses the rights and obligations of immigrants, including access to public benefits and protection from discrimination, as well as the role of immigration attorneys in navigating the immigration system.",
    },
    {
        "title": "Environmental Regulations",
        "description": "In-depth overview of environmental laws and regulations, including air and water quality standards, hazardous waste management, and endangered species protection. Explores the role of federal and state agencies in enforcing environmental laws, as well as strategies for businesses to achieve compliance and minimize environmental impact.",
    },
    {
        "title": "Consumer Protection Laws",
        "description": "Comprehensive guide to consumer protection laws, including truth in advertising, product safety, and debt collection practices. Discusses the rights of consumers under federal and state laws, such as the right to sue for damages and the right to cancel certain contracts, as well as the role of government agencies in enforcing consumer protection laws.",
    },
    {
        "title": "Estate Planning Essentials",
        "description": "Detailed overview of estate planning, including wills, trusts, powers of attorney, and advance healthcare directives. Explores strategies for minimizing estate taxes, protecting assets from creditors, and ensuring that assets are distributed according to the individual's wishes after death.",
    },
    {
        "title": "Bankruptcy Law Overview",
        "description": "Comprehensive introduction to bankruptcy law, including Chapter 7 and Chapter 13 bankruptcy proceedings. Discusses the eligibility requirements for filing bankruptcy, the process of liquidating assets and discharging debts, and the impact of bankruptcy on credit scores and future financial opportunities.",
    },
    {
        "title": "International Trade Law",
        "description": "In-depth overview of international trade laws, including tariffs, quotas, and trade agreements. Explores the role of international organizations such as the World Trade Organization (WTO) in regulating global trade, as well as strategies for businesses to navigate trade barriers and comply with international trade regulations.",
    },
    {
        "title": "Healthcare Law and Regulations",
        "description": "Comprehensive guide to healthcare laws and regulations, including patient privacy rights, healthcare provider licensing, and medical malpractice liability. Discusses the impact of laws such as the Affordable Care Act (ACA) and the Health Insurance Portability and Accountability Act (HIPAA) on healthcare providers and patients, as well as strategies for ensuring compliance with healthcare regulations.",
    },
]
PYTHON
# Retrieve the "Legal_Docs" collection from the Weaviate client
collection = client.collections.get("Legal_Docs")

# Use a dynamic batch process to add multiple documents to the collection efficiently
with collection.batch.dynamic() as batch:
    for src_obj in legal_documents:
        # Add each document to the batch, specifying the "title" and "description" properties
        batch.add_object(
            properties={
                "title": src_obj["title"],
                "description": src_obj["description"],
            },
        )

Now, we’ll need to define a search query:

PYTHON
search_query = "eligibility requirements for filing bankruptcy"

This code snippet imports the MetadataQuery class from weaviate.classes.query to handle metadata in queries, runs a near_text semantic search with the query defined above, then iterates over the retrieved objects and prints their details:

PYTHON
# Import the MetadataQuery class from weaviate.classes.query to handle metadata in queries
from weaviate.classes.query import MetadataQuery

# Retrieve the "Legal_Docs" collection from the Weaviate client
collection = client.collections.get("Legal_Docs")

# Perform a near_text semantic search for documents
response = collection.query.near_text(
    query=search_query,  # Search query
    limit=3,  # Limit the number of results to 3
    return_metadata=MetadataQuery(distance=True)  # Include distance metadata in the results
)

print("Semantic Search")
print("*" * 50)

# Iterate over the retrieved objects and print their details
for obj in response.objects:
    title = obj.properties.get("title")
    description = obj.properties.get("description")
    metadata_distance = obj.metadata.distance
    print(f"Title: {title}")
    print(f"Description: {description}")
    print(f"Metadata Distance: {metadata_distance}")
    print("-" * 50)

The output will look something like this:

PYTHON
Semantic Search
**************************************************
Title: Bankruptcy Law Overview
Description: Comprehensive introduction to bankruptcy law, including Chapter 7 and Chapter 13 bankruptcy proceedings. Discusses the eligibility requirements for filing bankruptcy, the process of liquidating assets and discharging debts, and the impact of bankruptcy on credit scores and future financial opportunities.
Metadata Distance: 0.41729819774627686
--------------------------------------------------
Title: Tax Law for Businesses
Description: Detailed guide to tax laws affecting businesses, including corporate income tax, payroll taxes, sales and use taxes, and tax deductions. Explores tax planning strategies, such as deferring income and accelerating expenses, as well as compliance requirements and penalties for non-compliance with tax laws.
Metadata Distance: 0.6903179883956909
--------------------------------------------------
Title: Consumer Protection Laws
Description: Comprehensive guide to consumer protection laws, including truth in advertising, product safety, and debt collection practices. Discusses the rights of consumers under federal and state laws, such as the right to sue for damages and the right to cancel certain contracts, as well as the role of government agencies in enforcing consumer protection laws.
Metadata Distance: 0.7075160145759583
--------------------------------------------------

This code runs the same search again, this time reranking the results with Cohere Rerank based on the "description" property:

PYTHON
# Import the Rerank class from weaviate.classes.query to enable reranking in queries
from weaviate.classes.query import Rerank

# Perform a near_text search with reranking, using `search_query` for both retrieval and reranking
rerank_response = collection.query.near_text(
    query=search_query,
    limit=3,
    rerank=Rerank(
        prop="description",  # Property to rerank based on (description in this case)
        query=search_query,  # Query to use for reranking
    ),
)

# Display the reranked search results
print("Reranked Search Results:")
for obj in rerank_response.objects:
    title = obj.properties.get("title")
    description = obj.properties.get("description")
    rerank_score = getattr(
        obj.metadata, "rerank_score", None
    )  # Get the rerank score metadata
    print(f"Title: {title}")
    print(f"Description: {description}")
    print(f"Rerank Score: {rerank_score}")
    print("-" * 50)

Here’s what the output looks like:

PYTHON
Reranked Search Results:
Title: Bankruptcy Law Overview
Description: Comprehensive introduction to bankruptcy law, including Chapter 7 and Chapter 13 bankruptcy proceedings. Discusses the eligibility requirements for filing bankruptcy, the process of liquidating assets and discharging debts, and the impact of bankruptcy on credit scores and future financial opportunities.
Rerank Score: 0.8951567
--------------------------------------------------
Title: Tax Law for Businesses
Description: Detailed guide to tax laws affecting businesses, including corporate income tax, payroll taxes, sales and use taxes, and tax deductions. Explores tax planning strategies, such as deferring income and accelerating expenses, as well as compliance requirements and penalties for non-compliance with tax laws.
Rerank Score: 7.071895e-06
--------------------------------------------------
Title: Consumer Protection Laws
Description: Comprehensive guide to consumer protection laws, including truth in advertising, product safety, and debt collection practices. Discusses the rights of consumers under federal and state laws, such as the right to sue for damages and the right to cancel certain contracts, as well as the role of government agencies in enforcing consumer protection laws.
Rerank Score: 6.4895394e-06
--------------------------------------------------

Based on the rerank scores, it’s clear that the Bankruptcy Law Overview is the most relevant result, while the other two documents (Tax Law for Businesses and Consumer Protection Laws) have significantly lower scores, indicating they are less relevant to the query. Therefore, we should focus only on the most relevant result and can skip the other two.
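
One way to act on that observation is to drop low-scoring results before passing them downstream. The sketch below filters by a minimum rerank score. The 0.5 cutoff and the `keep_relevant` helper are illustrative assumptions, not values recommended by Cohere or Weaviate, and the scores are copied from the output above.

```python
def keep_relevant(results, min_score=0.5):
    """Keep results whose rerank score is at least `min_score`, best first."""
    kept = [
        r for r in results
        if r["rerank_score"] is not None and r["rerank_score"] >= min_score
    ]
    return sorted(kept, key=lambda r: r["rerank_score"], reverse=True)


# Titles and scores taken from the reranked output above
reranked = [
    {"title": "Bankruptcy Law Overview", "rerank_score": 0.8951567},
    {"title": "Tax Law for Businesses", "rerank_score": 7.071895e-06},
    {"title": "Consumer Protection Laws", "rerank_score": 6.4895394e-06},
]
print([r["title"] for r in keep_relevant(reranked)])
# → ['Bankruptcy Law Overview']
```

A suitable threshold depends on the rerank model and the data, so it is worth inspecting scores on a few representative queries before fixing a value.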

Embed + Rerank + Command

Finally, we’ll add Command into the mix. This handles imports and creates a fresh "Legal_Docs_RAG" collection in the Weaviate database, configured with a vectorizer, a reranker, and a generative model.

PYTHON
from weaviate.classes.config import Configure, Property, DataType
from weaviate.classes.generate import GenerativeConfig

# Create a new collection named "Legal_Docs_RAG" in the Weaviate database
client.collections.create(
    name="Legal_Docs_RAG",
    properties=[
        # Define a property named "title" with data type TEXT
        Property(name="title", data_type=DataType.TEXT),
    ],
    # Configure the vectorizer to use Cohere's text2vec model
    vectorizer_config=Configure.Vectorizer.text2vec_cohere(
        model="embed-english-v3.0"  # Specify the Cohere model to use for vectorization
    ),
    # Configure the reranker to use Cohere's rerank model
    reranker_config=Configure.Reranker.cohere(
        model="rerank-english-v3.0"  # Specify the Cohere model to use for reranking
    ),
    # Configure the generative model to use Cohere's command r plus model
    generative_config=Configure.Generative.cohere(
        model="command-r-plus"
    ),
)

You should see something like this:

PYTHON
<weaviate.collections.collection.sync.Collection at 0x7f48afc06410>

This retrieves "Legal_Docs_RAG" from Weaviate and batch-imports the legal documents into it:

PYTHON
# Retrieve the "Legal_Docs_RAG" collection from the Weaviate client
collection = client.collections.get("Legal_Docs_RAG")

# Use a dynamic batch process to add multiple documents to the collection efficiently
with collection.batch.dynamic() as batch:
    for src_obj in legal_documents:
        # Add each document to the batch, specifying the "title" and "description" properties
        batch.add_object(
            properties={
                "title": src_obj["title"],
                "description": src_obj["description"],
            },
        )

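Next, we’ll run a generative search that applies a prompt to each retrieved object, then iterate over the objects and print both the retrieved text and the generated output: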

PYTHON
from weaviate.classes.config import Configure
from weaviate.classes.generate import GenerativeConfig

# To generate text for each object in the search results, use the single prompt method.
# The example below generates outputs for each of the n search results, where n is specified by the limit parameter.

collection = client.collections.get("Legal_Docs_RAG")
response = collection.generate.near_text(
    query=search_query,
    limit=1,
    single_prompt="Translate this into French - {title}: {description}",
)

for obj in response.objects:
    print("Retrieved results")
    print("-----------------")
    print(obj.properties["title"])
    print(obj.properties["description"])
    print("Generated output")
    print("-----------------")
    print(obj.generated)

You’ll see something like this:

PYTHON
Retrieved results
-----------------
Bankruptcy Law Overview
Comprehensive introduction to bankruptcy law, including Chapter 7 and Chapter 13 bankruptcy proceedings. Discusses the eligibility requirements for filing bankruptcy, the process of liquidating assets and discharging debts, and the impact of bankruptcy on credit scores and future financial opportunities.
Generated output
-----------------
Voici une traduction possible :

Aperçu du droit des faillites : Introduction complète au droit des faillites, y compris les procédures de faillite en vertu des chapitres 7 et 13. Discute des conditions d'admissibilité pour déposer une demande de faillite, du processus de liquidation des actifs et de libération des dettes, ainsi que de l'impact de la faillite sur les cotes de crédit et les opportunités financières futures.
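
The `{title}` and `{description}` placeholders in `single_prompt` are filled in from each retrieved object's properties. Weaviate performs that substitution itself. The sketch below only imitates it locally with Python's `str.format`, so you can preview the prompt each object would produce. `render_single_prompt` is a hypothetical helper, and the description is abbreviated for illustration.

```python
def render_single_prompt(template, properties):
    """Fill `{name}` placeholders in a prompt template from one object's properties."""
    # Raises KeyError if the template names a property the object does not have
    return template.format(**properties)


template = "Translate this into French - {title}: {description}"
props = {
    "title": "Bankruptcy Law Overview",
    "description": "Comprehensive introduction to bankruptcy law.",  # abbreviated
}
print(render_single_prompt(template, props))
# → Translate this into French - Bankruptcy Law Overview: Comprehensive introduction to bankruptcy law.
```

With `limit=1` only one prompt is generated, and raising the limit would produce one prompt, and one generated output, per retrieved object.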

Conclusion

This integration guide has demonstrated how to effectively combine Cohere’s powerful AI capabilities with Weaviate’s vector database to create sophisticated search and retrieval systems. We’ve covered three key approaches:

  1. Basic Vector Search: Using Cohere’s Embed model with Weaviate to perform semantic search, enabling natural language queries to find relevant documents based on meaning rather than just keywords.

  2. Enhanced Search with Rerank: Adding Cohere’s Rerank model to improve search results by reordering them based on relevance, ensuring the most pertinent documents appear first.

  3. Full RAG Pipeline: Implementing a complete Retrieval-Augmented Generation (RAG) system that combines embedding, reranking, and Cohere’s Command model to not only find relevant information but also generate contextual responses.

The integration showcases how these technologies work together to create more intelligent and accurate search systems. Whether you’re building a healthcare compliance database, legal document system, or any other knowledge base, this combination provides a powerful foundation for semantic search and AI-powered content generation.

The flexibility of this integration allows you to adapt it to various use cases while maintaining high performance and accuracy in your search and retrieval operations.
