Semantic Search with Cohere

Open in Colab

Text embeddings are lists of numbers that represent the context or meaning of a piece of text. They are particularly useful in search and information retrieval applications; search built on text embeddings is called semantic search.

Semantic search solves the problem faced by the more traditional approach of lexical search, which is great at finding keyword matches, but struggles to capture the context or meaning of a piece of text.

With Cohere, you can generate text embeddings through the Embed endpoint (Embed v3 being the latest model), which supports over 100 languages.

In this tutorial, you’ll learn about:

  • Embedding the documents
  • Embedding the query
  • Performing semantic search
  • Multilingual semantic search
  • Changing embedding compression types

You’ll learn these by building an onboarding assistant for new hires.

Setup

To get started, first we need to install the cohere library and create a Cohere client.

PYTHON
# pip install cohere

import numpy as np
import cohere

co = cohere.Client("COHERE_API_KEY")  # Get your API key: https://dashboard.cohere.com/api-keys

Embedding the documents

The Embed endpoint takes in texts as input and returns embeddings as output.

For semantic search, there are two types of documents we need to turn into embeddings.

  • The list of documents that we want to search from.
  • The query that will be used to search the documents.

Right now, we are doing the former. We call the Embed endpoint using co.embed() and pass the following arguments:

  • model: Here we choose embed-english-v3.0, which generates embeddings of size 1024
  • input_type: We choose search_document to ensure the model treats these as the documents for search
  • texts: The list of texts (the FAQs)

PYTHON
# Define the documents
faqs_long = [
    {"text": "Joining Slack Channels: You will receive an invite via email. Be sure to join relevant channels to stay informed and engaged."},
    {"text": "Finding Coffee Spots: For your caffeine fix, head to the break room's coffee machine or cross the street to the café for artisan coffee."},
    {"text": "Team-Building Activities: We foster team spirit with monthly outings and weekly game nights. Feel free to suggest new activity ideas anytime!"},
    {"text": "Working Hours Flexibility: We prioritize work-life balance. While our core hours are 9 AM to 5 PM, we offer flexibility to adjust as needed."},
    {"text": "Side Projects Policy: We encourage you to pursue your passions. Just be mindful of any potential conflicts of interest with our business."},
    {"text": "Reimbursing Travel Expenses: Easily manage your travel expenses by submitting them through our finance tool. Approvals are prompt and straightforward."},
    {"text": "Working from Abroad: Working remotely from another country is possible. Simply coordinate with your manager and ensure your availability during core hours."},
    {"text": "Health and Wellness Benefits: We care about your well-being and offer gym memberships, on-site yoga classes, and comprehensive health insurance."},
    {"text": "Performance Reviews Frequency: We conduct informal check-ins every quarter and formal performance reviews twice a year."},
    {"text": "Proposing New Ideas: Innovation is welcomed! Share your brilliant ideas at our weekly team meetings or directly with your team lead."},
]

documents = faqs_long

# Embed the documents
doc_emb = co.embed(
    model="embed-english-v3.0",
    input_type="search_document",
    texts=[doc['text'] for doc in documents]).embeddings

Embedding the query

Next, we add a query, which asks about how to stay connected to company updates.

We choose search_query as the input_type to ensure the model treats this as the query (instead of documents) for search.

PYTHON
# Add the user query
query = "How do I stay connected to what's happening at the company?"

# Embed the query
query_emb = co.embed(
    model="embed-english-v3.0",
    input_type="search_query",
    texts=[query]).embeddings

Performing semantic search

Now, we want to search for the documents that are most relevant to the query. We do this by computing the similarity between the embedding of the query and the embedding of each document.

There are various ways to compute the similarity between embeddings; here we use the dot product, with the numpy library providing the implementation.

Each query-document pair gets a score representing how similar the pair is. We then sort these scores in descending order and keep the top n most similar pairs; here we choose n = 2 (an arbitrary choice; you can pick any number).

Here, we show the most relevant documents with their similarity scores.

PYTHON
# Compute dot product similarity and display results
def return_results(query_emb, doc_emb, documents):
    n = 2
    scores = np.dot(query_emb, np.transpose(doc_emb))[0]
    scores_sorted = sorted(enumerate(scores),
                           key=lambda x: x[1],
                           reverse=True)[:n]

    for idx, item in enumerate(scores_sorted):
        print(f"Rank: {idx+1}")
        print(f"Score: {item[1]}")
        print(f"Document: {documents[item[0]]}\n")

return_results(query_emb, doc_emb, documents)

Rank: 1
Score: 0.352135965228231
Document: {'text': 'Joining Slack Channels: You will receive an invite via email. Be sure to join relevant channels to stay informed and engaged.'}

Rank: 2
Score: 0.31995661889273097
Document: {'text': 'Working from Abroad: Working remotely from another country is possible. Simply coordinate with your manager and ensure your availability during core hours.'}
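
As noted earlier, the dot product is only one of several ways to compare embeddings. If you prefer scores bounded between -1 and 1, cosine similarity is a common alternative: normalize the vectors to unit length, then take the dot product. Below is a minimal sketch of a cosine-based variant of the helper above; the return_results_cosine function is our own illustration, not part of the Cohere SDK.

PYTHON
# Cosine similarity: normalize each embedding to unit length, then take the dot product
def return_results_cosine(query_emb, doc_emb, documents, n=2):
    query = np.asarray(query_emb)
    docs = np.asarray(doc_emb)

    # Divide each vector by its L2 norm so scores fall within [-1, 1]
    query = query / np.linalg.norm(query, axis=1, keepdims=True)
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)

    scores = np.dot(query, docs.T)[0]
    scores_sorted = sorted(enumerate(scores), key=lambda x: x[1], reverse=True)[:n]

    for rank, (doc_idx, score) in enumerate(scores_sorted, start=1):
        print(f"Rank: {rank}")
        print(f"Score: {score}")
        print(f"Document: {documents[doc_idx]}\n")

return_results_cosine(query_emb, doc_emb, documents)

The ranking logic is the same as before; only the scoring function changes, and cosine scores are insensitive to the magnitude of the vectors.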

Multilingual semantic search

The Embed endpoint also supports multilingual semantic search via the embed-multilingual-... models. This means you can perform semantic search on texts in different languages.

Specifically, you can do both multilingual and cross-lingual searches using one single model.

Multilingual search happens when the query and the result are of the same language. For example, an English query of “places to eat” returning an English result of “Bob’s Burgers.” You can replace English with other languages and use the same model for performing search.

Cross-lingual search happens when the query and the result are in different languages. For example, a Hindi query of “खाने की जगह” (places to eat) returning an English result of “Bob’s Burgers.”

In the example below, we repeat the steps of performing semantic search with one difference: we switch to the multilingual model, embed-multilingual-v3.0. Here, we search a French version of the FAQ list using an English query.

PYTHON
# Define the documents
faqs_short_fr = [
    {"text": "Remboursement des frais de voyage : Gérez facilement vos frais de voyage en les soumettant via notre outil financier. Les approbations sont rapides et simples."},
    {"text": "Travailler de l'étranger : Il est possible de travailler à distance depuis un autre pays. Il suffit de coordonner avec votre responsable et de vous assurer d'être disponible pendant les heures de travail."},
    {"text": "Avantages pour la santé et le bien-être : Nous nous soucions de votre bien-être et proposons des adhésions à des salles de sport, des cours de yoga sur site et une assurance santé complète."},
    {"text": "Fréquence des évaluations de performance : Nous organisons des bilans informels tous les trimestres et des évaluations formelles deux fois par an."}
]

documents = faqs_short_fr

# Embed the documents
doc_emb = co.embed(
    model="embed-multilingual-v3.0",
    input_type="search_document",
    texts=[doc['text'] for doc in documents]).embeddings

# Add the user query
query = "What's your remote-working policy?"

# Embed the query
query_emb = co.embed(
    model="embed-multilingual-v3.0",
    input_type="search_query",
    texts=[query]).embeddings

# Compute dot product similarity and display results
return_results(query_emb, doc_emb, documents)

Rank: 1
Score: 0.442758615743984
Document: {'text': "Travailler de l'étranger : Il est possible de travailler à distance depuis un autre pays. Il suffit de coordonner avec votre responsable et de vous assurer d'être disponible pendant les heures de travail."}

Rank: 2
Score: 0.32783563708365726
Document: {'text': 'Avantages pour la santé et le bien-être : Nous nous soucions de votre bien-être et proposons des adhésions à des salles de sport, des cours de yoga sur site et une assurance santé complète.'}

Changing embedding compression types

Semantic search over large datasets can require a lot of memory, which is expensive to host in a vector database. Changing the embedding compression type can help reduce the memory footprint.

A typical embedding model generates embeddings in float32 format (4 bytes per value). By compressing the embeddings to int8 format (1 byte per value), we can reduce the memory needed 4x while keeping 99.99% of the original search quality.

We can go even further and use the binary format (1 bit per value), which reduces the memory needed 32x while keeping 90-98% of the original search quality.
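
To put those numbers in perspective, here is a rough back-of-the-envelope calculation (a sketch only; the one-million-document corpus size is an assumption for illustration, not part of this tutorial).

PYTHON
# Rough memory estimate for a hypothetical corpus of 1M documents,
# each embedded as a 1,024-dimensional vector
num_docs = 1_000_000  # assumed corpus size, for illustration only
dims = 1024

float32_gb = num_docs * dims * 4 / 1e9   # float32: 4 bytes per dimension
int8_gb = num_docs * dims * 1 / 1e9      # int8: 1 byte per dimension
binary_gb = num_docs * dims / 8 / 1e9    # binary: 1 bit per dimension

print(f"float32: {float32_gb:.2f} GB")   # ~4.10 GB
print(f"int8:    {int8_gb:.2f} GB")      # ~1.02 GB (4x smaller)
print(f"binary:  {binary_gb:.2f} GB")    # ~0.13 GB (32x smaller)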

The Embed endpoint supports the following formats: float, int8, uint8, binary, and ubinary. You can get these different compression levels by passing the embedding_types parameter.

In the example below, we embed the documents in two formats: float and int8.

PYTHON
# Define the documents
documents = faqs_long

# Embed the documents with the given embedding types
doc_emb = co.embed(
    model="embed-english-v3.0",
    embedding_types=["float","int8"],
    input_type="search_document",
    texts=[doc['text'] for doc in documents]).embeddings

# Add the user query
query = "How do I stay connected to what's happening at the company?"

# Embed the query
query_emb = co.embed(
    model="embed-english-v3.0",
    embedding_types=["float","int8"],
    input_type="search_query",
    texts=[query]).embeddings

Here are the search results using the float embeddings.

PYTHON
# Compute dot product similarity and display results
return_results(query_emb.float_, doc_emb.float_, documents)

Rank: 1
Score: 0.352135965228231
Document: {'text': 'Joining Slack Channels: You will receive an invite via email. Be sure to join relevant channels to stay informed and engaged.'}

Rank: 2
Score: 0.31995661889273097
Document: {'text': 'Working from Abroad: Working remotely from another country is possible. Simply coordinate with your manager and ensure your availability during core hours.'}

And here are the search results using the int8 embeddings. The scores sit on a different scale than the float scores because the dot products are computed over integer values, but the relative ranking of the documents is what matters.

PYTHON
# Compute dot product similarity and display results
return_results(query_emb.int8, doc_emb.int8, documents)

Rank: 1
Score: 563583
Document: {'text': 'Joining Slack Channels: You will receive an invite via email. Be sure to join relevant channels to stay informed and engaged.'}

Rank: 2
Score: 508692
Document: {'text': 'Working from Abroad: Working remotely from another country is possible. Simply coordinate with your manager and ensure your availability during core hours.'}
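
If you also request one of the binary formats, the packed embeddings need a little extra handling before scoring. The sketch below makes several assumptions: that you re-run the embedding calls above with embedding_types=["ubinary"], that the SDK exposes the packed values as query_emb.ubinary and doc_emb.ubinary (mirroring the .float_ and .int8 attributes used above), and that each value packs 8 dimensions into one unsigned 8-bit integer. It unpacks the bits with NumPy and ranks documents by the fraction of matching bits; treat it as an illustration rather than the recommended production path.

PYTHON
# Minimal sketch: scoring with bit-packed ubinary embeddings (assumptions noted above)
query_bits = np.unpackbits(np.asarray(query_emb.ubinary, dtype=np.uint8), axis=1)  # shape (1, 1024)
doc_bits = np.unpackbits(np.asarray(doc_emb.ubinary, dtype=np.uint8), axis=1)      # shape (n_docs, 1024)

# Hamming similarity: the fraction of bits on which the query and each document agree
hamming_sim = 1 - np.mean(query_bits[0] != doc_bits, axis=1)

# Show the top 2 documents
for rank, idx in enumerate(np.argsort(-hamming_sim)[:2], start=1):
    print(f"Rank: {rank}")
    print(f"Score: {hamming_sim[idx]:.3f}")
    print(f"Document: {documents[idx]}\n")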

Conclusion

In this tutorial, you learned about:

  • How to embed documents for search
  • How to embed queries
  • How to perform semantic search
  • How to perform multilingual semantic search
  • How to change the embedding compression types

A high-performance, modern search system typically includes a reranking stage, which further improves the relevance of the search results.

In Part 5, you will learn how to add reranking to a search system.