Semantic Search with Cohere
Text embeddings are lists of numbers that represent the context or meaning of a piece of text. They are particularly useful in search and information retrieval applications; search built on text embeddings is called semantic search.
Semantic search solves the problem faced by the more traditional approach of lexical search, which is great at finding keyword matches, but struggles to capture the context or meaning of a piece of text.
With Cohere, you can generate text embeddings through the Embed endpoint (Embed v3 being the latest model), which supports over 100 languages.
In this tutorial, you’ll learn about:
- Embedding the documents
- Embedding the query
- Performing semantic search
- Multilingual semantic search
- Changing embedding compression types
You’ll learn these by building an onboarding assistant for new hires.
Setup
To get started, first we need to install the cohere library and create a Cohere client.
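A minimal setup sketch (assuming Python and a Cohere API key; the key string below is a placeholder):

```python
# Install the SDK first:
# pip install cohere

import cohere

# Create a Cohere client; replace the placeholder with your own API key.
co = cohere.Client("YOUR_COHERE_API_KEY")
```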
Embedding the documents
The Embed endpoint takes in texts as input and returns embeddings as output.
For semantic search, there are two types of texts we need to turn into embeddings:
- The list of documents that we want to search from.
- The query that will be used to search the documents.
Right now, we are doing the former. We call the Embed endpoint using co.embed() and pass the following arguments, as shown in the sketch after this list:
- model: Here we choose embed-english-v3.0, which generates embeddings of size 1024
- input_type: We choose search_document to ensure the model treats these as the documents for search
- texts: The list of texts (the FAQs)
- embedding_types: We choose float to get the float embeddings
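Putting this together, here is a hedged sketch. The faqs list is an illustrative stand-in for the onboarding FAQ documents, and depending on your SDK version the float embeddings may be exposed as embeddings.float or embeddings.float_:

```python
# Illustrative FAQ documents for the onboarding assistant
# (stand-ins, not the tutorial's original list).
faqs = [
    "Submitting Expenses: Submit receipts through the finance portal within 30 days.",
    "Side Projects: Approval is needed for side projects that may compete with company work.",
    "Wellness Benefits: Gym memberships are reimbursed up to a monthly limit.",
    "Staying Informed: Company updates are shared in the weekly all-hands and the internal newsletter.",
]

# Embed the documents for search.
doc_emb = co.embed(
    model="embed-english-v3.0",
    input_type="search_document",
    texts=faqs,
    embedding_types=["float"],
).embeddings.float  # some SDK versions expose this as .float_
```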
Further reading:
- Embed endpoint API reference
- Documentation on the Embed endpoint
- Documentation on the models available on the Embed endpoint
- LLM University module on Text Representation
Embedding the query
Next, we add a query, which asks about how to stay connected to company updates.
We choose search_query as the input_type to ensure the model treats this as the query (instead of the documents) for search.
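A sketch of embedding the query (the query string itself is illustrative):

```python
# An illustrative query about staying connected to company updates.
query = "How do I stay connected to company updates?"

# Embed the query, marking it as a query rather than a document.
query_emb = co.embed(
    model="embed-english-v3.0",
    input_type="search_query",
    texts=[query],
    embedding_types=["float"],
).embeddings.float
```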
Performing semantic search
Now, we want to search for the most relevant documents to the query. We do this by computing the similarity between the embeddings of the query and each of the documents.
There are various approaches to computing similarity between embeddings; here we'll use the dot product approach. For this, we use the numpy library, which comes with the implementation.
Each query-document pair returns a score representing how similar the pair is. We then sort these scores in descending order and select the most similar pairs; here we keep the top 2 (an arbitrary choice; you can pick any number).
Here, we show the most relevant documents with their similarity scores.
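Below is a minimal sketch of this step, assuming doc_emb and query_emb hold the float embeddings from the previous steps; return_results is a hypothetical helper, not part of the Cohere SDK:

```python
import numpy as np

def return_results(query_emb, doc_emb, documents, top_n=2):
    # Dot product between the query embedding and every document embedding.
    scores = np.dot(query_emb, np.transpose(doc_emb))[0]
    # Sort scores in descending order and keep the top_n documents.
    top_ids = np.argsort(-scores)[:top_n]
    for rank, idx in enumerate(top_ids, start=1):
        print(f"Rank {rank} (score: {scores[idx]:.4f}): {documents[idx]}")

return_results(query_emb, doc_emb, faqs)
```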
Multilingual semantic search
The Embed endpoint also supports multilingual semantic search via the embed-multilingual-... models. This means you can perform semantic search on texts in different languages.
Specifically, you can do both multilingual and cross-lingual searches using one single model.
Multilingual search happens when the query and the result are in the same language. For example, an English query of “places to eat” returning an English result of “Bob’s Burgers.” You can replace English with other languages and use the same model to perform the search.
Cross-lingual search happens when the query and the result are in different languages. For example, a Hindi query of “खाने की जगह” (places to eat) returning an English result of “Bob’s Burgers.”
In the example below, we repeat the steps of performing semantic search with one difference: we change the model to the multilingual version, embed-multilingual-v3.0. Here, we are searching a French version of the FAQ list using an English query.
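A sketch of this cross-lingual setup (the French FAQ texts are illustrative stand-ins, and return_results is the hypothetical helper from earlier):

```python
# An illustrative French version of the FAQ list.
faqs_fr = [
    "Notes de frais : soumettez vos reçus via le portail finance sous 30 jours.",
    "Rester informé : les nouvelles de l'entreprise sont partagées lors de la réunion hebdomadaire.",
]

# Embed the French documents with the multilingual model.
doc_emb_fr = co.embed(
    model="embed-multilingual-v3.0",
    input_type="search_document",
    texts=faqs_fr,
    embedding_types=["float"],
).embeddings.float

# Embed an English query with the same model.
query_emb_en = co.embed(
    model="embed-multilingual-v3.0",
    input_type="search_query",
    texts=["How do I stay connected to company updates?"],
    embedding_types=["float"],
).embeddings.float

# Search the French documents with the English query.
return_results(query_emb_en, doc_emb_fr, faqs_fr)
```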
Changing embedding compression types
Semantic search over large datasets can require a lot of memory, which is expensive to host in a vector database. Changing the embeddings compression type can help reduce the memory footprint.
A typical embedding model generates embeddings in float32 format (4 bytes per dimension). By compressing the embeddings to int8 format (1 byte per dimension), we can reduce the memory 4x while keeping 99.99% of the original search quality.
We can go even further and use the binary format (1 bit per dimension), which reduces the needed memory 32x while keeping 90-98% of the original search quality.
The Embed endpoint supports the following formats: float, int8, uint8, binary, and ubinary. You can get these different compression levels by passing the embedding_types parameter.
In the example below, we embed the documents in two formats: float and int8.
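A sketch of requesting both compression levels in a single call, reusing the faqs list and query from earlier (attribute names such as .int8 may appear as .int8_ in some SDK versions):

```python
# Request float and int8 embeddings for the documents in one call.
res_doc = co.embed(
    model="embed-english-v3.0",
    input_type="search_document",
    texts=faqs,
    embedding_types=["float", "int8"],
)
doc_emb_float = res_doc.embeddings.float
doc_emb_int8 = res_doc.embeddings.int8

# Embed the query in both formats as well.
res_query = co.embed(
    model="embed-english-v3.0",
    input_type="search_query",
    texts=[query],
    embedding_types=["float", "int8"],
)
query_emb_float = res_query.embeddings.float
query_emb_int8 = res_query.embeddings.int8
```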
Here are the search results of using the float embeddings (same as the earlier example).
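For instance, with the hypothetical return_results helper from earlier:

```python
# Search with the float embeddings.
return_results(query_emb_float, doc_emb_float, faqs)
```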
And here are the search results of using the int8 embeddings.
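And the int8 variant:

```python
# Search with the int8 embeddings; numpy infers a wide integer dtype
# from the Python lists, so the dot products do not overflow.
return_results(query_emb_int8, doc_emb_int8, faqs)
```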
Conclusion
In this tutorial, you learned about:
- How to embed documents for search
- How to embed queries
- How to perform semantic search
- How to perform multilingual semantic search
- How to change the embedding compression types
A high-performance and modern search system typically includes a reranking stage, which further boosts the search results.
In Part 5, you will learn how to add reranking to a search system.