Semantic Search with Cohere
Text embeddings are lists of numbers that represent the context or meaning of a piece of text. They are particularly useful in search and information retrieval applications; search built on text embeddings is called semantic search.
Semantic search solves the problem faced by the more traditional approach of lexical search, which is great at finding keyword matches, but struggles to capture the context or meaning of a piece of text.
With Cohere, you can generate text embeddings through the Embed endpoint (Embed v3 being the latest model), which supports over 100 languages.
In this tutorial, you’ll learn about:
- Embedding the documents
- Embedding the query
- Performing semantic search
- Multilingual semantic search
- Changing embedding compression types
You’ll learn these by building an onboarding assistant for new hires.
Setup
To get started, first we need to install the cohere library and create a Cohere client.
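A minimal setup sketch (assuming Python and a Cohere API key; the key string below is a placeholder):

```python
# Install the SDK first:
# pip install cohere

import cohere

# Create a Cohere client; replace the placeholder with your own API key.
co = cohere.Client("YOUR_COHERE_API_KEY")
```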
Embedding the documents
The Embed endpoint takes in texts as input and returns embeddings as output.
For semantic search, there are two types of texts we need to turn into embeddings:
- The list of documents that we want to search from.
- The query that will be used to search the documents.
Right now, we are doing the former. We call the Embed endpoint using co.embed() and pass the following arguments, as shown in the sketch after this list:
- model: Here we choose embed-english-v3.0, which generates embeddings of size 1024
- input_type: We choose search_document to ensure the model treats these as the documents for search
- texts: The list of texts (the FAQs)
- embedding_types: We choose float to get the float embeddings
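Putting this together, here is a hedged sketch. The faqs list is an illustrative stand-in for the onboarding FAQ documents, and depending on your SDK version the float embeddings may be exposed as embeddings.float or embeddings.float_:

```python
# Illustrative FAQ documents for the onboarding assistant
# (stand-ins, not the tutorial's original list).
faqs = [
    "Submitting Expenses: Submit receipts through the finance portal within 30 days.",
    "Side Projects: Approval is needed for side projects that may compete with company work.",
    "Wellness Benefits: Gym memberships are reimbursed up to a monthly limit.",
    "Staying Informed: Company updates are shared in the weekly all-hands and the internal newsletter.",
]

# Embed the documents for search.
doc_emb = co.embed(
    model="embed-english-v3.0",
    input_type="search_document",
    texts=faqs,
    embedding_types=["float"],
).embeddings.float  # some SDK versions expose this as .float_
```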
Further reading:
- Embed endpoint API reference
- Documentation on the Embed endpoint
- Documentation on the models available on the Embed endpoint
- LLM University module on Text Representation
Embedding the query
Next, we add a query, which asks about how to stay connected to company updates.
We choose search_query as the input_type to ensure the model treats this as the query (instead of the documents) for search.
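A sketch of embedding the query (the query string itself is illustrative):

```python
# An illustrative query about staying connected to company updates.
query = "How do I stay connected to company updates?"

# Embed the query, marking it as a query rather than a document.
query_emb = co.embed(
    model="embed-english-v3.0",
    input_type="search_query",
    texts=[query],
    embedding_types=["float"],
).embeddings.float
```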
Performing semantic search
Now, we want to search for the most relevant documents to the query. We do this by computing the similarity between the embeddings of the query and each of the documents.
There are various approaches to computing similarity between embeddings; here we'll use the dot product approach. For this, we use the numpy library, which comes with the implementation.
Each query-document pair returns a score representing how similar the pair is. We then sort these scores in descending order and select the most similar pairs; here we keep the top 2 (an arbitrary choice; you can pick any number).
Here, we show the most relevant documents with their similarity scores.
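Below is a minimal sketch of this step, assuming doc_emb and query_emb hold the float embeddings from the previous steps; return_results is a hypothetical helper, not part of the Cohere SDK:

```python
import numpy as np

def return_results(query_emb, doc_emb, documents, top_n=2):
    # Dot product between the query embedding and every document embedding.
    scores = np.dot(query_emb, np.transpose(doc_emb))[0]
    # Sort scores in descending order and keep the top_n documents.
    top_ids = np.argsort(-scores)[:top_n]
    for rank, idx in enumerate(top_ids, start=1):
        print(f"Rank {rank} (score: {scores[idx]:.4f}): {documents[idx]}")

return_results(query_emb, doc_emb, faqs)
```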
Multilingual semantic search
The Embed endpoint also supports multilingual semantic search via the embed-multilingual-... models. This means you can perform semantic search on texts in different languages.
Specifically, you can do both multilingual and cross-lingual searches using one single model.
Multilingual search happens when the query and the result are in the same language. For example, an English query of “places to eat” returning an English result of “Bob’s Burgers.” You can replace English with other languages and use the same model to perform the search.
Cross-lingual search happens when the query and the result are in different languages. For example, a Hindi query of “खाने की जगह” (places to eat) returning an English result of “Bob’s Burgers.”
In the example below, we repeat the steps of performing semantic search with one difference: we change the model to the multilingual version, embed-multilingual-v3.0. Here, we are searching a French version of the FAQ list using an English query.
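A sketch of this cross-lingual setup (the French FAQ texts are illustrative stand-ins, and return_results is the hypothetical helper from earlier):

```python
# An illustrative French version of the FAQ list.
faqs_fr = [
    "Notes de frais : soumettez vos reçus via le portail finance sous 30 jours.",
    "Rester informé : les nouvelles de l'entreprise sont partagées lors de la réunion hebdomadaire.",
]

# Embed the French documents with the multilingual model.
doc_emb_fr = co.embed(
    model="embed-multilingual-v3.0",
    input_type="search_document",
    texts=faqs_fr,
    embedding_types=["float"],
).embeddings.float

# Embed an English query with the same model.
query_emb_en = co.embed(
    model="embed-multilingual-v3.0",
    input_type="search_query",
    texts=["How do I stay connected to company updates?"],
    embedding_types=["float"],
).embeddings.float

# Search the French documents with the English query.
return_results(query_emb_en, doc_emb_fr, faqs_fr)
```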
Changing embedding compression types
Semantic search over large datasets can require a lot of memory, which is expensive to host in a vector database. Changing the embeddings compression type can help reduce the memory footprint.
A typical embedding model generates embeddings in float32 format (4 bytes per dimension). By compressing the embeddings to int8 format (1 byte per dimension), we can reduce the memory 4x while keeping 99.99% of the original search quality.
We can go even further and use the binary format (1 bit per dimension), which reduces the needed memory 32x while keeping 90-98% of the original search quality.
The Embed endpoint supports the following formats: float, int8, uint8, binary, and ubinary. You can get these different compression levels by passing the embedding_types parameter.
In the example below, we embed the documents in two formats: float and int8.
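A sketch of requesting both compression levels in a single call, reusing the faqs list and query from earlier (attribute names such as .int8 may appear as .int8_ in some SDK versions):

```python
# Request float and int8 embeddings for the documents in one call.
res_doc = co.embed(
    model="embed-english-v3.0",
    input_type="search_document",
    texts=faqs,
    embedding_types=["float", "int8"],
)
doc_emb_float = res_doc.embeddings.float
doc_emb_int8 = res_doc.embeddings.int8

# Embed the query in both formats as well.
res_query = co.embed(
    model="embed-english-v3.0",
    input_type="search_query",
    texts=[query],
    embedding_types=["float", "int8"],
)
query_emb_float = res_query.embeddings.float
query_emb_int8 = res_query.embeddings.int8
```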
Here are the search results of using the float embeddings (same as the earlier example).
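For instance, with the hypothetical return_results helper from earlier:

```python
# Search with the float embeddings.
return_results(query_emb_float, doc_emb_float, faqs)
```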
And here are the search results of using the int8 embeddings.
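And the int8 variant:

```python
# Search with the int8 embeddings; numpy infers a wide integer dtype
# from the Python lists, so the dot products do not overflow.
return_results(query_emb_int8, doc_emb_int8, faqs)
```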
Conclusion
In this tutorial, you learned about:
- How to embed documents for search
- How to embed queries
- How to perform semantic search
- How to perform multilingual semantic search
- How to change the embedding compression types
A high-performance and modern search system typically includes a reranking stage, which further boosts the search results.
In Part 5, you will learn how to add reranking to a search system.