Text embeddings are lists of numbers that represent the context or meaning of a piece of text. This makes them particularly useful in search and information retrieval applications. Search built on text embeddings is called semantic search.
Semantic search solves the problem faced by the more traditional approach of lexical search, which is great at finding keyword matches, but struggles to capture the context or meaning of a piece of text.
With Cohere, you can generate text embeddings through the Embed endpoint.
In this tutorial, you’ll learn about:
You’ll learn these by building an onboarding assistant for new hires.
To get started, first we need to install the cohere library and create a Cohere client.
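A minimal setup sketch, assuming the Python SDK has been installed with `pip install cohere` and exposes `cohere.ClientV2`. The helper name `make_client` and the `CO_API_KEY` environment variable are illustrative choices, not something this tutorial prescribes.

```python
import os


def make_client(api_key=None):
    """Create a Cohere client from an explicit key or an environment variable."""
    # CO_API_KEY is an assumed variable name; use whatever your setup provides.
    key = api_key or os.environ.get("CO_API_KEY")
    if not key:
        raise ValueError("No API key: pass api_key or set CO_API_KEY")
    # Imported lazily so this module still loads before the SDK is installed.
    import cohere

    return cohere.ClientV2(api_key=key)
```

Calling `co = make_client()` then gives the `co` object used in the rest of the tutorial.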
The Embed endpoint takes in texts as input and returns embeddings as output.
For semantic search, there are two types of text we need to turn into embeddings: the documents to search over, and the query used to search them.
Right now, we are doing the former. We call the Embed endpoint using co.embed() and pass the following arguments:
- model: Here we choose embed-v4.0
- input_type: We choose search_document to ensure the model treats these as the documents for search
- texts: The list of texts (the FAQs)
- embedding_types: We choose float to get the float embeddings
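The call with these four arguments can be sketched as follows. The wrapper `embed_documents` and the sample FAQ strings are hypothetical, and the sketch assumes the response exposes the float vectors as `response.embeddings.float`.

```python
def embed_documents(co, documents, model="embed-v4.0"):
    """Embed a list of document strings for later retrieval."""
    response = co.embed(
        model=model,
        input_type="search_document",  # these texts form the search corpus
        texts=documents,
        embedding_types=["float"],
    )
    # Assumption: float vectors are exposed as `response.embeddings.float`.
    return response.embeddings.float


# Placeholder FAQ entries, invented for illustration only.
example_faqs = [
    "Where can I find the company newsletter?",
    "How do I request time off?",
]
# doc_emb = embed_documents(co, example_faqs)  # needs a real client `co`
```

The result is one vector per input text, in the same order as `documents`.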
Next, we add a query, which asks about how to stay connected to company updates.
We choose search_query as the input_type to ensure the model treats this as the query (instead of documents) for search.
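A sketch of the query side, under the same assumptions as before about the response shape. The helper `embed_query` and the wording of the sample query are hypothetical. The only change from embedding documents is the `input_type`.

```python
def embed_query(co, query, model="embed-v4.0"):
    """Embed a single search query string."""
    response = co.embed(
        model=model,
        input_type="search_query",  # mark the text as a query, not a document
        texts=[query],  # the endpoint takes a list, even for one query
        embedding_types=["float"],
    )
    # Assumption: float vectors are exposed as `response.embeddings.float`.
    return response.embeddings.float  # a list holding one vector


# Illustrative wording only; any question about company updates would do.
example_query = "How do I stay connected to company updates?"
# query_emb = embed_query(co, example_query)  # needs a real client `co`
```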
Now, we want to search for the most relevant documents to the query. We do this by computing the similarity between the embeddings of the query and each of the documents.
There are various ways to compute similarity between embeddings; here we use the dot product, which the numpy library provides.
Each query-document pair returns a score, which represents how similar the pair is. We then sort these scores in descending order and keep the n most similar pairs. Here we return the top two (n=2); this is an arbitrary choice, and you can choose any number.
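The scoring and ranking step can be sketched with numpy alone. The function name `top_n_by_dot_product` and the toy three-dimensional vectors are made up for illustration; real embeddings have many more dimensions.

```python
import numpy as np


def top_n_by_dot_product(query_emb, doc_embs, n=2):
    """Return (index, score) pairs for the n highest-scoring documents."""
    query_vec = np.asarray(query_emb, dtype=float)
    doc_matrix = np.asarray(doc_embs, dtype=float)
    scores = doc_matrix @ query_vec  # one dot product per document row
    order = np.argsort(-scores)[:n]  # indices sorted by descending score
    return [(int(i), float(scores[i])) for i in order]


# Toy vectors, invented for illustration.
docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.0]]
query = [0.9, 0.1, 0.0]
results = top_n_by_dot_product(query, docs, n=2)
for idx, score in results:
    print(f"document {idx}: score {score:.2f}")
```

With these toy vectors, the first and third documents rank highest because they point most nearly in the same direction as the query.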
Here, we show the most relevant documents with their similarity scores.
The Embed endpoint also supports multilingual semantic search via the embed-multilingual-... models. This means you can perform semantic search on texts in different languages.
Specifically, you can do both multilingual and cross-lingual searches using one single model.
Multilingual search happens when the query and the result are in the same language. For example, an English query of “places to eat” returns an English result of “Bob’s Burgers.” You can replace English with other languages and use the same model to perform the search.
Cross-lingual search happens when the query and the result are in different languages. For example, a Hindi query of “खाने की जगह” (places to eat) returns an English result of “Bob’s Burgers.”
In the example below, we repeat the steps of performing semantic search with one difference: the model needs to be a multilingual one. Here, we use the embed-v4.0 model to search a French version of the FAQ list using an English query.
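A sketch of the full cross-lingual flow, combining the embedding and ranking steps in one hypothetical helper, `semantic_search`. The French strings are invented placeholders rather than the tutorial's actual FAQ list, and the response shape (`.embeddings.float`) is an assumption.

```python
import numpy as np


def semantic_search(co, query, documents, n=2, model="embed-v4.0"):
    """Embed documents and query, then rank documents by dot product."""
    doc_emb = co.embed(
        model=model, input_type="search_document",
        texts=documents, embedding_types=["float"],
    ).embeddings.float
    query_emb = co.embed(
        model=model, input_type="search_query",
        texts=[query], embedding_types=["float"],
    ).embeddings.float
    # query_emb has shape (1, d) and doc_emb has shape (m, d).
    scores = np.dot(np.asarray(query_emb), np.asarray(doc_emb).T)[0]
    top = np.argsort(-scores)[:n]
    return [(documents[i], float(scores[i])) for i in top]


# Invented French FAQ entries, for illustration only.
faqs_fr = [
    "Comment demander des congés ?",
    "Comment rester informé des actualités de l'entreprise ?",
    "Où trouver la lettre d'information interne ?",
]
# hits = semantic_search(co, "How do I stay updated?", faqs_fr)  # needs `co`
```

The query and the documents go through the same model, so their vectors can be compared directly even though the languages differ.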
Semantic search over large datasets can require a lot of memory, which is expensive to host in a vector database. Changing the embeddings compression type can help reduce the memory footprint.
A typical embedding model generates embeddings in float32 format (4 bytes per value). By compressing the embeddings to int8 format (1 byte per value), we can reduce memory use 4x while keeping 99.99% of the original search quality.
We can go even further and use the binary format (1 bit per value), which reduces the needed memory 32x while keeping 90-98% of the original search quality.
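The 4x and 32x figures follow directly from the bits stored per value. The sketch below works through the arithmetic for a hypothetical index of one million 1024-dimensional vectors; both numbers are assumptions chosen for round results, not properties of any particular model.

```python
def index_size_bytes(num_vectors, dims, bits_per_value):
    """Raw storage for the vectors alone, ignoring any index overhead."""
    return num_vectors * dims * bits_per_value // 8


NUM_VECTORS = 1_000_000  # assumed corpus size
DIMS = 1024  # assumed embedding dimension

bits = {"float32": 32, "int8": 8, "binary": 1}
sizes = {name: index_size_bytes(NUM_VECTORS, DIMS, b) for name, b in bits.items()}

for name, size in sizes.items():
    ratio = sizes["float32"] / size
    print(f"{name:>7}: {size / 1e9:.3f} GB (float32 size / this size = {ratio:.0f})")
```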
The Embed endpoint supports the following formats: float, int8, uint8, binary, and ubinary. You can get these different compression levels by passing the embedding_types parameter.
In the example below, we embed the documents in two formats: float and int8.
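Requesting both formats in one call might look like the sketch below. The helper `embed_documents_multi` is hypothetical, and it assumes the response exposes each requested format as an attribute of the same name (`response.embeddings.float` and `response.embeddings.int8`).

```python
def embed_documents_multi(co, documents, model="embed-v4.0"):
    """Embed documents once and return both float and int8 vectors."""
    response = co.embed(
        model=model,
        input_type="search_document",
        texts=documents,
        embedding_types=["float", "int8"],  # request two formats together
    )
    # Assumption: each requested format is an attribute on `embeddings`.
    return {
        "float": response.embeddings.float,
        "int8": response.embeddings.int8,
    }


# embs = embed_documents_multi(co, documents)  # needs a real client `co`
# embs["float"] and embs["int8"] can then be searched the same way.
```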
Here are the search results of using the float embeddings (same as the earlier example).
And here are the search results of using the int8 embeddings.
In this tutorial, you learned about:
A modern, high-performance search system typically includes a reranking stage, which further improves the relevance of the search results.
In Part 5, you will learn how to add reranking to a search system.