Semantic Search with Embeddings
This section provides examples of how to use the Embed endpoint to perform semantic search.
Semantic search addresses a limitation of traditional lexical search, which is great at finding keyword matches but struggles to capture the context or meaning of a piece of text.
The Embed endpoint takes in texts as input and returns embeddings as output.
For semantic search, there are two types of text we need to turn into embeddings:
- The list of documents to search from.
- The query that will be used to search the documents.
Step 1: Embed the documents
We call the Embed endpoint using `co.embed()` and pass the required arguments:
- `texts`: The list of texts
- `model`: Here we choose `embed-v4.0`
- `input_type`: We choose `search_document` to ensure the model treats these as the documents for search
- `embedding_types`: We choose `float` to get a float array as the output
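Here's a minimal sketch of this step, assuming the Cohere Python SDK with a v2 client; the documents are illustrative, and the attribute for reading float embeddings may vary by SDK version:

```python
import cohere

co = cohere.ClientV2("YOUR_API_KEY")  # assumes the v2 Python SDK client

# Illustrative documents to search over
documents = [
    "Joining Slack channels: you will receive an invite via email.",
    "Finding the IT support contact: check the company intranet.",
    "Booking meeting rooms: use the calendar app to reserve a room.",
]

doc_response = co.embed(
    texts=documents,
    model="embed-v4.0",
    input_type="search_document",  # treat these texts as documents for search
    embedding_types=["float"],     # return float array embeddings
)

# The attribute holding float embeddings may differ by SDK version
doc_embeddings = doc_response.embeddings.float_
```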
Step 2: Embed the query
Next, we add and embed a query. We choose `search_query` as the `input_type` to ensure the model treats this as the query (instead of documents) for search.
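Continuing the sketch from Step 1, with an illustrative query:

```python
query = "How do I find the IT support contact?"  # illustrative query

query_response = co.embed(
    texts=[query],
    model="embed-v4.0",
    input_type="search_query",  # treat this text as the query, not a document
    embedding_types=["float"],
)

query_embedding = query_response.embeddings.float_[0]
```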
Step 3: Return the most similar documents
Next, we compute similarity scores between the query embedding and the document embeddings, sort them, and display the top N most similar documents. Here, we use the numpy library to compute similarity via a dot product.
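A sketch of this step using numpy; the variable names carry over from the snippets above:

```python
import numpy as np

# Dot-product similarity between the query and every document embedding
scores = np.dot(np.asarray(doc_embeddings), np.asarray(query_embedding))

# Sort documents by descending similarity and display the top N
top_n = 2
for rank, idx in enumerate(np.argsort(-scores)[:top_n], start=1):
    print(f"Rank {rank} (score {scores[idx]:.4f}): {documents[idx]}")
```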
Here’s an example output:
Content quality measure with Embed v4
A standard text embedding model is optimized only for topic similarity between a query and candidate documents. But in many real-world applications, you have redundant information with varying content quality.
For instance, consider a user query of “COVID-19 Symptoms” and a candidate document, “COVID-19 has many symptoms”. This document does not offer rich, high-quality information. However, with a typical embedding model, it will appear high in the search results because it is highly similar to the query.
The Embed v4 model is trained to capture both content quality and topic similarity. Through this approach, a search system can extract richer information from documents and is robust against noise.
In the example below, given the query (“COVID-19 Symptoms”), the document with the highest quality (“COVID-19 symptoms can include: a high temperature or shivering…”) is ranked first.
Another document (“COVID-19 has many symptoms”) is just as topically similar to the query, yet it is ranked lower because it doesn’t contain much information.
This demonstrates how Embed v4 helps to surface high-quality documents for a given query.
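Reproducing this comparison follows the same pattern as Steps 1 to 3 above; here's a sketch using the two candidate documents as quoted (the first kept truncated, as in the text above):

```python
candidate_docs = [
    "COVID-19 symptoms can include: a high temperature or shivering…",  # truncated as quoted above
    "COVID-19 has many symptoms",
]

doc_emb = co.embed(
    texts=candidate_docs,
    model="embed-v4.0",
    input_type="search_document",
    embedding_types=["float"],
).embeddings.float_

query_emb = co.embed(
    texts=["COVID-19 Symptoms"],
    model="embed-v4.0",
    input_type="search_query",
    embedding_types=["float"],
).embeddings.float_[0]

# Per the description above, the richer first document should score higher
scores = np.dot(np.asarray(doc_emb), np.asarray(query_emb))
print(sorted(zip(scores, candidate_docs), reverse=True))
```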
Here’s a sample output:
Multilingual semantic search
The Embed endpoint also supports multilingual semantic search via `embed-v4.0` and the earlier `embed-multilingual-...` models. This means you can perform semantic search on texts in different languages.
Specifically, you can do both multilingual and cross-lingual searches using one single model.
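The call is identical to the English-only case; only the texts change. A brief sketch with illustrative mixed-language documents:

```python
# Illustrative documents in different languages
multilingual_docs = [
    "Reisekosten werden monatlich erstattet",    # German
    "Les frais de déplacement sont remboursés",  # French
    "Travel expenses are reimbursed monthly",    # English
]

ml_embeddings = co.embed(
    texts=multilingual_docs,
    model="embed-v4.0",
    input_type="search_document",
    embedding_types=["float"],
).embeddings.float_

# A query in any supported language can then be embedded with
# input_type="search_query" and scored against all documents,
# enabling both multilingual and cross-lingual search.
```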
Here’s a sample output:
Multimodal PDF search
Handling PDF files, which often contain a mix of text, images, and layout information, presents a challenge for traditional embedding methods. These usually require a multimodal generative model to pre-process the documents into a format suitable for the embedding model. Such intermediate text representations can lose critical information; for example, the structure and precise content of tables or complex layouts might not be accurately rendered.
Embed v4 solves this problem as it is designed to natively understand mixed-modality inputs. Embed v4 can directly process the PDF content, including text and images, in a single step. It generates a unified embedding that captures the semantic meaning derived from both the textual and visual elements.
Here’s an example of how to use the Embed endpoint to perform multimodal PDF search.
First, import the required libraries.
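For example, assuming pdf2image for rendering pages (an assumption; any equivalent library works) and chromadb for the vector store, the imports might look like this:

```python
import base64
import io

import chromadb
import cohere
from pdf2image import convert_from_path  # requires the poppler system dependency
```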
Next, turn a PDF file into a list of images, with one image per page. Then format these images into the content structure expected by the Embed endpoint.
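A sketch of this step; the exact content schema for image inputs is defined in the Embed API reference, so treat the field names below as an approximation to verify against the current docs:

```python
def pdf_to_data_urls(pdf_path: str) -> list[str]:
    """Render each PDF page to a PNG image encoded as a base64 data URL."""
    data_urls = []
    for page in convert_from_path(pdf_path):
        buffer = io.BytesIO()
        page.save(buffer, format="PNG")
        encoded = base64.b64encode(buffer.getvalue()).decode("utf-8")
        data_urls.append(f"data:image/png;base64,{encoded}")
    return data_urls

page_images = pdf_to_data_urls("document.pdf")  # placeholder path

# Wrap each page image in the content structure expected by the Embed
# endpoint; verify the exact field names against the Embed API reference.
inputs = [
    {"content": [{"type": "image_url", "image_url": {"url": url}}]}
    for url in page_images
]
```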
Next, generate the embeddings for these pages and store them in a vector database (in this example, we use Chroma).
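Continuing the sketch, we embed the pages via the `inputs` parameter and store the results in a Chroma collection:

```python
co = cohere.ClientV2("YOUR_API_KEY")  # assumes the v2 Python SDK client

page_embeddings = co.embed(
    model="embed-v4.0",
    inputs=inputs,
    input_type="search_document",
    embedding_types=["float"],
).embeddings.float_  # attribute name may differ by SDK version

# Store one embedding per page in a Chroma collection
collection = chromadb.Client().create_collection(name="pdf_pages")
collection.add(
    ids=[f"page-{i}" for i in range(len(page_embeddings))],
    embeddings=page_embeddings,
)
```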
Finally, provide a query and run a search over the documents. This will return a list of sorted IDs representing the most similar pages to the query.
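Finally, a sketch of the query step (the query text is illustrative):

```python
query = "What is the quarterly revenue trend?"  # illustrative query

query_emb = co.embed(
    texts=[query],
    model="embed-v4.0",
    input_type="search_query",
    embedding_types=["float"],
).embeddings.float_

results = collection.query(query_embeddings=query_emb, n_results=3)
print(results["ids"])  # page IDs sorted by similarity to the query
```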
The top-ranked page is shown below:

For a more complete example of multimodal PDF search, see the cookbook version.