Embed API

📘

This Guide Uses the Embed Endpoint.

You can find the API reference for the endpoint here. If you plan to produce upwards of 100,000 embeddings, we recommend using our embed jobs endpoint.

Getting Set Up

First, let's install the SDK (the examples below are in Python, TypeScript, and Go):

pip install cohere
npm i -s cohere-ai
go get github.com/cohere-ai/cohere-go/v2

Import dependencies and set up the Cohere client.

import cohere
co = cohere.Client('Your API key')
import { CohereClient } from "cohere-ai";

const cohere = new CohereClient({
    token: "YOUR_API_KEY",
});

(async () => {
    const embed = await cohere.embed({
        texts: ["hello"],
        model: "embed-english-v3.0",
        inputType: "search_document",
    });

    console.log("Received embeddings", embed);
})();
import cohereclient "github.com/cohere-ai/cohere-go/v2/client"

client := cohereclient.NewClient(cohereclient.WithToken("<YOUR_AUTH_TOKEN>"))

(All the rest of the examples on this page will be in Python, but you can find more detailed setup instructions in the GitHub repositories for Python, TypeScript, and Go.)

import cohere
import numpy as np
co = cohere.Client('Your API key')

In the code snippet below, we run through a simple question-answering (Q&A) example, where the query is "What is the capital of the United States?" and the passages are potential answers to the query. We then use cosine similarity to find the passage that best matches the query.

# get the query embeddings
query = ["What is the capital of the United States?"]

model = "embed-english-v3.0"

# because the text being embedded is the search query, we set the input type as search_query
query_embeddings = co.embed(texts=query,
                            model=model,
                            input_type="search_query",
                            embedding_types=['ubinary'])

# get the passage embeddings

passages = [
    "Carson City is the capital city of the American state of Nevada. At the 2010 United States Census, Carson City had a population of 55,274.",
    "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean that are a political division controlled by the United States. Its capital is Saipan.",
    "Charlotte Amalie is the capital and largest city of the United States Virgin Islands. It has about 20,000 people. The city is on the island of Saint Thomas.",
    "Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district. The President of the USA and many major national government offices are in the territory. This makes it the political center of the United States of America.",
    "Capital punishment (the death penalty) has existed in the United States since before the United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states. The federal government (including the United States military) also uses capital punishment."]

# because the texts being embedded are the passages we are searching over, we set the input type as search_document
doc_embeddings = co.embed(texts=passages,
                          model=model,
                          input_type="search_document",
                          embedding_types=['ubinary'])

# compare them with cosine similarity
def calculate_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))


for idx, r in enumerate(doc_embeddings.embeddings.ubinary):
    print(f"Document: {passages[idx]}")
    print(f"Similarity Score: {calculate_similarity(query_embeddings.embeddings.ubinary, r)}\n")

The output you will see is as follows:

Document: Carson City is the capital city of the American state of Nevada. At the 2010 United States Census, Carson City had a population of 55,274.
Similarity Score: [0.84839812]

Document: The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean that are a political division controlled by the United States. Its capital is Saipan.
Similarity Score: [0.83945703]

Document: Charlotte Amalie is the capital and largest city of the United States Virgin Islands. It has about 20,000 people. The city is on the island of Saint Thomas.
Similarity Score: [0.83129605]

Document: Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district. The President of the USA and many major national government offices are in the territory. This makes it the political center of the United States of America.
Similarity Score: [0.90808592]

Document: Capital punishment (the death penalty) has existed in the United States since before the United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states. The federal government (including the United States military) also uses capital punishment.
Similarity Score: [0.8616596]
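
A note on the ubinary type used above: each uint8 value packs eight binary dimensions into a single byte, so the comparison above operates on the packed bytes. If you would rather work in the unpacked bit space, NumPy's unpackbits restores the individual dimensions. This is a sketch with a toy two-byte array standing in for a real embedding:

```python
import numpy as np

# A toy packed embedding: in ubinary format, one uint8 byte encodes
# 8 binary dimensions (most significant bit first).
packed = np.array([0b10110000, 0b00000001], dtype=np.uint8)

# Unpack into the underlying 16 binary dimensions.
bits = np.unpackbits(packed)
print(bits)  # 16 values in {0, 1}
```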

For more context, check out our dedicated page on embeddings.

Technical Details of the Embed API

There are a few aspects of the co.embed() endpoint worth calling out specifically, as they'll help you use it and build projects around it more effectively.

The input_type Parameter

With the release of the v3 embeddings models, there is a new mandatory parameter, input_type. It can be one of the following four values:

  • input_type="search_document": Use this when you have texts (documents) that you want to store in a vector database.
  • input_type="search_query": Use this when structuring search queries to find the most relevant documents in your vector database.
  • input_type="classification": Use this if you plan to use the embeddings as an input for a classification system.
  • input_type="clustering": Use this if you plan to use the embeddings for text clustering.

Using the right input type ensures the best possible results. If you want to use the embeddings for multiple use cases, we recommend using input_type="search_document".

What's the Purpose of the Input Type?

Embeddings are a flexible and powerful way to represent data, and for this reason they can serve multiple purposes. Prior versions of the embedding models measured only the topic similarity between the query and the document. This is fine if your dataset contains one matching document per topic, but in many real-world applications you have many documents with overlapping information and varying content quality. Some documents provide little insight into a topic, for example, while others are extremely detailed.

Unfortunately, models that measure topic similarity only tend to retrieve the least informative content, which leads to a less than optimal user experience. The new v3 models measure both the topic similarity and content quality in the vector space, which significantly improves the user experience on noisy datasets with varying content quality.

Compression Levels

The Cohere embeddings platform now supports compression. The co.embed() endpoint accepts an embedding_types parameter which allows the user to specify various ways of compressing the output.

The following embedding types are now supported:

  • float
  • int8
  • uint8
  • binary
  • ubinary
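
The practical difference between these types is storage cost. As a rough sketch, assuming float means 32-bit floats and a 1024-dimensional model (check the API reference for the exact widths your model returns):

```python
dims = 1024  # e.g. embed-english-v3.0 returns 1024-dimensional embeddings

# Approximate bytes needed to store one embedding in each format.
bytes_per_embedding = {
    "float": dims * 4,     # 32-bit float per dimension
    "int8": dims,          # 1 byte per dimension
    "uint8": dims,         # 1 byte per dimension
    "binary": dims // 8,   # 1 bit per dimension, packed into signed bytes
    "ubinary": dims // 8,  # 1 bit per dimension, packed into unsigned bytes
}
print(bytes_per_embedding["float"])    # 4096
print(bytes_per_embedding["ubinary"])  # 128
```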

The parameter defaults to float, so if you pass in no argument you'll get back float embeddings:

ret = co.embed(texts=phrases,
               model=model,
               input_type=input_type)

ret.embeddings # This contains the float embeddings

Though this works, we recommend being explicit about the embedding type(s) you would like, in keeping with Python best practices. To specify an embedding type, pass one of the types from the list above as a list containing a string:

ret = co.embed(texts=phrases,
               model=model,
               input_type=input_type,
               embedding_types=['int8'])

ret.embeddings.int8 # This contains your int8 embeddings
ret.embeddings.float # This will be empty
ret.embeddings.uint8 # This will be empty
ret.embeddings.ubinary # This will be empty
ret.embeddings.binary # This will be empty

Finally, you can also pass several embedding types as a list, in which case the response will contain all of the requested types:

ret = co.embed(texts=phrases,
               model=model,
               input_type=input_type,
               embedding_types=['int8', 'float'])

ret.embeddings.int8 # This contains your int8 embeddings
ret.embeddings.float # This contains your float embeddings
ret.embeddings.uint8 # This will be empty
ret.embeddings.ubinary # This will be empty
ret.embeddings.binary # This will be empty
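
One reason to request both types at once is to keep the compact int8 vectors for large-scale storage while retaining the float vectors for higher-precision comparison. The compressed types approximate the float embeddings closely. The sketch below illustrates the idea with a random vector and a simple symmetric scaling scheme; the API's actual quantization is performed server-side and may differ:

```python
import numpy as np

rng = np.random.default_rng(0)
float_emb = rng.normal(size=16).astype(np.float32)  # stand-in for a float embedding

# Simple symmetric quantization into the int8 range (illustration only).
scale = 127 / np.max(np.abs(float_emb))
int8_emb = np.round(float_emb * scale).astype(np.int8)

# Dequantize and compare direction with cosine similarity.
recovered = int8_emb.astype(np.float32) / scale
cos = np.dot(float_emb, recovered) / (np.linalg.norm(float_emb) * np.linalg.norm(recovered))
print(cos)  # close to 1.0: little directional information is lost
```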