Embed v3.0 Models are now Multimodal

Today we’re announcing updates to our embed-v3.0 family of models. These models can now process images into embeddings. Existing text capabilities are unchanged, so there is no need to re-embed text you have already processed with our embed-v3.0 models.

In the rest of these release notes, we’ll provide more details about technical enhancements, new features, and new pricing.

Technical Details

API Changes:

The Embed API has two major changes:

Introduced a new input_type called image
Introduced a new parameter called images

Example request showing how to embed an image:

cURL

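# <API_KEY> is your API key; <enc_img> is the base64 encoded image, formatted as a Data URL
curl --request POST \
  --url https://api.cohere.ai/v1/embed \
  --header "Authorization: Bearer <API_KEY>" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "embed-multilingual-v3.0",
    "input_type": "image",
    "embedding_types": ["float"],
    "images": ["<enc_img>"]
  }'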
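
For readers calling the endpoint from Python, the same request can be sketched with the standard library alone. This is a minimal illustration and not an official client: the helper name build_image_embed_request is hypothetical, and the Bearer authorization header is assumed to work the same way as for other Embed API calls. The function only builds the request and does not send it.

```python
import json
import urllib.request

EMBED_URL = "https://api.cohere.ai/v1/embed"


def build_image_embed_request(data_url, api_key, model="embed-multilingual-v3.0"):
    """Build (but do not send) a POST request that embeds a single image.

    data_url is the base64 encoded image formatted as a Data URL.
    The function name and signature are illustrative only.
    """
    body = {
        "model": model,
        "input_type": "image",           # new input_type for image inputs
        "embedding_types": ["float"],
        "images": [data_url],            # new parameter; one image per request
    }
    return urllib.request.Request(
        EMBED_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends the request. Keeping construction separate from sending makes the payload easy to inspect before any network call is made.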

Restrictions:

The API accepts images only in the following formats: PNG, JPEG, WebP, and GIF
Image embedding does not currently support batching, so the maximum number of images per request is 1
The maximum image size is 5 MB
The images parameter accepts only a base64 encoded image formatted as a Data URL
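
The restrictions above can be checked on the client before a request is sent. The following Python sketch turns raw image bytes into a Data URL and rejects inputs that would break the format or size rules. The helper names (detect_mime, to_data_url) are hypothetical, the format is detected from each file's leading bytes, and the 5 MB limit is assumed to mean 5 * 1024 * 1024 bytes measured on the raw file before encoding; the notes do not say which measure the API uses.

```python
import base64

# Assumption: "5 MB" is read as 5 MiB of raw (pre-encoding) image bytes.
MAX_IMAGE_BYTES = 5 * 1024 * 1024


def detect_mime(raw):
    """Return the MIME type for an accepted format, or None if unrecognised."""
    if raw.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if raw.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if raw[:6] in (b"GIF87a", b"GIF89a"):
        return "image/gif"
    # WebP files are RIFF containers with "WEBP" at byte offset 8.
    if raw[:4] == b"RIFF" and raw[8:12] == b"WEBP":
        return "image/webp"
    return None


def to_data_url(raw):
    """Encode raw image bytes as a base64 Data URL, enforcing the limits above."""
    mime = detect_mime(raw)
    if mime is None:
        raise ValueError("image must be PNG, JPEG, WebP, or GIF")
    if len(raw) > MAX_IMAGE_BYTES:
        raise ValueError("image exceeds the 5 MB limit")
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

A typical call reads the file in binary mode, for example to_data_url(open("photo.png", "rb").read()), and places the returned string as the single element of the images parameter.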