Embed v3.0 Models are now Multimodal

Today we’re announcing updates to our embed-v3.0 family of models. These models can now process images into embeddings. Existing text capabilities are unchanged, so there is no need to re-embed texts you have already processed with our embed-v3.0 models.

In the rest of these release notes, we’ll provide more details about technical enhancements, new features, and new pricing.

Technical Details

API Changes:

The Embed API has two major changes:

  • Introduced a new input_type called image
  • Introduced a new parameter called images

Example request showing how to process an image:

cURL
POST https://api.cohere.ai/v1/embed
{
  "model": "embed-multilingual-v3.0",
  "input_type": "image",
  "embedding_types": ["float"],
  "images": [enc_img]
}
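The request above can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper name build_embed_request is hypothetical, and the bearer-token Authorization header is an assumption about how the API key is supplied. The function only builds the request; it does not send it.

```python
import json
import urllib.request

EMBED_URL = "https://api.cohere.ai/v1/embed"  # endpoint from the example request


def build_embed_request(enc_img: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an image-embedding request.

    enc_img is expected to be a base64-encoded image formatted as a Data URL.
    """
    payload = {
        "model": "embed-multilingual-v3.0",
        "input_type": "image",          # the new input_type
        "embedding_types": ["float"],
        "images": [enc_img],            # the new images parameter
    }
    return urllib.request.Request(
        EMBED_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the API key is passed as a bearer token.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

To send it, pass the result to urllib.request.urlopen and parse the JSON response body; the shape of that response is not described in these notes, so it is left out here.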

Restrictions:

  • The API accepts images only in the following formats: png, jpeg, webp, and gif
  • Image embeddings currently do not support batching, so the maximum number of images per request is 1
  • The maximum image size is 5 MB
  • The images parameter only accepts a base64-encoded image formatted as a Data URL
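The restrictions above can be checked on the client before a request is made. The following is a rough sketch under stated assumptions: the helper name to_data_url is hypothetical, the 5 MB limit is read as 5 * 1024 * 1024 bytes, and the limit is applied to the raw image bytes rather than the encoded string. The service may measure size differently.

```python
import base64

# MIME types corresponding to the accepted formats (png, jpeg, webp, gif).
ALLOWED_MIME_TYPES = {"image/png", "image/jpeg", "image/webp", "image/gif"}

# Assumption: "5 MB" is interpreted as 5 MiB of raw (unencoded) image data.
MAX_IMAGE_BYTES = 5 * 1024 * 1024


def to_data_url(image_bytes: bytes, mime_type: str) -> str:
    """Validate an image and return it as a base64 Data URL."""
    if mime_type not in ALLOWED_MIME_TYPES:
        raise ValueError(f"unsupported image type: {mime_type}")
    if len(image_bytes) > MAX_IMAGE_BYTES:
        raise ValueError("image exceeds the 5 MB size limit")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

For example, reading a file with open("photo.png", "rb").read() and calling to_data_url(data, "image/png") yields a single string suitable for the images list, which must contain exactly one entry per request.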