Unlocking the Power of Multimodal Embeddings

This Guide Uses the Embed API.

For details, see the Embed API reference.

Image capabilities are only available with the embed-v4.0 and embed-v3.0 models, and embed-v4.0 has features that the v3.0 models lack. Consult the embedding documentation for more details.

In this guide, we show you how to use the embed endpoint to embed a series of images. This guide uses a simple dataset of graphs to illustrate how semantic search can be done over images with Cohere. To see an end-to-end example of retrieval, check out this notebook.

Introduction to Multimodal Embeddings

Information is often represented in multiple modalities. A document, for instance, may contain text, images, and graphs, while a product can be described through images, its title, and a written description. This combination of elements often leads to a comprehensive semantic understanding of the subject matter. Traditional embedding models have been limited to a single modality, and even multimodal embedding models often suffer from degradation in text-to-text or text-to-image retrieval tasks. embed-v4.0 and the embed-v3.0 series of models, however, are fully multimodal, enabling them to embed both images and text effectively. We have achieved state-of-the-art performance without compromising text-to-text retrieval capabilities.

How to use Multimodal Embeddings

1. Prepare your Image for Embeddings

The Embed API accepts images in the following file formats: png, jpeg, webp, and gif. Each image must be formatted as a Data URL before it is sent.

PYTHON
# Import the necessary packages
import os
import base64

import cohere

# Create the client used for the embed call below
co = cohere.ClientV2(api_key="<YOUR API KEY>")


# Convert an image file to a base64 Data URL
def image_to_base64_data_url(image_path):
    _, file_extension = os.path.splitext(image_path)
    file_type = file_extension[1:].lower()
    # "jpg" is not a registered MIME subtype, so use "jpeg" in the Data URL
    if file_type == "jpg":
        file_type = "jpeg"

    with open(image_path, "rb") as f:
        enc_img = base64.b64encode(f.read()).decode("utf-8")
    return f"data:image/{file_type};base64,{enc_img}"


image_path = "<YOUR IMAGE PATH>"
processed_image = image_to_base64_data_url(image_path)

# Format the input object
image_input = [
    {"content": [{"type": "image", "image": processed_image}]}
]

res = co.embed(
    model="embed-v4.0",
    inputs=image_input,
    input_type="search_document",
    output_dimension=1024,
    embedding_types=["float"],
)
res.embeddings.float
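
Because the endpoint only accepts the four formats listed above, it can help to reject unsupported files before encoding them. The sketch below is one way to do that. `SUPPORTED_FORMATS` and `check_image_format` are hypothetical names introduced for illustration and are not part of the Cohere SDK.

```python
import os

# Formats this guide lists as accepted by the Embed API.
SUPPORTED_FORMATS = {"png", "jpeg", "webp", "gif"}


def check_image_format(image_path):
    """Return the normalized format for image_path, or raise ValueError.

    Hypothetical helper: it only inspects the file extension and does not
    open the file or verify its contents.
    """
    _, ext = os.path.splitext(image_path)
    fmt = ext[1:].lower()
    # Treat the common ".jpg" extension as jpeg.
    if fmt == "jpg":
        fmt = "jpeg"
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported image format: {ext or '(none)'}")
    return fmt
```

Calling `check_image_format(image_path)` before building the Data URL surfaces a bad file locally instead of as an API error.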

2. Call the Embed Endpoint

PYTHON
# Import the necessary packages
import cohere

co = cohere.ClientV2(api_key="<YOUR API KEY>")

# Format the input object; processed_image is the Data URL built in step 1
image_input = [
    {"content": [{"type": "image", "image": processed_image}]}
]

co.embed(
    model="embed-v4.0",
    inputs=image_input,
    input_type="search_document",
    embedding_types=["float"],
)
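
Once images are embedded, semantic search amounts to comparing a query vector against the stored image vectors. The following is a minimal sketch of that ranking step using cosine similarity in plain Python. `cosine_similarity` and `rank_images` are hypothetical helpers, and the short vectors are toy values standing in for real embeddings. It assumes the query was embedded with the same model as the images.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_images(query_vec, image_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [
        (i, cosine_similarity(query_vec, vec))
        for i, vec in enumerate(image_vecs)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real image and query embeddings.
image_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query_vec = [0.9, 0.1, 0.0]

ranking = rank_images(query_vec, image_vecs)
print([index for index, _ in ranking])
```

With real data, `image_vecs` would hold the float vectors returned for the images, and `query_vec` would be the vector for a text query, typically embedded with `input_type="search_query"`.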

Sample Output

Below is a sample of the output you would get if you passed in a jpeg with original dimensions of 1080x1350 and a standard bit depth of 24.

JSON
{
    "id": "0d9bb922-f15f-4b8b-9a2f-72577324528f",
    "texts": [],
    "images": [{"width": 1080, "height": 1350, "format": "jpeg", "bit_depth": 24}],
    "embeddings": {"float": [[-0.035369873, 0.040740967, 0.008262634, -0.008766174, .....]]},
    "meta": {
        "api_version": {"version": "1"},
        "billed_units": {"images": 1}
    },
    "response_type": "embeddings_by_type"
}
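
To make the response fields concrete, the sketch below reads a dictionary shaped like the sample above and reports each image's metadata alongside the length of its vector. It assumes the response has already been converted to a plain dict and that the `images` and `embeddings` entries are in the same order. `describe_embed_response` is a hypothetical helper, and the vector is truncated to the four values visible in the sample.

```python
def describe_embed_response(response):
    """Pair each image's metadata with the length of its float embedding."""
    images = response.get("images", [])
    vectors = response.get("embeddings", {}).get("float", [])
    summary = []
    for meta, vec in zip(images, vectors):
        summary.append(
            {
                "size": (meta["width"], meta["height"]),
                "format": meta["format"],
                "dimensions": len(vec),
            }
        )
    return summary


# Dict mirroring the sample output; the vector is truncated for illustration,
# so "dimensions" here reflects only the four values shown.
sample = {
    "id": "0d9bb922-f15f-4b8b-9a2f-72577324528f",
    "texts": [],
    "images": [
        {"width": 1080, "height": 1350, "format": "jpeg", "bit_depth": 24}
    ],
    "embeddings": {
        "float": [[-0.035369873, 0.040740967, 0.008262634, -0.008766174]]
    },
    "meta": {
        "api_version": {"version": "1"},
        "billed_units": {"images": 1},
    },
    "response_type": "embeddings_by_type",
}

print(describe_embed_response(sample))
```

On a real response, `dimensions` would be the full length of the returned vector rather than the truncated length used here.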