Semantic Search with Cohere Embed Jobs

PYTHON
import time
import cohere
import hnswlib

# Replace 'COHERE_API_KEY' with your own API key.
co = cohere.Client('COHERE_API_KEY')

Step 1: Upload a dataset

PYTHON
# Sample data source: https://raw.githubusercontent.com/cohere-ai/cohere-developer-experience/main/notebooks/data/embed_jobs_sample_data.jsonl
dataset_file_path = "data/embed_jobs_sample_data.jsonl"

ds = co.create_dataset(
    name='sample_file',
    data=open(dataset_file_path, 'rb'),
    keep_fields=['id', 'wiki_id'],  # extra fields to keep alongside the text
    dataset_type="embed-input",
)
Output
uploading file, starting validation...
sample-file-hca4x0 was uploaded
...
PYTHON
print(ds.await_validation())
Output
cohere.Dataset {
id: sample-file-hca4x0
name: sample_file
dataset_type: embed-input
validation_status: validated
created_at: 2024-01-13 02:51:48.215973
updated_at: 2024-01-13 02:51:48.215973
download_urls: ['']
validation_error: None
validation_warnings: []
}
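The upload above reads a JSONL file, that is, one JSON object per line. As a rough sketch of that layout, the snippet below writes a tiny file whose records carry a `text` field plus the `id` and `wiki_id` fields named in `keep_fields`. The rows, the file name `tiny_embed_input.jsonl`, and the helper `write_jsonl` are made up for illustration and are not taken from the sample dataset.

```python
import json
import os
import tempfile

def write_jsonl(rows, path):
    """Write one JSON object per line (the JSONL layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")

# Hypothetical rows: `text` is the field to embed; `id` and `wiki_id`
# mirror the names passed to keep_fields above.
rows = [
    {"id": 0, "wiki_id": 100, "text": "First example passage."},
    {"id": 1, "wiki_id": 100, "text": "Second example passage."},
]

jsonl_path = os.path.join(tempfile.gettempdir(), "tiny_embed_input.jsonl")
write_jsonl(rows, jsonl_path)

# Read the file back to confirm every line parses and has a text field.
with open(jsonl_path, encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]
print(len(loaded), all("text" in r for r in loaded))
```

A file written this way could then be passed as `dataset_file_path` in the upload call.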

Step 2: Create embeddings via Cohere’s Embed Jobs endpoint

PYTHON
job = co.create_embed_job(
    dataset_id=ds.id,
    input_type='search_document',
    model='embed-english-v3.0',
    embeddings_types=['float'],
)

job.wait()  # poll the server until the job is completed
Output
...
...
PYTHON
print(job)
Output
cohere.EmbedJob {
job_id: 792bbc1a-561b-48c2-8a97-0c80c1914ea8
status: complete
created_at: 2024-01-13T02:53:31.879719Z
input_dataset_id: sample-file-hca4x0
output_urls: None
model: embed-english-v3.0
truncate: RIGHT
percent_complete: 100
output: cohere.Dataset {
id: embeded-sample-file-drtjf9
name: embeded-sample-file
dataset_type: embed-result
validation_status: validated
created_at: 2024-01-13 02:53:33.569362
updated_at: 2024-01-13 02:53:33.569362
download_urls: ['']
validation_error: None
validation_warnings: []
}
}

Step 3: Download and prepare the embeddings

PYTHON
embeddings_file_path = 'embed_jobs_output.csv'
output_dataset = co.get_dataset(job.output.id)
output_dataset.save(filepath=embeddings_file_path, format="csv")
PYTHON
embeddings = []
texts = []
for record in output_dataset:
    embeddings.append(record['embeddings']['float'])
    texts.append(record['text'])
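Before building an index, it can help to confirm that the two lists line up and that every vector has the expected size. A minimal sketch, using small made-up vectors in place of the downloaded embeddings; `check_embeddings` is a hypothetical helper, not part of the Cohere SDK.

```python
def check_embeddings(embeddings, texts, dim):
    """Raise ValueError if the lists differ in length or a vector has the wrong size."""
    if len(embeddings) != len(texts):
        raise ValueError(f"{len(embeddings)} embeddings but {len(texts)} texts")
    for i, vec in enumerate(embeddings):
        if len(vec) != dim:
            raise ValueError(f"embedding {i} has length {len(vec)}, expected {dim}")
    return len(embeddings)

# Toy stand-ins for the real data (3-dimensional instead of 1024).
toy_embeddings = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
toy_texts = ["first passage", "second passage"]
print(check_embeddings(toy_embeddings, toy_texts, dim=3))
```

With the real lists, the equivalent call would be `check_embeddings(embeddings, texts, dim=1024)`.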

Step 4: Initialize hnswlib index and add embeddings

PYTHON
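# Inner-product index; dim=1024 matches the embed-english-v3.0 embedding size.
index = hnswlib.Index(space='ip', dim=1024)
index.init_index(max_elements=len(embeddings), ef_construction=512, M=64)
# Label each embedding with its position so results map back to `texts`.
index.add_items(embeddings, list(range(len(embeddings))))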
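With `space='ip'`, hnswlib ranks items by inner product (it reports `1 - inner product` as the distance, so smaller means more similar). The index returns approximate neighbours; for a small collection, an exact brute-force search is a handy reference to compare against. A sketch in plain Python with toy vectors; `top_k_inner_product` is a hypothetical helper, not an hnswlib function.

```python
def top_k_inner_product(query, vectors, k):
    """Return (index, score) pairs for the k vectors with the largest inner product."""
    scores = [
        (i, sum(q * v for q, v in zip(query, vec)))
        for i, vec in enumerate(vectors)
    ]
    # Sort by score, highest first; ties keep their original order.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]

# Toy 2-dimensional vectors standing in for the 1024-dimensional embeddings.
toy_vectors = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.5, 0.5],
]
toy_query = [1.0, 0.0]

for i, score in top_k_inner_product(toy_query, toy_vectors, k=2):
    print(i, score)
```

This scans every vector, so it only suits small collections; the HNSW index trades a little accuracy for much faster lookups on large ones.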

Step 5: Query the index and rerank the results

PYTHON
query = "What was the first youtube video about?"

query_emb = co.embed(
    texts=[query], model="embed-english-v3.0", input_type="search_query"
).embeddings

# knn_query returns (labels, distances); take the labels for the single query.
doc_index = index.knn_query(query_emb, k=10)[0][0]

# The loop variable must not be named `index`, or it would overwrite the hnswlib index.
docs_to_rerank = []
for doc_id in doc_index:
    docs_to_rerank.append(texts[doc_id])

final_result = co.rerank(
    query=query,
    documents=docs_to_rerank,
    model="rerank-english-v2.0",
    top_n=3,
)
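The step above follows a two-stage pattern: a fast first pass fetches a wide candidate set (`k=10`), then a more expensive scorer reorders only those candidates and keeps the best few (`top_n=3`). A toy sketch of that control flow, with simple word-overlap functions standing in for the vector index and the rerank model; none of these helper names come from the Cohere SDK or hnswlib.

```python
def retrieve_then_rerank(query, texts, coarse_score, fine_score, k, top_n):
    """Pick k candidates with coarse_score, then reorder them with fine_score."""
    # Stage 1: rank every document with the cheap scorer and keep the top k.
    candidates = sorted(
        range(len(texts)),
        key=lambda i: coarse_score(query, texts[i]),
        reverse=True,
    )[:k]
    # Stage 2: rescore only the candidates and keep the top_n.
    reranked = sorted(
        candidates,
        key=lambda i: fine_score(query, texts[i]),
        reverse=True,
    )[:top_n]
    return [(i, texts[i]) for i in reranked]

def shared_words(query, text):
    """Cheap stand-in scorer: number of distinct words in common."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def shared_word_ratio(query, text):
    """Stand-in for a reranker: overlap normalised by document length."""
    words = text.lower().split()
    return shared_words(query, text) / len(words) if words else 0.0

toy_texts = [
    "the first video on the site",
    "a long passage that mentions the first video only in passing among other things",
    "unrelated text about cooking",
]
results = retrieve_then_rerank(
    "the first video", toy_texts, shared_words, shared_word_ratio, k=2, top_n=1
)
print(results)
```

Both toy passages tie in the first stage; the second stage is what separates the focused passage from the one that mentions the topic only in passing.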

Step 6: Display the results

PYTHON
for idx, r in enumerate(final_result):
    print(f"Document Rank: {idx + 1}, Document Index: {r.index}")
    print(f"Document: {r.document['text']}")
    print(f"Relevance Score: {r.relevance_score:.5f}")
    print("\n")
Output
Document Rank: 1, Document Index: 0
Document: YouTube began as a venture capital–funded technology startup. Between November 2005 and April 2006, the company raised money from various investors, with Sequoia Capital, $11.5 million, and Artis Capital Management, $8 million, being the largest two. YouTube's early headquarters were situated above a pizzeria and a Japanese restaurant in San Mateo, California. In February 2005, the company activated codice_1. The first video was uploaded April 23, 2005. Titled "Me at the zoo", it shows co-founder Jawed Karim at the San Diego Zoo and can still be viewed on the site. In May, the company launched a public beta and by November, a Nike ad featuring Ronaldinho became the first video to reach one million total views. The site launched officially on December 15, 2005, by which time the site was receiving 8 million views a day. Clips at the time were limited to 100 megabytes, as little as 30 seconds of footage.
Relevance Score: 0.94815
Document Rank: 2, Document Index: 1
Document: Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show. Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site. Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not. They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward. Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.
Relevance Score: 0.91626
Document Rank: 3, Document Index: 2
Document: YouTube was not the first video-sharing site on the Internet; Vimeo was launched in November 2004, though that site remained a side project of its developers from CollegeHumor at the time and did not grow much, either. The week of YouTube's launch, NBC-Universal's "Saturday Night Live" ran a skit "Lazy Sunday" by The Lonely Island. Besides helping to bolster ratings and long-term viewership for "Saturday Night Live", "Lazy Sunday"'s status as an early viral video helped establish YouTube as an important website. Unofficial uploads of the skit to YouTube drew in more than five million collective views by February 2006 before they were removed when NBCUniversal requested it two months later based on copyright concerns. Despite eventually being taken down, these duplicate uploads of the skit helped popularize YouTube's reach and led to the upload of more third-party content. The site grew rapidly; in July 2006, the company announced that more than 65,000 new videos were being uploaded every day and that the site was receiving 100 million video views per day.
Relevance Score: 0.90665
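Note that the "Document Index" printed above is a position within the list passed to the reranker (`docs_to_rerank`), not a position in the full dataset. To recover the dataset position, look the value up in the candidate list returned by the nearest-neighbour search. A small sketch with made-up numbers:

```python
# Toy stand-ins: dataset positions returned by the nearest-neighbour search,
# and positions (within that candidate list) chosen by the reranker.
candidate_ids = [42, 7, 19, 3]
reranked_positions = [2, 0]

# Translate each reranked position back to a dataset position.
dataset_ids = [candidate_ids[pos] for pos in reranked_positions]
print(dataset_ids)
```

With the variables used in this cookbook, the same lookup is `doc_index[r.index]`.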