Elasticsearch and Cohere

Elasticsearch has all the tools developers need to build next generation search experiences with generative AI, and it supports native integration with Cohere through their inference API.

Use Elastic if you’d like to:

  • Build with a vector database
  • Deploy multiple ML models
  • Perform text, vector, and hybrid search
  • Search with filters, facets, and aggregations
  • Apply document- and field-level security
  • Run on-prem, in the cloud, or serverless (preview)

This guide uses a dataset of Wikipedia articles to set up a pipeline for semantic search. It will cover:

  • Creating an Elastic inference processor using Cohere embeddings
  • Creating an Elasticsearch index with embeddings
  • Performing hybrid search on the Elasticsearch index and reranking results
  • Performing basic RAG

To see the full code sample, refer to this notebook.

Prerequisites

This tutorial assumes you have the following:

  • An Elastic Cloud account, available with a free trial
  • A Cohere production API key; get your API key at this link if you don’t have one
  • Python 3.7 or higher

Note: While this tutorial integrates Cohere with an Elastic Cloud serverless project, you can also integrate with a self-managed Elasticsearch deployment or an Elastic Cloud deployment by switching from the serverless Python client to the general elasticsearch Python client.
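
For example, a minimal sketch of connecting with the general client might look like the following. This isn’t part of the original notebook; the cloud ID and API key values are placeholders you would replace with your own deployment’s credentials.

PYTHON
# Hypothetical example: connect to a non-serverless Elastic Cloud deployment
# using the general "elasticsearch" client instead of "elasticsearch_serverless".
from elasticsearch import Elasticsearch

client = Elasticsearch(
    cloud_id="YOUR_CLOUD_ID",        # placeholder: from your Elastic Cloud console
    api_key="YOUR_ELASTIC_API_KEY",  # placeholder: an API key for that deployment
)
print(client.info())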

Create an Elastic Serverless deployment

If you don’t have an Elastic Cloud deployment, sign up here for a free trial and request access to Elastic Serverless.

Install the required packages

Install and import the required Python packages:

  • elasticsearch_serverless
  • cohere: ensure you are on version 5.2.5 or later

To install the packages, use the following code:

PYTHON
!pip install elasticsearch_serverless==0.2.0.20231031
!pip install cohere==5.2.5

After the installation has finished, find your endpoint URL and create your API key in the Serverless dashboard.

Import the required packages

Next, import the required modules. 🔐 NOTE: getpass enables us to securely prompt the user for credentials without echoing them to the terminal or hard-coding them in the notebook.

PYTHON
from elasticsearch_serverless import Elasticsearch, helpers
from getpass import getpass
import cohere
import json
import requests

Create an Elasticsearch client

Now we can instantiate the Python Elasticsearch client.

First we prompt the user for their endpoint and encoded API key. Then we create a client object by instantiating the Elasticsearch class.

When creating your Elastic Serverless API key, make sure to turn on Control security privileges and edit the cluster privileges to specify "cluster": ["all"].

PYTHON
ELASTICSEARCH_ENDPOINT = getpass("Elastic Endpoint: ")
ELASTIC_API_KEY = getpass("Elastic encoded API key: ")  # Use the encoded API key

client = Elasticsearch(
    ELASTICSEARCH_ENDPOINT,
    api_key=ELASTIC_API_KEY
)

# Confirm the client has connected
print(client.info())

Build a Hybrid Search Index with Cohere and Elasticsearch

Create an inference endpoint

One of the biggest pain points of building a vector search index is computing embeddings for a large corpus of data. Fortunately, Elastic offers inference endpoints that can be used in ingest pipelines to automatically compute embeddings when bulk indexing operations are performed.

To set up an inference pipeline for ingestion, we first must create an inference endpoint that uses Cohere embeddings. You’ll need a Cohere API key for this, which you can find in your Cohere account under the API keys section.

We will create an inference endpoint that uses embed-english-v3.0 and int8 or byte compression to save on storage.

PYTHON
COHERE_API_KEY = getpass("Enter Cohere API key: ")

# Delete the inference model if it already exists
client.options(ignore_status=[404]).inference.delete(inference_id="cohere_embeddings")

client.inference.put(
    task_type="text_embedding",
    inference_id="cohere_embeddings",
    body={
        "service": "cohere",
        "service_settings": {
            "api_key": COHERE_API_KEY,
            "model_id": "embed-english-v3.0",
            "embedding_type": "int8",
            "similarity": "cosine"
        },
        "task_settings": {},
    },
)

Here’s what you might see:

Enter Cohere API key: ··········
ObjectApiResponse({'model_id': 'cohere_embeddings', 'inference_id': 'cohere_embeddings', 'task_type': 'text_embedding', 'service': 'cohere', 'service_settings': {'similarity': 'cosine', 'dimensions': 1024, 'model_id': 'embed-english-v3.0', 'rate_limit': {'requests_per_minute': 10000}, 'embedding_type': 'byte'}, 'task_settings': {}})
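
Before moving on, you can optionally sanity-check the new endpoint by sending it a short test input. This step isn’t in the original notebook; it simply reuses the same client.inference.inference method that appears later for reranking, with a made-up test sentence.

PYTHON
# Optional sanity check (not part of the original walkthrough):
# embed a short test string with the endpoint we just created.
test_response = client.inference.inference(
    inference_id="cohere_embeddings",
    body={"input": ["A quick test sentence for the embedding endpoint."]},
)
# Print the raw response to confirm an embedding was returned.
print(test_response.body)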

Create the Index

Next, create the mapping for the destination index: the index that will contain the embeddings the model generates from your input text. The destination index must have a field with the semantic_text field type to index the output of the Cohere model.

Let’s create an index named cohere-wiki-embeddings with the mappings we need.

PYTHON
client.indices.delete(index="cohere-wiki-embeddings", ignore_unavailable=True)
client.indices.create(
    index="cohere-wiki-embeddings",
    mappings={
        "properties": {
            "text_semantic": {
                "type": "semantic_text",
                "inference_id": "cohere_embeddings"
            },
            "text": {"type": "text", "copy_to": "text_semantic"},
            "wiki_id": {"type": "integer"},
            "url": {"type": "text"},
            "views": {"type": "float"},
            "langs": {"type": "integer"},
            "title": {"type": "text"},
            "paragraph_id": {"type": "integer"},
            "id": {"type": "integer"}
        }
    },
)

You might see something like this:

ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'cohere-wiki-embeddings'})

Let’s note a few important parameters from that API call (a quick check of the resulting mapping follows the list):

  • semantic_text: A field type that automatically generates embeddings for text content using an inference endpoint.
  • inference_id: Specifies the ID of the inference endpoint to be used. In this example, it is set to cohere_embeddings.
  • copy_to: Copies the content of the text field into text_semantic, the semantic_text field that holds the inference results.
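
If you’d like to confirm that the index was created with the expected field types, a minimal optional check (not in the original notebook) is to read the mapping back:

PYTHON
# Optional: fetch the mapping of the new index and inspect the semantic_text field.
mapping = client.indices.get_mapping(index="cohere-wiki-embeddings")
print(json.dumps(mapping.body, indent=2, default=str))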

Insert Documents

Let’s insert our example wiki dataset. You need a production Cohere account to complete this step; otherwise, the document ingest will time out due to the API request rate limits (see the note on batching after the indexing example below).

PYTHON
url = "https://raw.githubusercontent.com/cohere-ai/notebooks/main/notebooks/data/embed_jobs_sample_data.jsonl"
response = requests.get(url)

# Split the JSONL response into individual lines
jsonl_data = response.content.decode('utf-8').splitlines()

# Prepare the documents to be indexed
documents = []
for line in jsonl_data:
    data_dict = json.loads(line)
    documents.append({
        "_index": "cohere-wiki-embeddings",
        "_source": data_dict,
    })

# Use the bulk endpoint to index
helpers.bulk(client, documents)

print("Done indexing documents into `cohere-wiki-embeddings` index!")

You should see this:

Done indexing documents into `cohere-wiki-embeddings` index!
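
If you’re on a trial key or still run into rate limits, one possible mitigation (not shown in the original notebook) is to send smaller bulk batches and allow a longer per-request timeout. The chunk_size argument is a standard helpers.bulk parameter; the specific values here are illustrative only.

PYTHON
# Hypothetical alternative: index in smaller batches to reduce pressure on the
# embedding endpoint, and give each bulk request a longer timeout.
helpers.bulk(
    client.options(request_timeout=120),  # per-request timeout in seconds
    documents,
    chunk_size=100,  # fewer documents per bulk request
)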

After the dataset has been enriched with the embeddings, you can query the data using the semantic query provided by Elasticsearch. The semantic_text field type simplifies semantic search significantly. Learn more about how semantic text in Elasticsearch allows you to focus on your model and results instead of on the technical details.

PYTHON
query = "When were the semi-finals of the 2022 FIFA world cup played?"

response = client.search(
    index="cohere-wiki-embeddings",
    size=100,
    query={
        "semantic": {
            "query": "When were the semi-finals of the 2022 FIFA world cup played?",
            "field": "text_semantic"
        }
    }
)

raw_documents = response["hits"]["hits"]

# Display the first 10 results
for document in raw_documents[0:10]:
    print(f'Title: {document["_source"]["title"]}\nText: {document["_source"]["text"]}\n')

# Format the documents for ranking
documents = []
for hit in response["hits"]["hits"]:
    documents.append(hit["_source"]["text"])

Here’s what that might look like:

Title: 2022 FIFA World Cup
Text: The 2022 FIFA World Cup was an international football tournament contested by the men's national teams of FIFA's member associations and 22nd edition of the FIFA World Cup. It took place in Qatar from 20 November to 18 December 2022, making it the first World Cup held in the Arab world and Muslim world, and the second held entirely in Asia after the 2002 tournament in South Korea and Japan. France were the defending champions, having defeated Croatia 4–2 in the 2018 final. At an estimated cost of over $220 billion, it is the most expensive World Cup ever held to date; this figure is disputed by Qatari officials, including organising CEO Nasser Al Khater, who said the true cost was $8 billion, and other figures related to overall infrastructure development since the World Cup was awarded to Qatar in 2010.
Title: 2022 FIFA World Cup
Text: The semi-finals were played on 13 and 14 December. Messi scored a penalty kick before Julián Álvarez scored twice to give Argentina a 3–0 victory over Croatia. Théo Hernandez scored after five minutes as France led Morocco for most of the game and later Randal Kolo Muani scored on 78 minutes to complete a 2–0 victory for France over Morocco as they reached a second consecutive final.
Title: 2022 FIFA World Cup
Text: The quarter-finals were played on 9 and 10 December. Croatia and Brazil ended 0–0 after 90 minutes and went to extra time. Neymar scored for Brazil in the 15th minute of extra time. Croatia, however, equalised through Bruno Petković in the second period of extra time. With the match tied, a penalty shootout decided the contest, with Croatia winning the shoot-out 4–2. In the second quarter-final match, Nahuel Molina and Messi scored for Argentina before Wout Weghorst equalised with two goals shortly before the end of the game. The match went to extra time and then penalties, where Argentina would go on to win 4–3. Morocco defeated Portugal 1–0, with Youssef En-Nesyri scoring at the end of the first half. Morocco became the first African and the first Arab nation to advance as far as the semi-finals of the competition. Despite Harry Kane scoring a penalty for England, it was not enough to beat France, who won 2–1 by virtue of goals from Aurélien Tchouaméni and Olivier Giroud, sending them to their second consecutive World Cup semi-final and becoming the first defending champions to reach this stage since Brazil in 1998.
Title: 2022 FIFA World Cup
Text: Unlike previous FIFA World Cups, which are typically played in June and July, because of Qatar's intense summer heat and often fairly high humidity, the 2022 World Cup was played in November and December. As a result, the World Cup was unusually staged in the middle of the seasons of domestic association football leagues, which started in late July or August, including all of the major European leagues, which had been obliged to incorporate extended breaks into their domestic schedules to accommodate the World Cup. Major European competitions had scheduled their respective competitions group matches to be played before the World Cup, to avoid playing group matches the following year.
Title: 2022 FIFA World Cup
Text: The match schedule was confirmed by FIFA in July 2020. The group stage was set to begin on 21 November, with four matches every day. Later, the schedule was tweaked by moving the Qatar vs Ecuador game to 20 November, after Qatar lobbied FIFA to allow their team to open the tournament. The final was played on 18 December 2022, National Day, at Lusail Stadium.
Title: 2022 FIFA World Cup
Text: Owing to the climate in Qatar, concerns were expressed over holding the World Cup in its traditional time frame of June and July. In October 2013, a task force was commissioned to consider alternative dates and report after the 2014 FIFA World Cup in Brazil. On 24 February 2015, the FIFA Task Force proposed that the tournament be played from late November to late December 2022, to avoid the summer heat between May and September and also avoid clashing with the 2022 Winter Olympics in February, the 2022 Winter Paralympics in March and Ramadan in April.
Title: 2022 FIFA World Cup
Text: Of the 32 nations qualified to play at the 2022 FIFA World Cup, 24 countries competed at the previous tournament in 2018. Qatar were the only team making their debut in the FIFA World Cup, becoming the first hosts to make their tournament debut since Italy in 1934. As a result, the 2022 tournament was the first World Cup in which none of the teams that earned a spot through qualification were making their debut. The Netherlands, Ecuador, Ghana, Cameroon, and the United States returned to the tournament after missing the 2018 tournament. Canada returned after 36 years, their only prior appearance being in 1986. Wales made their first appearance in 64 years – the longest ever gap for any team, their only previous participation having been in 1958.
Title: 2022 FIFA World Cup
Text: After UEFA were guaranteed to host the 2018 event, members of UEFA were no longer in contention to host in 2022. There were five bids remaining for the 2022 FIFA World Cup: Australia, Japan, Qatar, South Korea, and the United States.
Title: Cristiano Ronaldo
Text: Ronaldo was named in Portugal's squad for the 2022 FIFA World Cup in Qatar, making it his fifth World Cup. On 24 November, in Portugal's opening match against Ghana, Ronaldo scored a penalty kick and became the first male player to score in five different World Cups. In the last group game against South Korea, Ronaldo received criticism from his own coach for his reaction at being substituted. He was dropped from the starting line-up for Portugal's last 16 match against Switzerland, marking the first time since Euro 2008 that he had not started a game for Portugal in a major international tournament, and the first time Portugal had started a knockout game without Ronaldo in the starting line-up at an international tournament since Euro 2000. He came off the bench late on as Portugal won 6–1, their highest tally in a World Cup knockout game since the 1966 World Cup, with Ronaldo's replacement Gonçalo Ramos scoring a hat-trick. Portugal employed the same strategy in the quarter-finals against Morocco, with Ronaldo once again coming off the bench; in the process, he equalled Bader Al-Mutawa's international appearance record, becoming the joint–most capped male footballer of all time, with 196 caps. Portugal lost 1–0, however, with Morocco becoming the first CAF nation ever to reach the World Cup semi-finals.
Title: 2022 FIFA World Cup
Text: The final draw was held at the Doha Exhibition and Convention Center in Doha, Qatar, on 1 April 2022, 19:00 AST, prior to the completion of qualification. The two winners of the inter-confederation play-offs and the winner of the Path A of the UEFA play-offs were not known at the time of the draw. The draw was attended by 2,000 guests and was led by Carli Lloyd, Jermaine Jenas and sports broadcaster Samantha Johnson, assisted by the likes of Cafu (Brazil), Lothar Matthäus (Germany), Adel Ahmed Malalla (Qatar), Ali Daei (Iran), Bora Milutinović (Serbia/Mexico), Jay-Jay Okocha (Nigeria), Rabah Madjer (Algeria), and Tim Cahill (Australia).

After the dataset has been enriched with the embeddings, you can query the data using hybrid search.

Pass a bool query that combines a lexical multi_match query over the text and title fields (the BM25 part) with a semantic query against the text_semantic field (the vector part).

PYTHON
query = "When were the semi-finals of the 2022 FIFA world cup played?"

response = client.search(
    index="cohere-wiki-embeddings",
    size=100,
    query={
        "bool": {
            "must": {
                "multi_match": {
                    "query": "When were the semi-finals of the 2022 FIFA world cup played?",
                    "fields": ["text", "title"]
                }
            },
            "should": {
                "semantic": {
                    "query": "When were the semi-finals of the 2022 FIFA world cup played?",
                    "field": "text_semantic"
                }
            },
        }
    }
)

raw_documents = response["hits"]["hits"]

# Display the first 10 results
for document in raw_documents[0:10]:
    print(f'Title: {document["_source"]["title"]}\nText: {document["_source"]["text"]}\n')

# Format the documents for ranking
documents = []
for hit in response["hits"]["hits"]:
    documents.append(hit["_source"]["text"])

Ranking

To effectively combine the results from our vector and BM25 retrieval, we can use Cohere’s Rerank 3 model through the inference API to provide a final, more precise semantic reranking of our results.

First, create an inference endpoint with your Cohere API key. Make sure to specify a name for your endpoint, and the model_id of one of the rerank models. In this example we will use Rerank 3.

PYTHON
# Delete the inference model if it already exists
client.options(ignore_status=[404]).inference.delete(inference_id="cohere_rerank")

client.inference.put(
    task_type="rerank",
    inference_id="cohere_rerank",
    body={
        "service": "cohere",
        "service_settings": {
            "api_key": COHERE_API_KEY,
            "model_id": "rerank-english-v3.0"
        },
        "task_settings": {
            "top_n": 10,
        },
    }
)

You can now rerank your results using that inference endpoint. Here we will pass in the query we used for retrieval, along with the documents we just retrieved using hybrid search.

The inference service will respond with a list of documents in descending order of relevance. Each document has a corresponding index (reflecting the order the documents were in when they were sent to the inference endpoint), and if the return_documents task setting is True, the document texts will be included as well.

In this case we will set return_documents to False and reconstruct the input documents from the index returned in the response.

PYTHON
response = client.inference.inference(
    inference_id="cohere_rerank",
    body={
        "query": query,
        "input": documents,
        "task_settings": {
            "return_documents": False
        }
    }
)

# Reconstruct the input documents based on the index provided in the rerank response
ranked_documents = []
for document in response.body["rerank"]:
    ranked_documents.append({
        "title": raw_documents[int(document["index"])]["_source"]["title"],
        "text": raw_documents[int(document["index"])]["_source"]["text"]
    })

# Print the top 10 results
for document in ranked_documents[0:10]:
    print(f"Title: {document['title']}\nText: {document['text']}\n")

Retrieval augmented generation

Now that we have ranked our results, we can easily turn this into a RAG system with Cohere’s Chat API. Pass in the retrieved documents along with the query, and see the grounded response from Cohere’s Command R+ generative model.

First, we will create the Cohere client.

PYTHON
co = cohere.Client(COHERE_API_KEY)

Next, we can easily get a grounded generation with citations from the Cohere Chat API. We simply pass in the user query and documents retrieved from Elastic to the API, and print out our grounded response.

PYTHON
response = co.chat(
    message=query,
    documents=ranked_documents,
    model='command-r-plus-08-2024'
)

source_documents = []
for citation in response.citations:
    for document_id in citation.document_ids:
        if document_id not in source_documents:
            source_documents.append(document_id)

print(f"Query: {query}")
print(f"Response: {response.text}")
print("Sources:")
for document in response.documents:
    if document['id'] in source_documents:
        print(f"{document['title']}: {document['text']}")

And there you have it! A quick and easy implementation of hybrid search and RAG with Cohere and Elastic.