Redis and Cohere

RedisVL provides a powerful, dedicated Python client library for using Redis as a vector database. This guide walks through how to integrate Cohere embeddings with Redis, using a dataset of Wikipedia articles to set up a pipeline for semantic search. It covers:

  • Setting up a Redis index
  • Embedding passages and storing them in the database
  • Embedding the user’s search query and searching against your Redis index
  • Exploring different filtering options for your query

To see the full code sample, refer to this notebook. You can also consult this guide for more information on using Cohere with Redis.

Prerequisites:

The code samples on this page assume the following:

  • You have Docker running locally and have started a Redis Stack container:
SHELL
$ docker run -d --name redis-stack -p 6379:6379 -p 8001:8001 redis/redis-stack:latest
  • You have Redis installed (follow this link if you don’t).
  • You have a Cohere API Key (you can get your API Key at this link).

Install Packages:

Install and import the required Python Packages:

  • jsonlines: for this example, the sample passages live in a jsonl file, and we will use jsonlines to load this data into our environment.
  • redisvl: ensure you are on version 0.1.0 or later
  • cohere: ensure you are on version 4.45 or later

To install the packages, use the following code:

SHELL
$ pip install redisvl==0.1.0
$ pip install cohere==4.45
$ pip install jsonlines

Import the required packages:

PYTHON
from redis import Redis
from redisvl.index import SearchIndex
from redisvl.schema import IndexSchema
from redisvl.utils.vectorize import CohereTextVectorizer
from redisvl.query import VectorQuery
from redisvl.query.filter import Tag, Text, Num
import jsonlines

Building a Retrieval Pipeline with Cohere and Redis

Setting up the Schema.yaml:

To configure a Redis index you can either specify a yaml file or import a dictionary. In this tutorial we will be using a yaml file with the following schema. Either use the yaml file found at this link, or create a .yaml file locally with the following configuration.

YAML
version: "0.1.0"
index:
  name: semantic_search_demo
  prefix: rvl
  storage_type: hash

fields:
  - name: url
    type: text
  - name: title
    type: tag
  - name: text
    type: text
  - name: wiki_id
    type: numeric
  - name: paragraph_id
    type: numeric
  - name: id
    type: numeric
  - name: views
    type: numeric
  - name: langs
    type: numeric
  - name: embedding
    type: vector
    attrs:
      algorithm: flat
      dims: 1024
      distance_metric: cosine
      datatype: float32

This index is named semantic_search_demo and uses storage_type: hash, which means we must set as_buffer=True whenever we call the vectorizer. Hash data structures are serialized as strings, so we store each embedding in its hash as a byte string.
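
To make the byte-string requirement concrete, here is a minimal standard-library sketch of what a 1024-dimensional float32 vector looks like once serialized. It is an illustration only: the vector values are made up, the helper names to_float32_bytes and from_float32_bytes are hypothetical, and little-endian byte order is assumed. When you pass as_buffer=True, the vectorizer returns the buffer for you.

```python
import struct

DIMS = 1024  # matches `dims` in the schema above


def to_float32_bytes(vector):
    """Pack a list of floats into a little-endian float32 byte string."""
    return struct.pack(f"<{len(vector)}f", *vector)


def from_float32_bytes(buffer):
    """Unpack a float32 byte string back into a list of floats."""
    count = len(buffer) // 4  # each float32 occupies 4 bytes
    return list(struct.unpack(f"<{count}f", buffer))


# a made-up vector standing in for a real embedding
example_vector = [0.5] * DIMS
example_buffer = to_float32_bytes(example_vector)

print(len(example_buffer))  # 4096 bytes: 1024 dims x 4 bytes each
```

The byte length is a quick sanity check: a buffer that is not dims x 4 bytes long will not match the float32 vector field declared in the schema.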

For this guide, we will be using the Cohere embed-english-v3.0 model which has a vector dimension size of 1024.

Initializing the Cohere Text Vectorizer:

PYTHON
# create a vectorizer
api_key = '{Insert your cohere API Key}'

cohere_vectorizer = CohereTextVectorizer(
    model="embed-english-v3.0",
    api_config={"api_key": api_key},
)

Create a CohereTextVectorizer by specifying the embedding model and your API key.

The following link contains details about the available embedding models from Cohere and their respective dimensions.
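
Rather than pasting the key into your source, you may prefer to read it from the environment. The sketch below assumes the key is exported under the variable name COHERE_API_KEY (an assumption; use whatever name your environment defines) and relies on a hypothetical helper, load_api_config, to build the dictionary passed as api_config.

```python
import os


def load_api_config(env_var="COHERE_API_KEY"):
    """Build the api_config dict from an environment variable.

    `env_var` is an assumed name, not something this guide prescribes.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {"api_key": api_key}
```

The result could then be passed to the vectorizer as api_config=load_api_config() in place of the hard-coded key.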

Initializing the Redis Index:

PYTHON
# construct a search index from the schema - this schema is called "semantic_search_demo"
schema = IndexSchema.from_yaml("./schema.yaml")
client = Redis.from_url("redis://localhost:6379")
index = SearchIndex(schema, client)

# create the index (no data yet)
index.create(overwrite=True)

Note that we are using IndexSchema.from_yaml because we are importing the schema from a yaml file; we could also use IndexSchema.from_dict.
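
For reference, a dictionary version of the same schema might look like the sketch below. The keys simply mirror the yaml file above; the variable name schema_dict is hypothetical, and the from_dict call is left as a comment because it requires redisvl to be installed.

```python
# Dictionary form of the schema; the structure mirrors the yaml file above
schema_dict = {
    "version": "0.1.0",
    "index": {
        "name": "semantic_search_demo",
        "prefix": "rvl",
        "storage_type": "hash",
    },
    "fields": [
        {"name": "url", "type": "text"},
        {"name": "title", "type": "tag"},
        {"name": "text", "type": "text"},
        {"name": "wiki_id", "type": "numeric"},
        {"name": "paragraph_id", "type": "numeric"},
        {"name": "id", "type": "numeric"},
        {"name": "views", "type": "numeric"},
        {"name": "langs", "type": "numeric"},
        {
            "name": "embedding",
            "type": "vector",
            "attrs": {
                "algorithm": "flat",
                "dims": 1024,
                "distance_metric": "cosine",
                "datatype": "float32",
            },
        },
    ],
}

# Assumed usage, not executed here because it needs redisvl installed:
# schema = IndexSchema.from_dict(schema_dict)
```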

SHELL
$ rvl index listall

The command above lists the existing indices. If the index has been created, you should see output like this:

TEXT
15:39:22 [RedisVL] INFO Indices:
15:39:22 [RedisVL] INFO 1. semantic_search_demo

Look inside the index to make sure it matches the schema you want:

SHELL
$ rvl index info -i semantic_search_demo

You should see something like this:

╭──────────────────────┬────────────────┬────────────┬─────────────────┬────────────╮
│ Index Name │ Storage Type │ Prefixes │ Index Options │ Indexing │
├──────────────────────┼────────────────┼────────────┼─────────────────┼────────────┤
│ semantic_search_demo │ HASH │ ['rvl'] │ [] │ 0 │
╰──────────────────────┴────────────────┴────────────┴─────────────────┴────────────╯
Index Fields:
╭──────────────┬──────────────┬─────────┬────────────────┬────────────────┬────────────────┬────────────────┬────────────────┬────────────────┬─────────────────┬────────────────╮
│ Name │ Attribute │ Type │ Field Option │ Option Value │ Field Option │ Option Value │ Field Option │ Option Value │ Field Option │ Option Value │
├──────────────┼──────────────┼─────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────────────────┼────────────────┤
│ url │ url │ TEXT │ WEIGHT │ 1 │ │ │ │ │ │ │
│ title │ title │ TEXT │ WEIGHT │ 1 │ │ │ │ │ │ │
│ text │ text │ TEXT │ WEIGHT │ 1 │ │ │ │ │ │ │
│ wiki_id │ wiki_id │ NUMERIC │ │ │ │ │ │ │ │ │
│ paragraph_id │ paragraph_id │ NUMERIC │ │ │ │ │ │ │ │ │
│ id │ id │ NUMERIC │ │ │ │ │ │ │ │ │
│ views │ views │ NUMERIC │ │ │ │ │ │ │ │ │
│ langs │ langs │ NUMERIC │ │ │ │ │ │ │ │ │
│ embedding │ embedding │ VECTOR │ algorithm │ FLAT │ data_type │ FLOAT32 │ dim │ 1024 │ distance_metric │ COSINE │
╰──────────────┴──────────────┴─────────┴────────────────┴────────────────┴────────────────┴────────────────┴────────────────┴────────────────┴─────────────────┴────────────────╯

You can also visit http://localhost:8001/redis-stack/browser. The Redis GUI will show you the index in real time.


Loading your Documents and Embedding them into Redis:

PYTHON
# read in your documents
jsonl_file_path = 'data/redis_guide_data.jsonl'

corpus = []
text_to_embed = []

with jsonlines.open(jsonl_file_path, mode='r') as reader:
    for line in reader:
        corpus.append(line)
        # we want to store the embeddings of the field called `text`
        text_to_embed.append(line['text'])

# call embed_many which returns an array
# hash data structures get serialized as a string and thus we store the embeddings in hashes as a byte string (handled by numpy)
res = cohere_vectorizer.embed_many(text_to_embed, input_type='search_document', as_buffer=True)

We will load a subset of the data, which contains paragraphs from Wikipedia. The data lives in a jsonl file, and we need to parse it to get the text field, which is what we embed. To do this, we read the file line by line, building a corpus list and a text_to_embed list. We then pass text_to_embed to cohere_vectorizer.embed_many, which takes a list of strings.
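
If you have not worked with the jsonl format before, each line of the file is a standalone JSON object. The sketch below reproduces the same loop using only the standard library, with two made-up records in place of the real file; the names sample_jsonl, sample_corpus and sample_texts are hypothetical. In the tutorial code, jsonlines handles this parsing for you.

```python
import io
import json

# Two made-up records standing in for the real jsonl file
sample_jsonl = io.StringIO(
    '{"title": "Example A", "text": "First passage."}\n'
    '{"title": "Example B", "text": "Second passage."}\n'
)

sample_corpus = []  # full records, kept for their metadata
sample_texts = []   # only the `text` field, which is what gets embedded

for raw_line in sample_jsonl:
    raw_line = raw_line.strip()
    if not raw_line:
        continue  # skip blank lines
    record = json.loads(raw_line)
    sample_corpus.append(record)
    sample_texts.append(record["text"])

print(sample_texts)  # ['First passage.', 'Second passage.']
```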

Prepare your Data to be inserted into the Index:

PYTHON
# construct the data payload to be uploaded to your index
data = [{"url": row['url'],
         "title": row['title'],
         "text": row['text'],
         "wiki_id": row['wiki_id'],
         "paragraph_id": row['paragraph_id'],
         "id": row['id'],
         "views": row['views'],
         "langs": row['langs'],
         "embedding": v}
        for row, v in zip(corpus, res)]

# load the data into your index
index.load(data)

We want to preserve all the metadata for each paragraph, so we create a list of dictionaries, one per paragraph, and insert it into the index.
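
Because zip silently stops at the shorter of its two inputs, a length check before building the payload can catch a mismatch between documents and embeddings. The sketch below uses a hypothetical helper, build_payload, with made-up rows and placeholder byte strings. Unlike the code above, it copies every key in each row rather than listing the fields explicitly.

```python
def build_payload(rows, embeddings):
    """Pair each metadata row with its embedding, refusing mismatched lengths."""
    if len(rows) != len(embeddings):
        raise ValueError(f"{len(rows)} rows but {len(embeddings)} embeddings")
    # copy every metadata field and add the embedding under its own key
    return [{**row, "embedding": vec} for row, vec in zip(rows, embeddings)]


# made-up rows and placeholder byte strings, for illustration only
demo_rows = [
    {"title": "Example A", "views": 10},
    {"title": "Example B", "views": 20},
]
demo_embeddings = [b"\x00\x01", b"\x02\x03"]
demo_payload = build_payload(demo_rows, demo_embeddings)
```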

At this point, your Redis DB is ready for semantic search!

Query your Redis DB:

PYTHON
# use the Cohere vectorizer again to create a query embedding
query_embedding = cohere_vectorizer.embed("What did Microsoft release in 2015?", input_type='search_query', as_buffer=True)

query = VectorQuery(
    vector=query_embedding,
    vector_field_name="embedding",
    return_fields=["url", "wiki_id", "paragraph_id", "id", "views", "langs", "title", "text"],
    num_results=5
)

results = index.query(query)

for doc in results:
    print(f"Title:{doc['title']}\nText:{doc['text']}\nDistance {doc['vector_distance']}\n\n")

Use the VectorQuery class to construct a query object. Here you can specify the fields you’d like Redis to return as well as the number of results (5 in this example).
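
Conceptually, a flat index with the cosine metric compares the query against every stored vector and keeps the closest ones. The pure-Python sketch below illustrates that idea on made-up 3-dimensional vectors. It assumes the reported vector_distance corresponds to 1 minus cosine similarity; the function names cosine_distance and top_k are hypothetical and do not reflect how Redis implements the search.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query, documents, k):
    """Score every document against the query and keep the k closest."""
    scored = [(cosine_distance(query, vec), name) for name, vec in documents.items()]
    return sorted(scored)[:k]


# made-up 3-dimensional vectors; the embeddings in this guide have 1024 dimensions
toy_documents = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [1.0, 1.0, 0.0],
}
toy_results = top_k([1.0, 0.0, 0.0], toy_documents, k=2)
```

A smaller distance means a closer match, which is why the results are sorted in ascending order.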

Redis Filters

Adding Tag Filters:

PYTHON
# Initialize a tag filter
tag_filter = Tag("title") == "Microsoft Office"

# set the tag filter on our existing query
query.set_filter(tag_filter)

results = index.query(query)

for doc in results:
    print(f"Title:{doc['title']}\nText:{doc['text']}\nDistance {doc['vector_distance']}\n")

One feature of Redis is the ability to add filtering to your queries on the fly. Here we construct a tag filter on the title field, which was defined in our schema with type: tag.

Using Filter Expressions:

PYTHON
# define a tag match on the title, text match on the text field, and numeric filter on the views field
filter_data = (Tag('title') == 'Elizabeth II') & (Text("text") % "born") & (Num("views") > 4500)

# embed the query with the vectorizer created earlier
query_embedding = cohere_vectorizer.embed("When was she born?", input_type='search_query', as_buffer=True)

# reinitialize the query with the filter expression
query = VectorQuery(
    vector=query_embedding,
    vector_field_name="embedding",
    return_fields=["url", "wiki_id", "paragraph_id", "id", "views", "langs", "title", "text"],
    num_results=5,
    filter_expression=filter_data
)

results = index.query(query)
print(results)

for doc in results:
    print(f"Title:{doc['title']}\nText:{doc['text']}\nDistance {doc['vector_distance']}\nView {doc['views']}")

Another feature of Redis is the ability to initialize a query with a set of filters called a filter expression. A filter expression allows you to combine a set of filters over an arbitrary set of fields at query time.
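
To clarify what the combined expression asks for, the sketch below restates it as an ordinary Python predicate over made-up documents. It is only an approximation of the semantics: the text match is treated as a case-insensitive whole-word search, whereas full-text matching in Redis may behave differently (for example, through stemming). The names matches, toy_docs and kept are hypothetical.

```python
import re


def matches(doc):
    """Plain-Python reading of: title tag AND text contains 'born' AND views > 4500."""
    # exact match on the tag field
    title_ok = doc["title"] == "Elizabeth II"
    # approximate the text filter as a whole-word, case-insensitive match
    text_ok = re.search(r"\bborn\b", doc["text"], re.IGNORECASE) is not None
    # numeric comparison on the views field
    views_ok = doc["views"] > 4500
    return title_ok and text_ok and views_ok


# made-up documents, for illustration only
toy_docs = [
    {"title": "Elizabeth II", "text": "She was born in London.", "views": 5000},
    {"title": "Elizabeth II", "text": "She was born in London.", "views": 100},
    {"title": "Microsoft Office", "text": "Born as a bundle.", "views": 9000},
]
kept = [doc for doc in toy_docs if matches(doc)]
```

Only the first document satisfies all three conditions; the second fails the views threshold and the third fails the title match.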