RedisVL is a dedicated Python client library for using Redis as a vector database. This guide walks through how to integrate Cohere embeddings with Redis to build a semantic search pipeline over a dataset of Wikipedia articles. It covers:

Setting up a Redis index
Embedding passages and storing them in the database
Embedding the user’s search query and searching against your Redis index
Exploring different filtering options for your query

To see the full code sample, refer to this notebook. You can also consult this guide for more information on using Cohere with Redis.

Prerequisites:

The code samples on this page assume the following:

You have Docker running locally and have started a Redis Stack container:

SHELL

$ docker run -d --name redis-stack -p 6379:6379 -p 8001:8001 redis/redis-stack:latest

You have Redis installed (follow this link if you don’t).
You have a Cohere API key (you can get your API key at this link).

Install Packages:

Install and import the required Python packages:

jsonlines: the sample passages live in a JSONL file, and we use jsonlines to load this data into our environment.
redisvl: ensure you are on version 0.1.0 or later
cohere: ensure you are on version 4.45 or later

To install the packages, run the following in a terminal (in a Jupyter notebook, prefix each command with ! instead):

SHELL

$ pip install redisvl==0.1.0
$ pip install cohere==4.45
$ pip install jsonlines

Import the required packages:

PYTHON

from redis import Redis
from redisvl.index import SearchIndex
from redisvl.schema import IndexSchema
from redisvl.utils.vectorize import CohereTextVectorizer
from redisvl.query import VectorQuery
from redisvl.query.filter import Tag, Text, Num
import jsonlines

Building a Retrieval Pipeline with Cohere and Redis

Setting up schema.yaml:

To configure a Redis index, you can either specify a YAML file or pass a Python dictionary. In this tutorial we use a YAML file with the following schema. Either download the YAML file found at this link, or create a file named schema.yaml locally with the following configuration (the index-creation code below loads it from ./schema.yaml).

YAML

version: "0.1.0"
index:
  name: semantic_search_demo
  prefix: rvl
  storage_type: hash

fields:
  - name: url
    type: text
  - name: title
    type: tag
  - name: text
    type: text
  - name: wiki_id
    type: numeric
  - name: paragraph_id
    type: numeric
  - name: id
    type: numeric
  - name: views
    type: numeric
  - name: langs
    type: numeric
  - name: embedding
    type: vector
    attrs:
      algorithm: flat
      dims: 1024
      distance_metric: cosine
      datatype: float32

This index is named semantic_search_demo, and its keys use the prefix rvl. It uses storage_type: hash, which means we must set as_buffer=True whenever we call the vectorizer. Redis hash fields hold strings, so each embedding is stored in the hash as a byte string of packed float32 values rather than as a list of numbers.
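As a rough sketch of what as_buffer=True produces, assuming the vectorizer packs each vector as float32 with NumPy (the datatype declared in the schema), the conversion and its inverse look like this. The helper names and the stand-in vector are hypothetical. Real vectors come from the Cohere vectorizer.

```python
import numpy as np

DIMS = 1024  # must match `dims` in schema.yaml (embed-english-v3.0 output size)


def to_float32_bytes(vector):
    """Pack a list of floats into the raw float32 byte string stored in a hash field."""
    arr = np.asarray(vector, dtype=np.float32)
    if arr.shape != (DIMS,):
        raise ValueError(f"expected a vector of {DIMS} floats, got shape {arr.shape}")
    return arr.tobytes()


def from_float32_bytes(buf):
    """Decode a float32 byte string back into a NumPy vector."""
    return np.frombuffer(buf, dtype=np.float32)


# Hypothetical stand-in for a real Cohere embedding.
fake_embedding = [0.5] * DIMS
buf = to_float32_bytes(fake_embedding)
print(len(buf))  # 1024 floats x 4 bytes each = 4096 bytes
```

The length check is there because a vector whose size differs from the schema's dims cannot be searched against the index.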

For this guide, we use the Cohere embed-english-v3.0 model, which produces 1024-dimensional vectors. This is why the schema sets dims: 1024.

Initializing the Cohere Text Vectorizer:

PYTHON

# create a vectorizer
api_key = "{Insert your Cohere API key}"

cohere_vectorizer = CohereTextVectorizer(
    model="embed-english-v3.0",
    api_config={"api_key": api_key},
)

Create a CohereTextVectorizer by specifying the embedding model and your API key.

The following link contains details about the available Cohere embedding models and their respective dimensions.
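Hard-coding the key works for a quick test, but you may prefer to keep it out of your source code. The following minimal sketch assumes you export the key in an environment variable named COHERE_API_KEY. The variable name and the get_cohere_api_key helper are our own choices, not part of redisvl.

```python
import os


def get_cohere_api_key(env_var="COHERE_API_KEY"):
    """Return the Cohere API key from an environment variable, failing loudly if unset."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your Cohere API key")
    return key


# Usage (needs redisvl installed and the variable set):
# cohere_vectorizer = CohereTextVectorizer(
#     model="embed-english-v3.0",
#     api_config={"api_key": get_cohere_api_key()},
# )
```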

Initializing the Redis Index:

PYTHON

# construct a search index from the schema (the index is named "semantic_search_demo")
schema = IndexSchema.from_yaml("./schema.yaml")
client = Redis.from_url("redis://localhost:6379")
index = SearchIndex(schema, client)

# create the index (no data yet); overwrite=True replaces an existing index definition with this name
index.create(overwrite=True)

Note that we use IndexSchema.from_yaml because we are loading the schema from a YAML file. To define the schema as a Python dictionary instead, use IndexSchema.from_dict.
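For reference, the schema above written as a Python dictionary might look like the following sketch. It mirrors schema.yaml field for field, and you would pass it to IndexSchema.from_dict in place of the from_yaml call. The vector_dims helper is hypothetical. It is added here to check that the schema's dimensions match the model's output.

```python
# The schema.yaml configuration expressed as a Python dictionary.
schema_dict = {
    "version": "0.1.0",
    "index": {
        "name": "semantic_search_demo",
        "prefix": "rvl",
        "storage_type": "hash",
    },
    "fields": [
        {"name": "url", "type": "text"},
        {"name": "title", "type": "tag"},
        {"name": "text", "type": "text"},
        {"name": "wiki_id", "type": "numeric"},
        {"name": "paragraph_id", "type": "numeric"},
        {"name": "id", "type": "numeric"},
        {"name": "views", "type": "numeric"},
        {"name": "langs", "type": "numeric"},
        {
            "name": "embedding",
            "type": "vector",
            "attrs": {
                "algorithm": "flat",
                "dims": 1024,
                "distance_metric": "cosine",
                "datatype": "float32",
            },
        },
    ],
}


def vector_dims(schema, field_name="embedding"):
    """Hypothetical helper: return the declared dims of a vector field."""
    for field in schema["fields"]:
        if field["name"] == field_name and field["type"] == "vector":
            return field["attrs"]["dims"]
    raise KeyError(f"no vector field named {field_name!r}")


# With redisvl installed:
# schema = IndexSchema.from_dict(schema_dict)
# index = SearchIndex(schema, client)
```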

SHELL

$ rvl index listall

This command lists the indices in your Redis instance. If the index was created, you should see something like this:

TEXT

15:39:22 [RedisVL] INFO   Indices:
15:39:22 [RedisVL] INFO   1. semantic_search_demo

Inspect the index to make sure it matches the schema you want:

SHELL

$ rvl index info -i semantic_search_demo

You should see something like this:

TEXT

╭──────────────────────┬────────────────┬────────────┬─────────────────┬────────────╮
│ Index Name           │ Storage Type   │ Prefixes   │ Index Options   │   Indexing │
├──────────────────────┼────────────────┼────────────┼─────────────────┼────────────┤
│ semantic_search_demo │ HASH           │ ['rvl']    │ []              │          0 │
╰──────────────────────┴────────────────┴────────────┴─────────────────┴────────────╯
Index Fields:
╭──────────────┬──────────────┬─────────┬────────────────┬────────────────┬────────────────┬────────────────┬────────────────┬────────────────┬─────────────────┬────────────────╮
│ Name         │ Attribute    │ Type    │ Field Option   │ Option Value   │ Field Option   │ Option Value   │ Field Option   │   Option Value │ Field Option    │ Option Value   │
├──────────────┼──────────────┼─────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────────────────┼────────────────┤
│ url          │ url          │ TEXT    │ WEIGHT         │ 1              │                │                │                │                │                 │                │
│ title        │ title        │ TEXT    │ WEIGHT         │ 1              │                │                │                │                │                 │                │
│ text         │ text         │ TEXT    │ WEIGHT         │ 1              │                │                │                │                │                 │                │
│ wiki_id      │ wiki_id      │ NUMERIC │                │                │                │                │                │                │                 │                │
│ paragraph_id │ paragraph_id │ NUMERIC │                │                │                │                │                │                │                 │                │
│ id           │ id           │ NUMERIC │                │                │                │                │                │                │                 │                │
│ views        │ views        │ NUMERIC │                │                │                │                │                │                │                 │                │
│ langs        │ langs        │ NUMERIC │                │                │                │                │                │                │                 │                │
│ embedding    │ embedding    │ VECTOR  │ algorithm      │ FLAT           │ data_type      │ FLOAT32        │ dim            │           1024 │ distance_metric │ COSINE         │
╰──────────────┴──────────────┴─────────┴────────────────┴────────────────┴────────────────┴────────────────┴────────────────┴────────────────┴─────────────────┴────────────────╯

You can also open RedisInsight, the Redis Stack GUI, at http://localhost:8001 (the port mapped in the docker run command above) to see the index in real time.

Loading your Documents and Embedding them into Redis:

PYTHON

# read in your documents
jsonl_file_path = "data/redis_guide_data.jsonl"

corpus = []
text_to_embed = []

with jsonlines.open(jsonl_file_path, mode="r") as reader:
    for line in reader:
        corpus.append(line)
        # we want to store the embeddings of the field called `text`
        text_to_embed.append(line["text"])

# call embed_many, which returns one embedding per input text
# hash fields store strings, so as_buffer=True returns each embedding as a float32 byte string
res = cohere_vectorizer.embed_many(
    text_to_embed, input_type="search_document", as_buffer=True
)

We load a subset of data containing paragraphs from Wikipedia. The data lives in a JSONL file, which we parse to extract the text field, the field we embed. To do this, we read the file line by line, appending each record to corpus and its text to text_to_embed. We then pass text_to_embed, a list of strings, to cohere_vectorizer.embed_many. Cohere's v3 embedding models take an input type: documents are embedded with input_type="search_document", while search queries (below) use input_type="search_query".
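If you would rather not add the jsonlines dependency, you can write the same loop with Python's standard json module, because each line of a JSONL file is one JSON object. The following is a sketch. The read_jsonl name is our own, and the test records are made-up placeholders, not rows from the dataset.

```python
import json


def read_jsonl(path, text_field="text"):
    """Read a JSONL file into (corpus, texts): full records plus the field to embed."""
    corpus, texts = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:  # skip blank lines, e.g. a trailing newline
                continue
            record = json.loads(line)
            corpus.append(record)
            texts.append(record[text_field])
    return corpus, texts


# Usage:
# corpus, text_to_embed = read_jsonl("data/redis_guide_data.jsonl")
```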

Prepare your Data to be inserted into the Index:

PYTHON

# construct the data payload to be uploaded to your index
data = [
    {
        "url": row["url"],
        "title": row["title"],
        "text": row["text"],
        "wiki_id": row["wiki_id"],
        "paragraph_id": row["paragraph_id"],
        "id": row["id"],
        "views": row["views"],
        "langs": row["langs"],
        "embedding": v,
    }
    for row, v in zip(corpus, res)
]

# load the data into your index
index.load(data)

To preserve all the metadata for each paragraph, we build a list of dictionaries, one per paragraph, pairing each record's fields with its embedding, and insert the list into the index with index.load.
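One pitfall is that zip stops at the shorter input. If the embedding call returned fewer vectors than there are records, some paragraphs would silently go unindexed. The following is a sketch of a defensive variant of the payload construction. build_payload and METADATA_FIELDS are hypothetical names.

```python
METADATA_FIELDS = ["url", "title", "text", "wiki_id", "paragraph_id", "id", "views", "langs"]


def build_payload(corpus, embeddings, vector_field="embedding"):
    """Pair each record's metadata with its embedding, refusing mismatched lengths."""
    if len(corpus) != len(embeddings):
        raise ValueError(
            f"{len(corpus)} records but {len(embeddings)} embeddings; refusing to truncate"
        )
    payload = []
    for row, vector in zip(corpus, embeddings):
        doc = {field: row[field] for field in METADATA_FIELDS}  # keep only schema fields
        doc[vector_field] = vector
        payload.append(doc)
    return payload


# Usage:
# data = build_payload(corpus, res)
# index.load(data)
```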

At this point, your Redis DB is ready for semantic search!

Query your Redis DB:

PYTHON

# use the Cohere vectorizer again to create a query embedding
query_embedding = cohere_vectorizer.embed(
    "What did Microsoft release in 2015?",
    input_type="search_query",
    as_buffer=True,
)

query = VectorQuery(
    vector=query_embedding,
    vector_field_name="embedding",
    return_fields=[
        "url",
        "wiki_id",
        "paragraph_id",
        "id",
        "views",
        "langs",
        "title",
        "text",
    ],
    num_results=5,
)

results = index.query(query)

for doc in results:
    print(
        f"Title:{doc['title']}\nText:{doc['text']}\nDistance {doc['vector_distance']}\n\n"
    )

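Use the VectorQuery class to construct a query object. Here you can specify the fields you’d like Redis to return and the number of results (5 in this example). Note that the query text is embedded with input_type="search_query". Each result includes a vector_distance, and lower values mean closer matches.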
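Because the index uses algorithm: flat, Redis compares the query against every stored vector (exact, brute-force search). With distance_metric: cosine, the reported distance is 1 minus the cosine similarity. The following NumPy sketch reproduces that ranking on toy 2-D vectors (not real embeddings), which can help when sanity-checking distances. flat_knn is a hypothetical name.

```python
import numpy as np


def flat_knn(query, vectors, k):
    """Exact k-nearest-neighbour search by cosine distance (1 - cosine similarity)."""
    q = np.asarray(query, dtype=np.float32)
    m = np.asarray(vectors, dtype=np.float32)
    similarities = (m @ q) / (np.linalg.norm(m, axis=1) * np.linalg.norm(q))
    distances = 1.0 - similarities
    order = np.argsort(distances, kind="stable")[:k]  # smallest distance first
    return [(int(i), float(distances[i])) for i in order]


# Toy 2-D "embeddings" for illustration only.
stored = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(flat_knn([1.0, 0.0], stored, k=2))  # index 0 (distance 0.0), then index 2 (about 0.293)
```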

Redis Filters

Adding Tag Filters:

PYTHON

# Initialize a tag filter
tag_filter = Tag("title") == "Microsoft Office"

# set the tag filter on our existing query
query.set_filter(tag_filter)

results = index.query(query)

for doc in results:
    print(
        f"Title:{doc['title']}\nText:{doc['text']}\nDistance {doc['vector_distance']}\n"
    )

Redis also lets you add filters to your queries on the fly. Here we construct a tag filter on the title field, which the schema declares with type: tag, so only paragraphs whose title is Microsoft Office are ranked by vector distance.
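Conceptually, a filtered vector query ranks only the documents that satisfy the filter. It can therefore return fewer than num_results hits, and a near-identical vector with a different title is excluded. The following plain-Python sketch shows those semantics on hypothetical toy documents. It does not reproduce how Redis executes the query internally.

```python
import numpy as np


def filtered_knn(query, docs, k, predicate):
    """Keep docs passing `predicate`, then rank them by cosine distance."""
    q = np.asarray(query, dtype=np.float32)
    scored = []
    for doc in docs:
        if not predicate(doc):
            continue  # filtered out before ranking
        v = np.asarray(doc["embedding"], dtype=np.float32)
        distance = 1.0 - float(v @ q / (np.linalg.norm(v) * np.linalg.norm(q)))
        scored.append((distance, doc["title"]))
    scored.sort(key=lambda pair: pair[0])
    return scored[:k]


docs = [  # hypothetical toy documents
    {"title": "Microsoft Office", "embedding": [1.0, 0.0]},
    {"title": "Windows", "embedding": [1.0, 0.0]},
    {"title": "Microsoft Office", "embedding": [0.0, 1.0]},
]
hits = filtered_knn([1.0, 0.0], docs, k=5, predicate=lambda d: d["title"] == "Microsoft Office")
print(hits)  # only the two "Microsoft Office" docs, despite k=5
```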

Using Filter Expressions:

PYTHON

# define a tag match on the title, text match on the text field, and numeric filter on the views field
filter_data = (
    (Tag("title") == "Elizabeth II")
    & (Text("text") % "born")
    & (Num("views") > 4500)
)

query_embedding = cohere_vectorizer.embed(
    "When was she born?", input_type="search_query", as_buffer=True
)

# reinitialize the query with the filter expression
query = VectorQuery(
    vector=query_embedding,
    vector_field_name="embedding",
    return_fields=[
        "url",
        "wiki_id",
        "paragraph_id",
        "id",
        "views",
        "langs",
        "title",
        "text",
    ],
    num_results=5,
    filter_expression=filter_data,
)

results = index.query(query)
print(results)

for doc in results:
    print(
        f"Title:{doc['title']}\nText:{doc['text']}\nDistance {doc['vector_distance']}\nViews {doc['views']}"
    )

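You can also initialize a query with a combination of filters, called a filter expression. Filters are combined with & (AND) and | (OR), so a filter expression lets you combine conditions over an arbitrary set of fields at query time. The example above requires the title tag to equal Elizabeth II, the text field to contain the word born, and views to be greater than 4500.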
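To build intuition for how & and | compose conditions, here is a small stand-in in plain Python. The Pred class and the tag_eq, text_has, and num_gt helpers are hypothetical and are not redisvl's API (redisvl translates filter expressions into a Redis query string instead). The word match here is a simple whole-word check, whereas Redis full-text search also applies stemming.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Pred:
    """A composable document predicate: & means AND, | means OR."""
    fn: Callable[[dict], bool]

    def __call__(self, doc):
        return self.fn(doc)

    def __and__(self, other):
        return Pred(lambda doc: self(doc) and other(doc))

    def __or__(self, other):
        return Pred(lambda doc: self(doc) or other(doc))


def tag_eq(field, value):
    return Pred(lambda doc: doc[field] == value)


def text_has(field, word):
    # whole-word, case-insensitive match (no stemming)
    return Pred(lambda doc: word.lower() in re.findall(r"\w+", doc[field].lower()))


def num_gt(field, threshold):
    return Pred(lambda doc: doc[field] > threshold)


# Mirrors the filter_data expression above.
filter_sketch = tag_eq("title", "Elizabeth II") & text_has("text", "born") & num_gt("views", 4500)
```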