This notebook shows how to build a classifier using Cohere’s embeddings.
The example classification task here will be sentiment analysis of film reviews. We’ll train a simple classifier to detect whether a film review is negative (class 0) or positive (class 1).
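We’ll go through the following steps: getting and splitting the dataset, retrieving embeddings of the reviews from the API, training a classifier on the training embeddings, and evaluating it on the test set.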
If you’re running an older version of the SDK you’ll want to upgrade it, like this:
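A typical upgrade command is pip install --upgrade cohere, run from a shell or a notebook cell. As a small sketch, the standard-library check below reports which version is currently installed, so you can decide whether an upgrade is needed. The helper name installed_version is made up for this illustration.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# "cohere" is the PyPI package name of the SDK.
print("cohere version:", installed_version("cohere"))
```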
Since this is a toy example, we’ll use only a subset of the training and testing datasets: 500 examples. Increase that number to get better performance and a more reliable evaluation.
The train_test_split function from scikit-learn splits arrays or matrices into random train and test subsets.
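The split can be sketched as follows. The data is a made-up placeholder for the film reviews, and the variable names, the 25% test share and the fixed random_state are assumptions for illustration, not values taken from the notebook.

```python
from sklearn.model_selection import train_test_split

# Made-up placeholder data standing in for the film-review dataset.
sentences = [f"placeholder review {i}" for i in range(20)]
labels = [i % 2 for i in range(20)]  # 0 = negative, 1 = positive

# Sentences and labels are shuffled together, so each review keeps its label.
sentences_train, sentences_test, labels_train, labels_test = train_test_split(
    sentences, labels, test_size=0.25, random_state=0
)

print(len(sentences_train), len(sentences_test))
```

Fixing random_state makes the split reproducible. Leaving it out gives a different split on each run.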
We’re now ready to retrieve the embeddings from the API. You’ll need your API key for this next cell. Sign up to Cohere and get one if you haven’t yet.
Note that the ordering of the arguments is important if you pass them positionally: if you put input_type in before the model name, you’ll get an error. Passing each argument by keyword avoids this, because keyword arguments can appear in any order.
We now have two sets of embeddings: embeddings_train contains the embeddings of the training sentences, while embeddings_test contains the embeddings of the testing sentences.
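The retrieval step might look like the sketch below. It assumes a client object whose embed method accepts texts, model and input_type keywords and returns a response with an embeddings attribute, as the Cohere Python SDK's classic client does. The helper name embed_texts, the model name and the input type are assumptions, so check them against the SDK version you have installed.

```python
from typing import Any, List, Sequence


def embed_texts(
    client: Any,
    texts: Sequence[str],
    model: str = "embed-english-v3.0",   # assumed model name
    input_type: str = "classification",  # assumed input type for this task
) -> List[List[float]]:
    """Embed `texts` with a Cohere-style client and return one vector per text."""
    # Keyword arguments make the call independent of parameter order.
    response = client.embed(texts=list(texts), model=model, input_type=input_type)
    return response.embeddings


# Usage (needs the cohere package and a real API key):
# import cohere
# co = cohere.Client("YOUR_API_KEY")
# embeddings_train = embed_texts(co, sentences_train)
# embeddings_test = embed_texts(co, sentences_test)
```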
Curious what an embedding looks like? We can print it:
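As a minimal illustration, the snippet below prints the size and the first few values of a vector. The numbers are a made-up placeholder. In the notebook you would inspect an element of embeddings_train instead, and a real embedding has many more dimensions.

```python
# Made-up placeholder vector; a real embedding has many more dimensions.
embedding = [0.12, -0.03, 0.48, -0.27, 0.05, 0.31, -0.19, 0.08]

print("dimensions:", len(embedding))
print("first values:", embedding[:5])
```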
Now that we have the embeddings, we can train our classifier. We’ll use a support vector machine (SVM) from scikit-learn.
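Training and evaluation could be sketched as below. To keep the example runnable without an API key, synthetic vectors stand in for the real embeddings. The pipeline with StandardScaler and the class_weight setting are assumptions about a reasonable setup, not necessarily the notebook's exact configuration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the embeddings: two noisy clusters, one per class.
rng = np.random.default_rng(0)
n_per_class, dim = 100, 16
negative = rng.normal(loc=-1.0, scale=0.5, size=(n_per_class, dim))
positive = rng.normal(loc=1.0, scale=0.5, size=(n_per_class, dim))
X = np.vstack([negative, positive])
y = np.array([0] * n_per_class + [1] * n_per_class)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Scale the features, then fit an SVM; class_weight="balanced"
# compensates for uneven class sizes.
svm_classifier = make_pipeline(StandardScaler(), SVC(class_weight="balanced"))
svm_classifier.fit(X_train, y_train)

# score() returns the mean accuracy on the held-out data.
score = svm_classifier.score(X_test, y_test)
print(f"Test accuracy: {100 * score:.1f}%")
```

With real embeddings, X_train and X_test would be replaced by embeddings_train and embeddings_test, and y_train and y_test by the corresponding labels.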
You may get a slightly different number when you run this code, since the train and test subsets are chosen at random.
This was a small-scale example, meant as a proof of concept and designed to illustrate how you can build a custom classifier quickly using a small amount of labelled data and Cohere’s embeddings. Increase the number of training examples to achieve better performance on this task.