Text Classification Using Embeddings
This notebook shows how to build a classifier using Cohere's embeddings.

The example classification task here will be sentiment analysis of film reviews. We'll train a simple classifier to detect whether a film review is negative (class 0) or positive (class 1).
We'll go through the following steps:
- Get the dataset
- Get the embeddings of the reviews (for both the training set and the test set)
- Train a classifier using the training set
- Evaluate the performance of the classifier on the testing set
If you're running an older version of the SDK, you'll want to upgrade it first. Assuming you installed the Python SDK with pip, the upgrade looks like this:
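```python
# Upgrade the Cohere Python SDK from inside a notebook cell.
!pip install -U cohere
```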
1. Get the dataset
We'll only use a subset of the training and testing datasets in this example: 500 examples, since this is a toy example. You'll want to increase that number to get better performance and a more reliable evaluation.
The train_test_split method splits arrays or matrices into random train and test subsets.
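A minimal sketch of this step, assuming the reviews live in a local CSV with a text column and a 0/1 sentiment column (the file and column names below are placeholders, not from the notebook):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Placeholder file and column names; substitute your own review dataset.
df = pd.read_csv("film_reviews.csv")

# Keep the example small: sample 500 reviews, then split into train/test sets.
df_sample = df.sample(500, random_state=42)

sentences_train, sentences_test, labels_train, labels_test = train_test_split(
    list(df_sample["review"]),
    list(df_sample["sentiment"]),
    test_size=0.25,
    random_state=42,
)
```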
2. Set up the Cohere client and get the embeddings of the reviews
We're now ready to retrieve the embeddings from the API. You'll need your API key for this next cell. Sign up to Cohere and get one if you haven't yet.
Note that the ordering of the arguments is important: if you pass input_type before model_name, you'll get an error.
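A minimal sketch of the embedding calls, assuming the Python SDK's co.embed endpoint. The model name below is an assumption; check the Cohere docs for the embedding model you want to use.

```python
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder; use your own key

# Embed the training and testing sentences. The model name is assumed here;
# input_type="classification" asks for embeddings suited to a classifier.
embeddings_train = co.embed(
    texts=sentences_train,
    model="embed-english-v3.0",
    input_type="classification",
).embeddings

embeddings_test = co.embed(
    texts=sentences_test,
    model="embed-english-v3.0",
    input_type="classification",
).embeddings
```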
We now have two sets of embeddings: embeddings_train contains the embeddings of the training sentences, while embeddings_test contains the embeddings of the testing sentences.
Curious what an embedding looks like? We can print it:
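```python
# Print the first 10 dimensions of the first training embedding.
# (Each embedding is a long vector, so we only show a slice.)
print(embeddings_train[0][:10])
```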
3. Train a classifier using the training set
Now that we have the embeddings, we can train our classifier. We'll use an SVM from sklearn.
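A minimal sketch of the training step, reusing the embeddings and labels from the previous steps; the scaler and class_weight setting are illustrative defaults, not prescribed choices.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Scale the embedding features, then fit a support-vector classifier.
svm_classifier = make_pipeline(StandardScaler(), SVC(class_weight="balanced"))
svm_classifier.fit(embeddings_train, labels_train)
```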
4. Evaluate the performance of the classifier on the testing set
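A sketch of the evaluation, reusing the svm_classifier and the held-out test embeddings and labels defined above:

```python
# Accuracy of the classifier on the test set.
score = svm_classifier.score(embeddings_test, labels_test)
print(f"Validation accuracy on a small test set: {100 * score:.2f}%")
```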
You may get a slightly different number when you run this code.
This was a small-scale example, meant as a proof of concept and designed to illustrate how quickly you can build a custom classifier using a small amount of labelled data and Cohere's embeddings. Increase the number of training examples to achieve better performance on this task.