Text Classification (Classify)

📘 This Guide Uses the Classify Endpoint.

You can find more information about the endpoint here.

In this section, we show how to use the Classify endpoint to perform sentiment classification on customer satisfaction survey responses that an e-commerce website might receive.

The Problem We Want to Solve

For this demo, let's assume that we want to classify a set of reviews for a newly released feature into positive and negative classes. For instance, we might have a review like this:

The item exceeded my expectations

that we want to classify as a positive review.

Naturally, the same techniques that we'll use for this problem can be used for any other task where we want to classify a given text according to a fixed set of classes.

Using Classify For Our Task

Classify takes in example inputs with their labels, as well as the input texts we aim to classify. It then builds a classifier on top of an embedding model and uses it to label each input.

Examples

Labeled examples demonstrate the classification task to the model so it grasps the task. Examples provide two important pieces of information:
1. The inputs and expected outputs for the task we're interested in.
2. The set of output classes. Every class should appear in at least one labeled example (a quick check for this is sketched below).
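As a quick sanity check before calling the endpoint, here is a minimal sketch (plain Python with illustrative variable names) that verifies every class appears in at least one labeled example:

# Minimal sketch: make sure every class we care about appears in at
# least one labeled example. Variable names here are illustrative.
examples = [
    ("The order is 5 days late", "negative"),
    ("The order came 5 days early", "positive"),
    # ... the rest of the labeled examples
]
expected_classes = {"positive", "negative"}

seen_classes = {label for _, label in examples}
missing = expected_classes - seen_classes
if missing:
    raise ValueError(f"No labeled examples for: {missing}")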

In this case we will be passing in the following examples:

"The order is 5 days late" - negative
"The order came 5 days early" - positive
"I would recommend this to others" - positive
"The package was damaged" - negative
"The order was incorrrect" - negative
"The item exceeded my expecatations" - positive
"I want to return my item" - negative
"I ordered more for my friends" - positive
"The item's material feels low quality" - negative

Texts

These are the input texts that we would like to classify:

"This item was broken when it arrived"
"This item broke after 3 weeks"

Classifying the Input Texts

Adding everything above together, we can call the API with the following arguments:

  • model: embed-english-v2.0
  • examples: [Example("The order is 5 days late","negative"), Example("The order came 5 days early","positive"), Example("I would recommend this to others","positive"), Example("I would buy this again","positive"), Example("The package was damaged","negative"), Example("The order was incorrect","negative"), Example("The item exceeded my expectations","positive"), Example("I want to return my item","negative"), Example("I ordered more for my friends","positive"), Example("The item's material feels low quality","negative")]
  • inputs: ["This item was broken when it arrived","This item broke after 3 weeks"]

The corresponding code snippet for the API call is as follows.

import cohere
from cohere.classify import Example

co = cohere.Client('{apiKey}')

classifications = co.classify(
    model='embed-english-v2.0',
    inputs=["This item was broken when it arrived",
            "This item broke after 3 weeks"],
    examples=[
        Example("The order came 5 days early", "positive"),
        Example("The item exceeded my expectations", "positive"),
        Example("I ordered more for my friends", "positive"),
        Example("I would buy this again", "positive"),
        Example("I would recommend this to others", "positive"),
        Example("The package was damaged", "negative"),
        Example("The order is 5 days late", "negative"),
        Example("The order was incorrect", "negative"),
        Example("I want to return my item", "negative"),
        Example("The item's material feels low quality", "negative"),
    ])

print('The confidence levels of the labels are: {}'.format(
    classifications.classifications))

This gives us the following output:

"results": [
        {
            "text": "This item was broken when it arrived",
            "prediction": "negative",
            "confidences": [
                {
                    "option": "negative",
                    "confidence": 0.99564105
                },
                {
                    "option": "positive",
                    "confidence": 0.0043589203
                }
            ]
        },
        {
            "text": "This item broke after 3 weeks",
            "prediction": "negative",
            "confidences": [
                {
                    "option": "negative",
                    "confidence": 0.99564105
                },
                {
                    "option": "positive",
                    "confidence": 0.0043589203
                }
            ]
        }
    ]
}

which indicates that, as we would expect, the model classifies both texts as negative.
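If you are working with the raw JSON, a minimal sketch like the one below picks the highest-confidence label for each input. The field names mirror the output shown above, and the response dict is a hand-built stand-in for the parsed API response:

# Minimal sketch: pick the highest-confidence label for each result.
# The dict below is a stand-in that mirrors the JSON output above.
response = {
    "results": [
        {
            "text": "This item was broken when it arrived",
            "prediction": "negative",
            "confidences": [
                {"option": "negative", "confidence": 0.99564105},
                {"option": "positive", "confidence": 0.0043589203},
            ],
        },
    ]
}

for result in response["results"]:
    top = max(result["confidences"], key=lambda c: c["confidence"])
    print(f"{result['text']!r} -> {top['option']} ({top['confidence']:.2%})")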


The playground has a user interface to help you set up the classification prompts, which can then be exported as code.

Choose Model

In the Cohere Playground, click on ‘Classify’.
Select the model size of your choice. Our smaller model is faster, while our larger model has a better grasp of language and is better able to capture and replicate the patterns in the input prompt.

Add Examples

Add your labeled examples in the ‘Examples’ section. The first column is for the example texts, while the second column is for the associated labels. Our example here uses 2 labels, but there is no limit on how many labels you can specify for your task.

You can also add your labeled examples from a CSV file by selecting ‘Upload your labeled examples’.
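If you are assembling that CSV programmatically, a minimal sketch like the one below writes this guide's examples to a two-column file (the layout follows the text-then-label table described above; the filename is arbitrary):

import csv

# Minimal sketch: write labeled examples to a CSV file with one
# example text per row and its label in the next column. Adjust the
# layout if the upload dialog expects something different.
examples = [
    ("The order came 5 days early", "positive"),
    ("The item exceeded my expectations", "positive"),
    ("The package was damaged", "negative"),
    ("The order is 5 days late", "negative"),
]

with open("labeled_examples.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(examples)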

Add at least 2 examples for each label. The more examples you provide, the more accurate the results are likely to be.

If you are not sure, there are also some preset examples to help you get started.

Add Inputs

Add the inputs you want to classify in the ‘Inputs’ section. Once done, click on ‘Classify’ to start the classification step.

View Results

Once the classification step is completed, you will see the output labels next to the inputs you added.

In the ‘Results’ section, you will also see the confidence level associated with each output. The confidence level represents the model's degree of certainty that the input falls under a given label. The label with the highest confidence is chosen as the prediction.

Export Code

Now the code is ready to be exported. Click on ‘Export code’ and choose from the available export options. You can use this to start integrating the current API configuration into your application.

Training

You can opt to train a custom model if you have a dataset of at least 250 labeled examples (500 or more for best results), with at least 5 examples per label.
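Before uploading, you may want to confirm your dataset meets those thresholds. Here is a minimal sketch, assuming the dataset is a simple list of (text, label) pairs; the numbers come from the requirements above:

from collections import Counter

# Minimal sketch: check the training-set requirements stated above:
# at least 250 labeled examples overall and at least 5 per label.
# `dataset` is assumed to be a list of (text, label) pairs.
dataset = [("The order came 5 days early", "positive")]  # stand-in data

label_counts = Counter(label for _, label in dataset)
if len(dataset) < 250:
    print(f"Only {len(dataset)} examples; at least 250 are required.")
for label, count in label_counts.items():
    if count < 5:
        print(f"Label {label!r} has only {count} examples; at least 5 are required.")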

A trained model can potentially deliver better classification performance than the baseline model. See here for an overview of what model training involves.

To train a model for your classification task, navigate to the models page and click on ‘Create a custom model’.

The subsequent steps are the same as those for training a representation model. To proceed, follow the steps described here.