Hello World! Meet Language AI
Here we take a quick tour of what’s possible with language AI via Cohere’s Large Language Model (LLM) API. This is the Hello, World! of language AI, written for developers with little or no background in AI. Fittingly, we’ll do it by exploring the Hello, World! phrase itself.
Read the accompanying blog post here.
We’ll cover three groups of tasks that you will typically work on when dealing with language data:
- Generating text
- Classifying text
- Analyzing text
The first step is to install the Cohere Python SDK. Next, create an API key, which you can generate from the Cohere dashboard or CLI tool.
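As a sketch, the setup might look like this in Python (the environment-variable name `COHERE_API_KEY` is our own convention, not something the SDK prescribes):

```python
# Install the SDK first:
#   pip install cohere
import os

try:
    import cohere
    # Create a client with your API key from the Cohere dashboard or CLI.
    co = cohere.Client(os.environ.get("COHERE_API_KEY", ""))
except Exception:
    # Skip gracefully if the SDK is missing or the key is not configured.
    co = None
```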
The Cohere Generate endpoint generates text given an input called a “prompt”. The prompt provides the context for what we want the model to generate. To illustrate this, let’s start with a simple prompt as the input.
Try a Simple Prompt
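For instance, a minimal call might look like the following. This is a sketch: parameter names can vary slightly between SDK versions, and the API call is guarded so it only runs when a key is configured.

```python
import os

try:
    import cohere
    co = cohere.Client(os.environ.get("COHERE_API_KEY", ""))
except Exception:
    co = None

# A simple, bare-bones prompt.
prompt = "Write the first paragraph of a blog post about Hello, World!"

def generate(prompt: str) -> str:
    # Calls the Generate endpoint; requires a valid API key.
    response = co.generate(prompt=prompt, max_tokens=100)
    return response.generations[0].text

# Only call the API if a client and key are actually available.
if co is not None and os.environ.get("COHERE_API_KEY"):
    print(generate(prompt))
```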
Create a Better Prompt
The output is not bad, but it can be better. We need a way to make the output align more closely with what we want, which is where we leverage prompt engineering.
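One common prompt-engineering move is to spell out the task and the desired output format inside the prompt itself. A sketch (the wording below is our own illustration, not a Cohere-prescribed template):

```python
# A more engineered prompt: state the task precisely and show the model
# exactly where its completion should begin.
prompt = """Write the first paragraph of a blog post given its title.
Keep it to two sentences and make it engaging.

Title: Hello, World! Meet Language AI
First paragraph:"""
```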
Automating the Process
In real applications, you will likely need to produce these text generations on an ongoing basis, given different inputs. Let’s simulate that with our example.
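Sketched in Python, the automation is just a prompt template applied over a list of inputs (the topics and template here are our own illustration):

```python
topics = [
    "the history of Hello, World!",
    "writing Hello, World! in Python",
    "why every tutorial starts with Hello, World!",
]

def build_prompt(topic: str) -> str:
    # One prompt template, reused across many inputs.
    return f"Write the first paragraph of a blog post about {topic}.\nFirst paragraph:"

prompts = [build_prompt(t) for t in topics]
# Each prompt would then be sent to the Generate endpoint, as before.
```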
Cohere’s Classify endpoint makes it easy to take a list of texts and predict their categories, or classes. A typical machine learning model requires many training examples to perform text classification, but with the Classify endpoint, you can get started with as few as 5 examples per class.
Sentiment Analysis
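A sketch of the training data such a classifier needs: five labeled examples per class, the minimum mentioned above. Plain tuples are used below; the SDK wraps examples in its own `Example` type, whose exact import path varies across SDK versions.

```python
# Five labeled examples per class -- the minimum mentioned above.
positive = [
    "I love this!",
    "What a great experience",
    "Works perfectly",
    "Highly recommend it",
    "This made my day",
]
negative = [
    "I hate this",
    "What a terrible experience",
    "Completely broken",
    "Would not recommend",
    "This ruined my day",
]
examples = [(t, "positive") for t in positive] + [(t, "negative") for t in negative]

# New, unlabeled inputs to classify.
inputs = ["hello world printed on my first try!", "my code keeps crashing"]
# co.classify(inputs=inputs, examples=...) would return a predicted
# class plus a confidence score for each input.
```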
Cohere’s Embed endpoint takes a piece of text and turns it into a vector embedding. Embeddings represent text in the form of numbers that capture its meaning and context. This means you can turn unstructured text data into a structured form, which opens up ways to analyze it and extract insights.
Get embeddings
Here we have a list of 50 top web search keywords about Hello, World! taken from a keyword tool. Let’s look at a few examples:
| | search_term |
|---|---|
| 0 | how to print hello world in python |
| 1 | what is hello world |
| 2 | how do you write hello world in an alert box |
| 3 | how to print hello world in java |
| 4 | how to write hello world in eclipse |
We use the Embed endpoint to get the embeddings for each of these keywords.
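A sketch of that call, using the first five keywords from the table (guarded so the API is only hit when the SDK and a key are available):

```python
import os

try:
    import cohere
    co = cohere.Client(os.environ.get("COHERE_API_KEY", ""))
except Exception:
    co = None

search_terms = [
    "how to print hello world in python",
    "what is hello world",
    "how do you write hello world in an alert box",
    "how to print hello world in java",
    "how to write hello world in eclipse",
]  # first 5 of the 50 keywords

if co is not None and os.environ.get("COHERE_API_KEY"):
    # One embedding (a list of floats) comes back per input text.
    embeddings = co.embed(texts=search_terms).embeddings
```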
Semantic Search
We’ll look at a couple of example applications. The first example is semantic search. Given a new query, our “search engine” must return the most similar FAQs, where the FAQs are the 50 search terms we uploaded earlier.
We use cosine similarity to compare the new query with each of the FAQs.
Finally, we display the top 5 FAQs that best match the new query.
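The two steps can be sketched with plain NumPy. The random vectors below stand in for real embeddings from the Embed endpoint, and the query is deliberately built to sit near FAQ #3:

```python
import numpy as np

# Stand-ins for real embeddings: one row per FAQ, one vector for the query.
rng = np.random.default_rng(0)
faq_embeddings = rng.normal(size=(50, 8))
query_embedding = faq_embeddings[3] + 0.01 * rng.normal(size=8)

def cosine_similarity(a, b):
    # Cosine similarity: dot product of the vectors over the product of
    # their lengths; 1.0 means the vectors point in the same direction.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

scores = [cosine_similarity(query_embedding, e) for e in faq_embeddings]
top5 = np.argsort(scores)[::-1][:5]  # indices of the 5 most similar FAQs
```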
Semantic Exploration
In the second example, we take the idea behind semantic search a step further: exploring large volumes of text and analyzing their semantic relationships.
We’ll use the same 50 top web search terms about Hello, World! There are different techniques we can use to compress the embeddings down to just 2 dimensions while retaining as much information as possible; here we’ll use one called UMAP. Once we get the embeddings down to 2 dimensions, we can plot them on a 2D chart.
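A sketch using the `umap-learn` package. The random matrix stands in for the real keyword embeddings, and the block is guarded in case the package is not installed:

```python
import numpy as np

try:
    import umap  # pip install umap-learn
except ImportError:
    umap = None

# Stand-in for the 50 keyword embeddings.
rng = np.random.default_rng(42)
embeddings = rng.normal(size=(50, 64))

if umap is not None:
    reducer = umap.UMAP(n_components=2)  # compress down to 2 dimensions
    coords = reducer.fit_transform(embeddings)
    # coords has shape (50, 2) and can be plotted as a 2D scatter chart.
```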