Hello World! Meet Language AI

Here we take a quick tour of what’s possible with language AI via Cohere’s Large Language Model (LLM) API. This is the Hello, World! of language AI, written for developers with little or no background in AI. In fact, we’ll do that by exploring the Hello, World! phrase itself.

Read the accompanying blog post here.


We’ll cover three groups of tasks that you will typically work on when dealing with language data, including:

  • Generating text
  • Classifying text
  • Analyzing text

The first step is to install the Cohere Python SDK. Next, create an API key, which you can generate from the Cohere dashboard or CLI tool.
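The examples below hard-code the key for brevity; in a real project you would typically read it from an environment variable instead. A minimal sketch (the variable name `COHERE_API_KEY` is a convention, not a requirement):

```python
import os

def get_api_key(env_var="COHERE_API_KEY"):
    """Read the Cohere API key from the environment rather than hard-coding it."""
    key = os.environ.get(env_var)
    if key is None:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

You would then create the client with `co = cohere.Client(get_api_key())`.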

PYTHON
! pip install cohere altair umap-learn -q
PYTHON
import cohere
import pandas as pd
import numpy as np
import altair as alt

co = cohere.Client("COHERE_API_KEY") # Get your API key: https://dashboard.cohere.com/api-keys

The Cohere Chat endpoint generates text given an input, called a “prompt”. The prompt provides the context for what we want the model to generate. To illustrate this, let’s start with a simple prompt as the input.

Try a Simple Prompt

PYTHON
prompt = "What is a Hello World program."

response = co.chat(
    message=prompt,
    model='command-r')

print(response.text)
A "Hello World" program is a traditional and simple program that is often used as an introduction to a new programming language. The program typically displays the message "Hello World" as its output. The concept of a "Hello World" program originated from the book *The C Programming Language* written by Kernighan and Ritchie, where the example program in the book displayed the message using the C programming language.
The "Hello World" program serves as a basic and straightforward way to verify that your development environment is set up correctly and to familiarize yourself with the syntax and fundamentals of the programming language. It's a starting point for learning how to write and run programs in a new language.
The program's simplicity makes it accessible to programmers of all skill levels, and it's often one of the first programs beginners write when learning to code. The exact implementation of a "Hello World" program varies depending on the programming language being used, but the core idea remains the same—to display the "Hello World" message.
Here's how a "Hello World" program can be written in a few select languages:
1. **C**:
```c
#include <stdio.h>

int main() {
    printf("Hello World\n");
    return 0;
}
```
2. **Python**:
```python
print("Hello World")
```
3. **Java**:
```java
class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello World");
    }
}
```
4. **JavaScript**:
```javascript
console.log("Hello World");
```
5. **C#**:
```csharp
using System;

class Program {
    static void Main() {
        Console.WriteLine("Hello World");
    }
}
```
The "Hello World" program is a testament to the power of programming, as a simple and concise message can be displayed in numerous languages with just a few lines of code. It's an exciting first step into the world of software development!

Create a Better Prompt

The output is not bad, but it can be better. We need a way to steer the output closer to what we want, which is where we leverage prompt engineering.

PYTHON
prompt = """
Write the first paragraph of a blog post given a blog title.
--
Blog Title: Best Activities in Toronto
First Paragraph: Looking for fun things to do in Toronto? When it comes to exploring Canada's
largest city, there's an ever-evolving set of activities to choose from. Whether you're looking to
visit a local museum or sample the city's varied cuisine, there is plenty to fill any itinerary. In
this blog post, I'll share some of my favorite recommendations
--
Blog Title: Mastering Dynamic Programming
First Paragraph: In this piece, we'll help you understand the fundamentals of dynamic programming,
and when to apply this optimization technique. We'll break down bottom-up and top-down approaches to
solve dynamic programming problems.
--
Blog Title: Learning to Code with Hello, World!
First Paragraph:"""

response = co.chat(
    message=prompt,
    model='command-r')

print(response.text)
Starting to code can be daunting, but it's actually simpler than you think! The famous first program, "Hello, World!" is a rite of passage for all coders, and an excellent starting point to begin your coding journey. This blog will guide you through the process of writing your very first line of code, and help you understand why learning to code is an exciting and valuable skill to have, covering the fundamentals and the broader implications of this seemingly simple phrase.

Automating the Process

In real applications, you will likely need to produce these text generations on an ongoing basis, given different inputs. Let’s simulate that with our example.

PYTHON
def generate_text(topic):
    prompt = f"""
Write the first paragraph of a blog post given a blog title.
--
Blog Title: Best Activities in Toronto
First Paragraph: Looking for fun things to do in Toronto? When it comes to exploring Canada's
largest city, there's an ever-evolving set of activities to choose from. Whether you're looking to
visit a local museum or sample the city's varied cuisine, there is plenty to fill any itinerary. In
this blog post, I'll share some of my favorite recommendations
--
Blog Title: Mastering Dynamic Programming
First Paragraph: In this piece, we'll help you understand the fundamentals of dynamic programming,
and when to apply this optimization technique. We'll break down bottom-up and top-down approaches to
solve dynamic programming problems.
--
Blog Title: {topic}
First Paragraph:"""
    # Generate text by calling the Chat endpoint
    response = co.chat(
        message=prompt,
        model='command-r')

    return response.text
PYTHON
topics = ["How to Grow in Your Career",
          "The Habits of Great Software Developers",
          "Ideas for a Relaxing Weekend"]
PYTHON
paragraphs = []

for topic in topics:
    paragraphs.append(generate_text(topic))

for topic, para in zip(topics, paragraphs):
    print(f"Topic: {topic}")
    print(f"First Paragraph: {para}")
    print("-"*10)
Topic: How to Grow in Your Career
First Paragraph: Advancing in your career can seem like a daunting task, especially if you're unsure of the path ahead. In this ever-changing professional landscape, there are numerous factors to consider. This blog aims to shed light on the strategies and skills that can help you navigate the complexities of career progression and unlock your full potential. Whether you're looking to secure a promotion or explore new opportunities, these insights will help you chart a course for your future. Let's embark on this journey of self-improvement and professional growth, equipping you with the tools to succeed in your career aspirations.
----------
Topic: The Habits of Great Software Developers
First Paragraph: Great software developers are renowned for their ability to write robust code and create innovative applications, but what sets them apart from their peers? In this blog, we'll delve into the daily habits that contribute to their success. From their approach to coding challenges to the ways they stay organized, we'll explore the routines and practices that help them excel in the fast-paced world of software development. Understanding these habits can help you elevate your own skills and join the ranks of these industry leaders.
----------
Topic: Ideas for a Relaxing Weekend
First Paragraph: Life can be stressful, and sometimes we just need a relaxing weekend to unwind and recharge. In this fast-paced world, taking some time to slow down and rejuvenate is essential. This blog post is here to help you plan the perfect low-key weekend with some easy and accessible ideas. From cozy indoor activities to peaceful outdoor adventures, I'll share some ideas to help you renew your mind, body, and spirit. Whether you're a homebody or an adventure seeker, there's something special for everyone. So, grab a cup of tea, sit back, and get ready to dive into a calming weekend of self-care and relaxation!
----------

Cohere’s Classify endpoint makes it easy to take a list of texts and predict their categories, or classes. A typical machine learning model requires many training examples to perform text classification, but with the Classify endpoint, you can get started with as few as 5 examples per class.

Sentiment Analysis

PYTHON
from cohere import ClassifyExample

examples = [
    ClassifyExample(text="I’m so proud of you", label="positive"),
    ClassifyExample(text="What a great time to be alive", label="positive"),
    ClassifyExample(text="That’s awesome work", label="positive"),
    ClassifyExample(text="The service was amazing", label="positive"),
    ClassifyExample(text="I love my family", label="positive"),
    ClassifyExample(text="They don't care about me", label="negative"),
    ClassifyExample(text="I hate this place", label="negative"),
    ClassifyExample(text="The most ridiculous thing I've ever heard", label="negative"),
    ClassifyExample(text="I am really frustrated", label="negative"),
    ClassifyExample(text="This is so unfair", label="negative"),
    ClassifyExample(text="This made me think", label="neutral"),
    ClassifyExample(text="The good old days", label="neutral"),
    ClassifyExample(text="What's the difference", label="neutral"),
    ClassifyExample(text="You can't ignore this", label="neutral"),
    ClassifyExample(text="That's how I see it", label="neutral")
]
PYTHON
inputs = ["Hello, world! What a beautiful day",
          "It was a great time with great people",
          "Great place to work",
          "That was a wonderful evening",
          "Maybe this is why",
          "Let's start again",
          "That's how I see it",
          "These are all facts",
          "This is the worst thing",
          "I cannot stand this any longer",
          "This is really annoying",
          "I am just plain fed up"
          ]
PYTHON
def classify_text(inputs, examples):
    """
    Classify a list of input texts
    Arguments:
        inputs(list[str]): a list of input texts to be classified
        examples(list[ClassifyExample]): a list of example texts and class labels
    Returns:
        classifications(list): each result contains the text, labels, and conf values
    """
    # Classify text by calling the Classify endpoint
    response = co.classify(
        model='embed-english-v2.0',
        inputs=inputs,
        examples=examples)

    classifications = response.classifications

    return classifications
PYTHON
predictions = classify_text(inputs, examples)

classes = ["positive", "negative", "neutral"]
for inp, pred in zip(inputs, predictions):
    class_pred = pred.predictions[0]
    class_idx = classes.index(class_pred)
    class_conf = pred.confidences[0]

    print(f"Input: {inp}")
    print(f"Prediction: {class_pred}")
    print(f"Confidence: {class_conf:.2f}")
    print("-"*10)
Input: Hello, world! What a beautiful day
Prediction: positive
Confidence: 0.84
----------
Input: It was a great time with great people
Prediction: positive
Confidence: 0.99
----------
Input: Great place to work
Prediction: positive
Confidence: 0.91
----------
Input: That was a wonderful evening
Prediction: positive
Confidence: 0.96
----------
Input: Maybe this is why
Prediction: neutral
Confidence: 0.70
----------
Input: Let's start again
Prediction: neutral
Confidence: 0.83
----------
Input: That's how I see it
Prediction: neutral
Confidence: 1.00
----------
Input: These are all facts
Prediction: neutral
Confidence: 0.78
----------
Input: This is the worst thing
Prediction: negative
Confidence: 0.93
----------
Input: I cannot stand this any longer
Prediction: negative
Confidence: 0.93
----------
Input: This is really annoying
Prediction: negative
Confidence: 0.99
----------
Input: I am just plain fed up
Prediction: negative
Confidence: 1.00
----------

Cohere’s Embed endpoint takes a piece of text and turns it into a vector embedding. Embeddings represent text as numbers that capture its meaning and context. In effect, they turn unstructured text data into a structured form, which opens up ways to analyze it and extract insights.

Get embeddings

Here we have a list of 50 top web search keywords about Hello, World! taken from a keyword tool. Let’s look at a few examples:

PYTHON
df = pd.read_csv("https://github.com/cohere-ai/notebooks/raw/main/notebooks/data/hello-world-kw.csv", names=["search_term"])
df.head()
   search_term
0  how to print hello world in python
1  what is hello world
2  how do you write hello world in an alert box
3  how to print hello world in java
4  how to write hello world in eclipse

We use the Embed endpoint to get the embeddings for each of these keywords.

PYTHON
def embed_text(texts, input_type):
    """
    Turns a list of texts into embeddings
    Arguments:
        texts(list[str]): the texts to be turned into embeddings
        input_type(str): the embedding input type, e.g. "search_document" or "search_query"
    Returns:
        embeddings(list): the embeddings
    """
    # Embed text by calling the Embed endpoint
    response = co.embed(
        model="embed-english-v3.0",
        input_type=input_type,
        texts=texts)

    return response.embeddings
PYTHON
df["search_term_embeds"] = embed_text(texts=df["search_term"].tolist(),
                                      input_type="search_document")
doc_embeds = np.array(df["search_term_embeds"].tolist())

We’ll look at a couple of example applications. The first example is semantic search. Given a new query, our “search engine” must return the most similar FAQs, where the FAQs are the 50 search terms we uploaded earlier.

PYTHON
query = "what is the history of hello world"

query_embeds = embed_text(texts=[query],
                          input_type="search_query")[0]

We use cosine similarity to measure how similar the new query is to each of the FAQs.
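Cosine similarity scores two vectors by the angle between them: the dot product divided by the product of their norms. Before applying it to the embeddings, here is a minimal NumPy sketch of the computation (toy vectors, not real embeddings):

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    a, b = np.array(a, dtype=float), np.array(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Vectors pointing the same way score ~1.0; orthogonal vectors score ~0.0
print(cosine_sim([1.0, 2.0], [2.0, 4.0]))  # ~1.0
print(cosine_sim([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

In the code below, scikit-learn's `cosine_similarity` does the same thing across many candidate vectors at once.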

PYTHON
from sklearn.metrics.pairwise import cosine_similarity

def get_similarity(target, candidates):
    """
    Computes the similarity between a target embedding and a list of candidate embeddings
    Arguments:
        target(list[float]): the embedding of the target text
        candidates(list[list[float]]): the embeddings of the candidate texts
    Returns:
        sim(list[tuple]): candidate IDs and similarity scores, sorted by descending score
    """
    # Turn lists into arrays
    candidates = np.array(candidates)
    target = np.expand_dims(np.array(target), axis=0)

    # Calculate cosine similarity
    sim = cosine_similarity(target, candidates)
    sim = np.squeeze(sim).tolist()

    # Sort by descending order in similarity
    sim = list(enumerate(sim))
    sim = sorted(sim, key=lambda x: x[1], reverse=True)

    # Return similarity scores
    return sim

Finally, we display the top 5 FAQs that match the new query.

PYTHON
similarity = get_similarity(query_embeds, doc_embeds)

print("New query:")
print(query, '\n')

print("Similar queries:")
for idx, score in similarity[:5]:
    print(f"Similarity: {score:.2f};", df.iloc[idx]["search_term"])
New query:
what is the history of hello world
Similar queries:
Similarity: 0.58; how did hello world originate
Similarity: 0.56; where did hello world come from
Similarity: 0.54; why hello world
Similarity: 0.53; why is hello world so famous
Similarity: 0.53; what is hello world

Semantic Exploration

The second example builds on the same idea as semantic search but takes a broader view: exploring large volumes of text and analyzing their semantic relationships.

We’ll use the same 50 top web search terms about Hello, World! There are different techniques we can use to compress the embeddings down to just 2 dimensions while retaining as much information as possible; here we’ll use one called UMAP. Once the embeddings are down to 2 dimensions, we can plot them on a 2D chart.
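For intuition, any such reduction maps each high-dimensional embedding to a single 2D point. As a rough illustration of the idea (this is PCA via SVD, a simple linear alternative, not the UMAP algorithm the notebook uses):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))  # stand-in for 50 embeddings of dimension 8

# Center the data, then project onto the top-2 principal components
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
coords_2d = Xc @ Vt[:2].T

print(coords_2d.shape)  # (50, 2)
```

UMAP is nonlinear and typically preserves local neighborhood structure better, which is why it is the usual choice for this kind of visualization.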

PYTHON
import umap

reducer = umap.UMAP(n_neighbors=49)
umap_embeds = reducer.fit_transform(doc_embeds)

df['x'] = umap_embeds[:, 0]
df['y'] = umap_embeds[:, 1]
PYTHON
chart = alt.Chart(df).mark_circle(size=500).encode(
    x=alt.X('x',
            scale=alt.Scale(zero=False),
            axis=alt.Axis(labels=False, ticks=False, domain=False)),
    y=alt.Y('y',
            scale=alt.Scale(zero=False),
            axis=alt.Axis(labels=False, ticks=False, domain=False)),
    tooltip=['search_term']
)

text = chart.mark_text(align='left', dx=15, size=12, color='black'
                       ).encode(text='search_term', color=alt.value('black'))

result = (chart + text).configure(background="#FDF7F0"
                                  ).properties(
    width=1000,
    height=700,
    title="2D Embeddings"
)

result