Hello World! Meet Language AI

Here we take a quick tour of what’s possible with language AI via Cohere’s Large Language Model (LLM) API. This is the Hello, World! of language AI, written for developers with little or no background in AI. In fact, we’ll do that by exploring the Hello, World! phrase itself.

Read the accompanying blog post here.

We’ll cover three groups of tasks that you will typically work on when dealing with language data, including:

  • Generating text
  • Classifying text
  • Analyzing text

The first step is to install the Cohere Python SDK. Next, create an API key, which you can generate from the Cohere dashboard or CLI tool.

! pip install cohere altair umap-learn -q
import cohere
import pandas as pd
import numpy as np
import altair as alt

co = cohere.Client("COHERE_API_KEY") # Get your API key: https://dashboard.cohere.com/api-keys

The Cohere Chat endpoint generates text given an input, called a “prompt”. The prompt provides the context for what we want the model to generate. To illustrate this, let’s start with a simple prompt as the input.

Try a Simple Prompt

prompt = "What is a Hello World program."

response = co.chat(
  message=prompt,
  model='command-r')

print(response.text)
A "Hello World" program is a traditional and simple program that is often used as an introduction to a new programming language. The program typically displays the message "Hello World" as its output. The concept of a "Hello World" program originated from the book *The C Programming Language* written by Kernighan and Ritchie, where the example program in the book displayed the message using the C programming language. 

The "Hello World" program serves as a basic and straightforward way to verify that your development environment is set up correctly and to familiarize yourself with the syntax and fundamentals of the programming language. It's a starting point for learning how to write and run programs in a new language.

The program's simplicity makes it accessible to programmers of all skill levels, and it's often one of the first programs beginners write when learning to code. The exact implementation of a "Hello World" program varies depending on the programming language being used, but the core idea remains the same—to display the "Hello World" message. 

Here's how a "Hello World" program can be written in a few select languages:
1. **C**:
```c
#include <stdio.h>
int main() {
    printf("Hello World\n");
    return 0;
}
```
2. **Python**:
```python
print("Hello World")
```
3. **Java**:
```java
class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello World");
    }
}
```
4. **JavaScript**:
```javascript
console.log("Hello World");
```
5. **C#**:
```csharp
using System;

class Program {
    static void Main() {
        Console.WriteLine("Hello World");
    }
}
```
The "Hello World" program is a testament to the power of programming, as a simple and concise message can be displayed in numerous languages with just a few lines of code. It's an exciting first step into the world of software development!

Create a Better Prompt

The output is not bad, but it can be better. We need a way to steer the output closer to what we want, and that is where prompt engineering comes in.

prompt = """
Write the first paragraph of a blog post given a blog title.
--
Blog Title: Best Activities in Toronto
First Paragraph: Looking for fun things to do in Toronto? When it comes to exploring Canada's
largest city, there's an ever-evolving set of activities to choose from. Whether you're looking to
visit a local museum or sample the city's varied cuisine, there is plenty to fill any itinerary. In
this blog post, I'll share some of my favorite recommendations
--
Blog Title: Mastering Dynamic Programming
First Paragraph: In this piece, we'll help you understand the fundamentals of dynamic programming,
and when to apply this optimization technique. We'll break down bottom-up and top-down approaches to
solve dynamic programming problems.
--
Blog Title: Learning to Code with Hello, World!
First Paragraph:"""

response = co.chat(
  message=prompt,
  model='command-r')

print(response.text)
Starting to code can be daunting, but it's actually simpler than you think! The famous first program, "Hello, World!" is a rite of passage for all coders, and an excellent starting point to begin your coding journey. This blog will guide you through the process of writing your very first line of code, and help you understand why learning to code is an exciting and valuable skill to have, covering the fundamentals and the broader implications of this seemingly simple phrase.

Automating the Process

In real applications, you will likely need to produce these text generations on an ongoing basis, given different inputs. Let’s simulate that with our example.

def generate_text(topic):
  prompt = f"""
Write the first paragraph of a blog post given a blog title.
--
Blog Title: Best Activities in Toronto
First Paragraph: Looking for fun things to do in Toronto? When it comes to exploring Canada's
largest city, there's an ever-evolving set of activities to choose from. Whether you're looking to
visit a local museum or sample the city's varied cuisine, there is plenty to fill any itinerary. In
this blog post, I'll share some of my favorite recommendations
--
Blog Title: Mastering Dynamic Programming
First Paragraph: In this piece, we'll help you understand the fundamentals of dynamic programming,
and when to apply this optimization technique. We'll break down bottom-up and top-down approaches to
solve dynamic programming problems.
--
Blog Title: {topic}
First Paragraph:"""
  # Generate text by calling the Chat endpoint
  response = co.chat(
    message=prompt,
    model='command-r')

  return response.text
topics = ["How to Grow in Your Career",
          "The Habits of Great Software Developers",
          "Ideas for a Relaxing Weekend"]
paragraphs = []

for topic in topics:
  paragraphs.append(generate_text(topic))
  
for topic,para in zip(topics,paragraphs):
  print(f"Topic: {topic}")
  print(f"First Paragraph: {para}")
  print("-"*10)
Topic: How to Grow in Your Career
First Paragraph: Advancing in your career can seem like a daunting task, especially if you're unsure of the path ahead. In this ever-changing professional landscape, there are numerous factors to consider. This blog aims to shed light on the strategies and skills that can help you navigate the complexities of career progression and unlock your full potential. Whether you're looking to secure a promotion or explore new opportunities, these insights will help you chart a course for your future. Let's embark on this journey of self-improvement and professional growth, equipping you with the tools to succeed in your career aspirations.
----------
Topic: The Habits of Great Software Developers
First Paragraph: Great software developers are renowned for their ability to write robust code and create innovative applications, but what sets them apart from their peers? In this blog, we'll delve into the daily habits that contribute to their success. From their approach to coding challenges to the ways they stay organized, we'll explore the routines and practices that help them excel in the fast-paced world of software development. Understanding these habits can help you elevate your own skills and join the ranks of these industry leaders.
----------
Topic: Ideas for a Relaxing Weekend
First Paragraph: Life can be stressful, and sometimes we just need a relaxing weekend to unwind and recharge. In this fast-paced world, taking some time to slow down and rejuvenate is essential. This blog post is here to help you plan the perfect low-key weekend with some easy and accessible ideas. From cozy indoor activities to peaceful outdoor adventures, I'll share some ideas to help you renew your mind, body, and spirit. Whether you're a homebody or an adventure seeker, there's something special for everyone. So, grab a cup of tea, sit back, and get ready to dive into a calming weekend of self-care and relaxation!
----------

Cohere’s Classify endpoint makes it easy to take a list of texts and predict their categories, or classes. A typical machine learning model requires many training examples to perform text classification, but with the Classify endpoint, you can get started with as few as 5 examples per class.

Sentiment Analysis

from cohere import ClassifyExample

examples = [
    ClassifyExample(text="I’m so proud of you", label="positive"), 
    ClassifyExample(text="What a great time to be alive", label="positive"), 
    ClassifyExample(text="That’s awesome work", label="positive"), 
    ClassifyExample(text="The service was amazing", label="positive"), 
    ClassifyExample(text="I love my family", label="positive"), 
    ClassifyExample(text="They don't care about me", label="negative"), 
    ClassifyExample(text="I hate this place", label="negative"), 
    ClassifyExample(text="The most ridiculous thing I've ever heard", label="negative"), 
    ClassifyExample(text="I am really frustrated", label="negative"), 
    ClassifyExample(text="This is so unfair", label="negative"),
    ClassifyExample(text="This made me think", label="neutral"), 
    ClassifyExample(text="The good old days", label="neutral"), 
    ClassifyExample(text="What's the difference", label="neutral"), 
    ClassifyExample(text="You can't ignore this", label="neutral"), 
    ClassifyExample(text="That's how I see it", label="neutral")
]
inputs=["Hello, world! What a beautiful day",
        "It was a great time with great people",
        "Great place to work",
        "That was a wonderful evening",
        "Maybe this is why",
        "Let's start again",
        "That's how I see it",
        "These are all facts",
        "This is the worst thing",
        "I cannot stand this any longer",
        "This is really annoying",
        "I am just plain fed up"
        ]
def classify_text(inputs, examples):
  """
  Classify a list of input texts
  Arguments:
    inputs(list[str]): a list of input texts to be classified
    examples(list[Example]): a list of example texts and class labels
  Returns:
    classifications(list): each result contains the text, labels, and conf values
  """
  # Classify text by calling the Classify endpoint
  response = co.classify(
    model='embed-english-v2.0',
    inputs=inputs,
    examples=examples)
  
  classifications = response.classifications
  
  return classifications
predictions = classify_text(inputs,examples)

classes = ["positive","negative","neutral"]
for inp,pred in zip(inputs,predictions):
  class_pred = pred.predictions[0]
  class_idx = classes.index(class_pred)
  class_conf = pred.confidences[0]

  print(f"Input: {inp}")
  print(f"Prediction: {class_pred}")
  print(f"Confidence: {class_conf:.2f}")
  print("-"*10)
Input: Hello, world! What a beautiful day
Prediction: positive
Confidence: 0.84
----------
Input: It was a great time with great people
Prediction: positive
Confidence: 0.99
----------
Input: Great place to work
Prediction: positive
Confidence: 0.91
----------
Input: That was a wonderful evening
Prediction: positive
Confidence: 0.96
----------
Input: Maybe this is why
Prediction: neutral
Confidence: 0.70
----------
Input: Let's start again
Prediction: neutral
Confidence: 0.83
----------
Input: That's how I see it
Prediction: neutral
Confidence: 1.00
----------
Input: These are all facts
Prediction: neutral
Confidence: 0.78
----------
Input: This is the worst thing
Prediction: negative
Confidence: 0.93
----------
Input: I cannot stand this any longer
Prediction: negative
Confidence: 0.93
----------
Input: This is really annoying
Prediction: negative
Confidence: 0.99
----------
Input: I am just plain fed up
Prediction: negative
Confidence: 1.00
----------

Cohere’s Embed endpoint takes a piece of text and turns it into a vector embedding. Embeddings represent text as numbers that capture its meaning and context. This gives you a way to turn unstructured text data into a structured form, which opens up ways to analyze it and extract insights from it.

Get embeddings

Here we have a list of 50 top web search keywords about Hello, World! taken from a keyword tool. Let’s look at a few examples:

df = pd.read_csv("https://github.com/cohere-ai/notebooks/raw/main/notebooks/data/hello-world-kw.csv", names=["search_term"])
df.head()
search_term
0 how to print hello world in python
1 what is hello world
2 how do you write hello world in an alert box
3 how to print hello world in java
4 how to write hello world in eclipse

We use the Embed endpoint to get the embeddings for each of these keywords.

def embed_text(texts, input_type):
  """
  Turns a piece of text into embeddings
  Arguments:
    text(str): the text to be turned into embeddings
  Returns:
    embedding(list): the embeddings
  """
  # Embed text by calling the Embed endpoint
  response = co.embed(
                model="embed-english-v3.0",
                input_type=input_type,
                texts=texts)
  
  return response.embeddings
df["search_term_embeds"] = embed_text(texts=df["search_term"].tolist(),
                                      input_type="search_document")
doc_embeds = np.array(df["search_term_embeds"].tolist())
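
Each keyword is now represented by a long vector of numbers. As a quick sanity check, we can inspect the shape of the embeddings array and peek at the first few values of one vector (a minimal sketch; embed-english-v3.0 produces 1024-dimensional embeddings):

print(doc_embeds.shape)    # (50, 1024): 50 keywords, one 1024-dimensional vector each
print(doc_embeds[0][:5])   # the first few values of the first keyword's embedding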

Semantic Search

We’ll look at a couple of example applications. The first example is semantic search. Given a new query, our "search engine" must return the most similar FAQs, where the FAQs are the 50 search terms we loaded earlier.

query = "what is the history of hello world"

query_embeds = embed_text(texts=[query],
                          input_type="search_query")[0]

We use cosine similarity to compare the new query with each of the FAQs.
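
For intuition, the cosine similarity between two embeddings is simply their dot product divided by the product of their norms. Here is that computation done by hand with NumPy for the query and the first keyword (a minimal illustration of what scikit-learn's cosine_similarity does for all candidates in the helper function below):

# Cosine similarity: (a · b) / (||a|| * ||b||)
a = np.array(query_embeds)
b = np.array(doc_embeds[0])
print(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))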


from sklearn.metrics.pairwise import cosine_similarity

def get_similarity(target, candidates):
  """
  Computes the similarity between a target text and a list of other texts
  Arguments:
    target(list[float]): the target text
    candidates(list[list[float]]): a list of other texts, or candidates
  Returns:
    sim(list[tuple]): candidate IDs and the similarity scores
  """
  # Turn list into array
  candidates = np.array(candidates)
  target = np.expand_dims(np.array(target),axis=0)

  # Calculate cosine similarity
  sim = cosine_similarity(target,candidates)
  sim = np.squeeze(sim).tolist()

  # Sort by descending order in similarity
  sim = list(enumerate(sim))
  sim = sorted(sim, key=lambda x:x[1], reverse=True)

  # Return similarity scores
  return sim

Finally, we display the top 5 FAQs that best match the new query.

similarity = get_similarity(query_embeds,doc_embeds)

print("New query:")
print(query,'\n')

print("Similar queries:")
for idx,score in similarity[:5]:
  print(f"Similarity: {score:.2f};", df.iloc[idx]["search_term"])
New query:
what is the history of hello world 

Similar queries:
Similarity: 0.58; how did hello world originate
Similarity: 0.56; where did hello world come from
Similarity: 0.54; why hello world
Similarity: 0.53; why is hello world so famous
Similarity: 0.53; what is hello world

Semantic Exploration

In the second example, we take the same idea behind semantic search and apply it more broadly: exploring large volumes of text and analyzing their semantic relationships.

We'll use the same 50 top web search terms about Hello, World! There are different techniques we can use to compress the embeddings down to just two dimensions while retaining as much information as possible. Here we'll use a technique called UMAP. Once the embeddings are down to two dimensions, we can plot them on a 2D chart.

import umap
reducer = umap.UMAP(n_neighbors=49) 
umap_embeds = reducer.fit_transform(doc_embeds)
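
UMAP is stochastic, so the exact layout can vary from run to run. If you want a reproducible chart, you can pass a fixed random_state (a minor variation of the cell above; the seed value is arbitrary):

reducer = umap.UMAP(n_neighbors=49, random_state=42)  # fixed seed for a reproducible layout
umap_embeds = reducer.fit_transform(doc_embeds)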

df['x'] = umap_embeds[:,0]
df['y'] = umap_embeds[:,1]
chart = alt.Chart(df).mark_circle(size=500).encode(
  x=alt.X('x',
          scale=alt.Scale(zero=False),
          axis=alt.Axis(labels=False, ticks=False, domain=False)),
  y=alt.Y('y',
          scale=alt.Scale(zero=False),
          axis=alt.Axis(labels=False, ticks=False, domain=False)),
  tooltip=['search_term'])

text = chart.mark_text(align='left', dx=15, size=12, color='black').encode(
  text='search_term',
  color=alt.value('black'))

result = (chart + text).configure(background="#FDF7F0").properties(
  width=1000,
  height=700,
  title="2D Embeddings")

result
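
The chart renders inline in a notebook. If you are running this elsewhere, one option is to save the interactive chart as a standalone HTML file and open it in a browser (the filename here is just an example):

result.save("hello-world-embeddings.html")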