Migrating Monolithic Prompts to Command-R with RAG

Migrating Monolithic Prompts to Command-R with RAG

Command-R is a powerful LLM optimized for long context tasks such as retrieval augmented generation (RAG). Migrating a monolithic task such as question-answering or query-focused summarization to RAG can improve the quality of responses due to reduced hallucination and improved conciseness through grounding.

Previously, migrating an existing use case to RAG involved a lot of manual work around indexing documents, implementing at least a basic search strategy, extensive post-processing to introduce proper grounding through citations, and of course fine-tuning an LLM to work well in the RAG paradigm.

This cookbook demonstrates automatic migration of monolithic prompts through two diverse use cases where an original prompt is broken down into two parts: (1) context; and (2) instructions. The former can be done automatically or through simple chunking, while the latter is done automatically by Command-R through single shot prompt optimization.

The two use cases demonstrated here are:

  1. Autobiography Assistant; and
  2. Legal Question Answering
#!pip install cohere
import json
import os
import re

import cohere
import getpass
CO_API_KEY = getpass.getpass('cohere API key:')
cohere API key:··········
co = cohere.Client(CO_API_KEY)

Autobiography Assistant

This application scenario is a common LLM-as-assistant use case: given some context, help the user to complete a task. In this case, the task is to write a concise autobiographical summary.

original_prompt = '''## information
Current Job Title: Senior Software Engineer
Current Company Name: GlobalSolTech
Work Experience: Over 15 years of experience in software engineering, specializing in AI and machine learning. Proficient in Python, C++, and Java, with expertise in developing algorithms for natural language processing, computer vision, and recommendation systems.
Current Department Name: AI Research and Development
Education: B.Sc. in Physics from Trent University (2004), Ph.D. in Statistics from HEC in Paris (2010)
Hobbies: I love hiking in the mountains, free diving, and collecting and restoring vintage world war one mechanical watches.
Family: Married with 4 children and 3 grandchildren.

## instructions
Your task is to assist a user in writing a short biography for social media.
The length of the text should be no more than 100 words.
Write the summary in first person.'''
response = co.chat(
    message=original_prompt,
    model='command-r',
)
print(response.text)
I'm a Senior Software Engineer at GlobalSolTech, with over 15 years of experience in AI and machine learning. My expertise lies in developing innovative algorithms for natural language processing, computer vision, and recommendation systems. I hold a B.Sc. in Physics and a Ph.D. in Statistics and enjoy hiking, free diving, and collecting vintage watches in my spare time. I'm passionate about using my skills to contribute to cutting-edge AI research and development. At GlobalSolTech, I'm proud to be part of a dynamic team driving technological advancement.

Using Command-R, we can automatically upgrade the original prompt to a RAG-style prompt to get more faithful adherence to the instructions, a clearer and more concise prompt, and in-line citations for free. Consider the following meta-prompt:

meta_prompt = f'''Below is a task for an LLM delimited with ## Original Task. Your task is to split that task into two parts: (1) the context; and (2) the instructions.
The context should be split into several separate parts and returned as a JSON object where each part has a name describing its contents and the value is the contents itself.
Make sure to include all of the context contained in the original task description and do not change its meaning.
The instructions should be re-written so that they are very clear and concise. Do not change the meaning of the instructions or task, just make sure they are very direct and clear.
Return everything in a JSON object with the following structure:

{{
  "context": [{{"<description of part 1>": "<content of part 1>"}}, ...],
  "instructions": "<the re-written instructions>"
}}

## Original Task
{original_prompt}
'''
print(meta_prompt)
Below is a task for an LLM delimited with ## Original Task. Your task is to split that task into two parts: (1) the context; and (2) the instructions.
The context should be split into several separate parts and returned as a JSON object where each part has a name describing its contents and the value is the contents itself.
Make sure to include all of the context contained in the original task description and do not change its meaning.
The instructions should be re-written so that they are very clear and concise. Do not change the meaning of the instructions or task, just make sure they are very direct and clear.
Return everything in a JSON object with the following structure:

{
  "context": [{"<description of part 1>": "<content of part 1>"}, ...],
  "instructions": "<the re-written instructions>"
}

## Original Task
## information
Current Job Title: Senior Software Engineer
Current Company Name: GlobalSolTech
Work Experience: Over 15 years of experience in software engineering, specializing in AI and machine learning. Proficient in Python, C++, and Java, with expertise in developing algorithms for natural language processing, computer vision, and recommendation systems.
Current Department Name: AI Research and Development
Education: B.Sc. in Physics from Trent University (2004), Ph.D. in Statistics from HEC in Paris (2010)
Hobbies: I love hiking in the mountains, free diving, and collecting and restoring vintage world war one mechanical watches.
Family: Married with 4 children and 3 grandchildren.

## instructions
Your task is to assist a user in writing a short biography for social media.
The length of the text should be no more than 100 words.
Write the summary in first person.

Command-R returns with the following:

upgraded_prompt = co.chat(
    message=meta_prompt,
    model='command-r',
)
print(upgraded_prompt.text)
Here is the task delved into a JSON object as requested:
```json
{
  "context": [
    {
      "Work Experience": "Over 15 years of AI and machine learning engineering experience. Proficient in Python, C++, and Java, with expertise in developing algorithms for natural language processing, computer vision, and recommendation systems."
    },
    {
      "Education": "B.Sc. in Physics (Trent University, 2004) and Ph.D. in Statistics (HEC Paris, 2010)."
    },
    {
      "Personal Life": "I’m a married senior software engineer with 4 children and 3 grandchildren. I enjoy hiking, free diving, and vintage watch restoration."
    },
    {
      "Current Position": "I work at GlobalSolTech in the AI Research and Development department as a senior software engineer."
    }
  ],
  "instructions": "Using the provided information, write a concise, first-person social media biography of no more than 100 words."
}
```

To extract the returned information, we will write two simple functions to post-process out the JSON and then parse it.

def get_json(text: str) -> str:
    matches = [m.group(1) for m in re.finditer("```([\w\W]*?)```", text)]
    if len(matches):
        postproced = matches[0]
        if postproced[:4] == 'json':
            return postproced[4:]
        return postproced
    return text
def get_prompt_and_docs(text: str) -> tuple:
    json_obj = json.loads(get_json(text))
    prompt = json_obj['instructions']
    docs = []
    for item in json_obj['context']:
        for k,v in item.items():
            docs.append({"title": k, "snippet": v})
    return prompt, docs
new_prompt, docs = get_prompt_and_docs(upgraded_prompt.text)
new_prompt, docs
('Using the provided information, write a concise, first-person social media biography of no more than 100 words.',
 [{'title': 'Work Experience',
   'snippet': 'Over 15 years of AI and machine learning engineering experience. Proficient in Python, C++, and Java, with expertise in developing algorithms for natural language processing, computer vision, and recommendation systems.'},
  {'title': 'Education',
   'snippet': 'B.Sc. in Physics (Trent University, 2004) and Ph.D. in Statistics (HEC Paris, 2010).'},
  {'title': 'Personal Life',
   'snippet': 'I’m a married senior software engineer with 4 children and 3 grandchildren. I enjoy hiking, free diving, and vintage watch restoration.'},
  {'title': 'Current Position',
   'snippet': 'I work at GlobalSolTech in the AI Research and Development department as a senior software engineer.'}])

As we can see above, the new prompt is much more concise and gets right to the point. The context has been split into 4 "documents" that Command-R can ground the information to. Now let's run the same task with the new prompt while leveraging the documents= parameter. Note that the docs variable is a list of dict objects with title describing the contents of a text and snippet containing the text itself:

response = co.chat(
    message=new_prompt,
    model='command-r',
    documents=docs,
)
print(response.text)
I'm a senior software engineer with a Ph.D. in Statistics and over 15 years of AI and machine learning engineering experience. My current focus at GlobalSolTech's AI R&D department is developing algorithms for natural language processing, computer vision, and recommendation systems. In my free time, I enjoy hiking, freediving, and restoring vintage watches, and I'm a married father of four with three grandchildren.

The response is concise. More importantly, we can ensure that there is no hallucination because the text is automatically grounded in the input documents. Using the simple function below, we can add this grounding information to the text as citations:

def insert_citations(text: str, citations: list[dict], add_one: bool=False):
    """
    A helper function to pretty print citations.
    """
    offset = 0
    # Process citations in the order they were provided
    for citation in citations:
        # Adjust start/end with offset
        start, end = citation.start + offset, citation.end + offset
        if add_one:
            cited_docs = [str(int(doc[4:]) + 1) for doc in citation.document_ids]
        else:
            cited_docs = [doc[4:] for doc in citation.document_ids]
        # Shorten citations if they're too long for convenience
        if len(cited_docs) > 3:
            placeholder = "[" + ", ".join(cited_docs[:3]) + "...]"
        else:
            placeholder = "[" + ", ".join(cited_docs) + "]"
        # ^ doc[4:] removes the 'doc_' prefix, and leaves the quoted document
        modification = f'{text[start:end]} {placeholder}'
        # Replace the cited text with its bolded version + placeholder
        text = text[:start] + modification + text[end:]
        # Update the offset for subsequent replacements
        offset += len(modification) - (end - start)

    return text
print(insert_citations(response.text, response.citations, True))
I'm a senior software engineer [3, 4] with a Ph.D. in Statistics [2] and over 15 years of AI and machine learning engineering experience. [1] My current focus at GlobalSolTech's AI R&D department [4] is developing algorithms for natural language processing, computer vision, and recommendation systems. [1] In my free time, I enjoy hiking, freediving, and restoring vintage watches [3], and I'm a married father of four with three grandchildren. [3]

Now let's move on to an arguably more difficult problem.

Legal Question Answering

On March 21st, the DOJ announced that it is suing Apple for anti-competitive practices. The complaint is 88 pages long and consists of about 230 paragraphs of text. To understand what the suit alleges, a common use case would be to ask for a summary. Because Command-R has a context window of 128K, even an 88-page legal complaint fits comfortably within the window.

apple = open('data/apple_mod.txt').read()
tokens = co.tokenize(text=apple, model='command-r')
len(tokens.tokens)
29697

We can set up a prompt template that allows us to ask questions on the original text.

prompt_template = '''
{legal_text}

{question}
'''
question = '''Please summarize the attached legal complaint succinctly. Focus on answering the question: what does the complaint allege?'''
rendered_prompt = prompt_template.format(legal_text=apple, question=question)
response = co.chat(
    message=rendered_prompt,
    model='command-r',
    temperature=0.3,
)
print(response.text)
The complaint alleges that Apple has violated antitrust laws by engaging in a pattern of anticompetitive conduct to maintain its monopoly power over the U.S. markets for smartphones and performance smartphones. Apple is accused of using its control over app distribution and access to its operating system to impede competition and innovation. Specifically, the company is said to have restricted developers' ability to create certain apps and limited the functionality of others, making it harder for consumers to switch away from iPhones to rival smartphones. This conduct is alleged to have harmed consumers and developers by reducing choice, increasing prices, and stifling innovation. The plaintiffs seek injunctive relief and potential monetary awards to remedy these illegal practices.

The summary seems clear enough. But we are interested in the specific allegations that the DOJ makes. For example, skimming the full complaint, it looks like the DOJ is alleging that Apple could encrypt text messages sent to Android phones if it wanted to do so. We can amend the rendered prompt and ask:

question = '''Does the DOJ allege that Apple could encrypt text messages sent to Android phones?'''
rendered_prompt = prompt_template.format(legal_text=apple, question=question)
response = co.chat(
    message=rendered_prompt,
    model='command-r',
)
print(response.text)
Yes, the DOJ alleges that Apple could allow iPhone users to send encrypted messages to Android users while still using iMessage on their iPhones but chooses not to do so. According to the DOJ, this would instantly improve the privacy and security of iPhones and other smartphones.

This is a very interesting allegation that at first glance suggests that the model could be hallucinating. Because RAG has been shown to help reduce hallucinations and grounds its responses in the input text, we should convert this prompt to the RAG style paradigm to gain confidence in its response.

While previously we asked Command-R to chunk the text for us, the legal complaint is highly structured with numbered paragraphs so we can use the following function to break the complaint into input docs ready for RAG:

def chunk_doc(input_doc: str) -> list:
    chunks = []
    current_para = 'Preamble'
    current_chunk = ''
    # pattern to find an integer number followed by a dot (finding the explicitly numbered paragraph numbers)
    pattern = r'^\d+\.$'

    for line in input_doc.splitlines():
        if re.match(pattern, line):
            chunks.append((current_para.replace('.', ''), current_chunk))
            current_chunk = ''
            current_para = line
        else:
            current_chunk += line + '\n'

    docs = []
    for chunk in chunks:
        docs.append({"title": chunk[0], "snippet": chunk[1]})

    return docs
chunks = chunk_doc(apple)
print(chunks[18])
{'title': '18', 'snippet': '\nProtecting competition and the innovation that competition inevitably ushers in\nfor consumers, developers, publishers, content creators, and device manufacturers is why\nPlaintiffs bring this lawsuit under Section 2 of the Sherman Act to challenge Apple’s\nmaintenance of its monopoly over smartphone markets, which affect hundreds of millions of\nAmericans every day. Plaintiffs bring this case to rid smartphone markets of Apple’s\nmonopolization and exclusionary conduct and to ensure that the next generation of innovators\ncan upend the technological world as we know it with new and transformative technologies.\n\n\nII.\n\nDefendant Apple\n\n'}

We can now try the same question but ask it directly to Command-R with the chunks as grounding information.

response = co.chat(
    message='''Does the DOJ allege that Apple could encrypt text messages sent to Android phones?''',
    model='command-r',
    documents=chunks,
)
print(response.text)
Yes, according to the DOJ, Apple could encrypt text messages sent from iPhones to Android phones. The DOJ claims that Apple degrades the security and privacy of its users by impeding cross-platform encryption and preventing developers from fixing the broken cross-platform messaging experience. Apple's conduct makes it harder to switch from iPhone to Android, as messages sent from iPhones to Android phones are unencrypted.

The responses seem similar, but we should add citations and check the citation to get confidence in the response.

print(insert_citations(response.text, response.citations))
Yes, according to the DOJ, Apple could encrypt text messages sent from iPhones to Android phones. [144] The DOJ claims that Apple degrades the security and privacy [144] of its users by impeding cross-platform encryption [144] and preventing developers from fixing the broken cross-platform messaging experience. [93] Apple's conduct makes it harder to switch from iPhone to Android [144], as messages sent from iPhones to Android phones are unencrypted. [144]

The most important passage seems to be paragraph 144. Paragraph 93 is also cited. Let's check what they contain.

print(chunks[144]['snippet'])
Apple is also willing to make the iPhone less secure and less private if that helps
maintain its monopoly power. For example, text messages sent from iPhones to Android phones
are unencrypted as a result of Apple’s conduct. If Apple wanted to, Apple could allow iPhone
users to send encrypted messages to Android users while still using iMessage on their iPhone,
which would instantly improve the privacy and security of iPhone and other smartphone users.
print(chunks[93]['snippet'])
Recently, Apple blocked a third-party developer from fixing the broken cross-
platform messaging experience in Apple Messages and providing end-to-end encryption for
messages between Apple Messages and Android users. By rejecting solutions that would allow
for cross-platform encryption, Apple continues to make iPhone users’ less secure than they could
otherwise be.

ii.

Paragraph 144 indeed contains the important allegation: If Apple wanted to, Apple could allow iPhone users to send encrypted messages to Android users.

In this cookbook we have shown how one can easily take an existing monolithic prompt and migrate it to the RAG paradigm to get less hallucination, grounded information, and in-line citations. We also demonstrated Command-R's ability to re-write an instruction prompt in a single shot to make it more concise and potentially lead to higher quality completions.