> For clean Markdown of any page, append .md to the page URL.
> For a complete documentation index, see https://docs.cohere.com/llms.txt.
> For full documentation content, see https://docs.cohere.com/llms-full.txt.
> For AI client integration (Claude Code, Cursor, etc.), connect to the MCP server at https://docs.cohere.com/_mcp/server.

# Migrating Monolithic Prompts to Command A with RAG

> This page contains a discussion of how to automatically migrating monolothic prompts.

Command A is a powerful LLM optimized for long context tasks such as retrieval augmented generation (RAG). Migrating a monolithic task such as question-answering or query-focused summarization to RAG can improve the quality of responses due to reduced hallucination and improved conciseness through grounding.

Previously, migrating an existing use case to RAG involved a lot of manual work around indexing documents, implementing at least a basic search strategy, extensive post-processing to introduce proper grounding through citations, and of course fine-tuning an LLM to work well in the RAG paradigm.

This cookbook demonstrates automatic migration of monolithic prompts through two diverse use cases where an original prompt is broken down into two parts: (1) context; and (2) instructions. The former can be done automatically or through simple chunking, while the latter is done automatically by Command A through single shot prompt optimization.

The two use cases demonstrated here are:

1. Autobiography Assistant; and
2. Legal Question Answering

```python PYTHON
#!pip install cohere
```

```python PYTHON
import json
import os
import re

import cohere
import getpass
```

```python PYTHON
CO_API_KEY = getpass.getpass('cohere API key:')
```

```txt title="Output"
cohere API key:··········
```

```python PYTHON
co = cohere.Client(CO_API_KEY)
```

## Autobiography Assistant

This application scenario is a common LLM-as-assistant use case: given some context, help the user to complete a task. In this case, the task is to write a concise autobiographical summary.

```python PYTHON
original_prompt = '''## information
Current Job Title: Senior Software Engineer
Current Company Name: GlobalSolTech
Work Experience: Over 15 years of experience in software engineering, specializing in AI and machine learning. Proficient in Python, C++, and Java, with expertise in developing algorithms for natural language processing, computer vision, and recommendation systems.
Current Department Name: AI Research and Development
Education: B.Sc. in Physics from Trent University (2004), Ph.D. in Statistics from HEC in Paris (2010)
Hobbies: I love hiking in the mountains, free diving, and collecting and restoring vintage world war one mechanical watches.
Family: Married with 4 children and 3 grandchildren.

## instructions
Your task is to assist a user in writing a short biography for social media.
The length of the text should be no more than 100 words.
Write the summary in first person.'''
```

```python PYTHON
response = co.chat(
    message=original_prompt,
    model='command-a-03-2025',
)
```

```python PYTHON
print(response.text)
```

```txt
    I'm a Senior Software Engineer at GlobalSolTech, with over 15 years of experience in AI and machine learning. My expertise lies in developing innovative algorithms for natural language processing, computer vision, and recommendation systems. I hold a B.Sc. in Physics and a Ph.D. in Statistics and enjoy hiking, free diving, and collecting vintage watches in my spare time. I'm passionate about using my skills to contribute to cutting-edge AI research and development. At GlobalSolTech, I'm proud to be part of a dynamic team driving technological advancement.
```

Using Command A, we can automatically upgrade the original prompt to a RAG-style prompt to get more faithful adherence to the instructions, a clearer and more concise prompt, and in-line citations for free. Consider the following meta-prompt:

```python PYTHON
meta_prompt = f'''Below is a task for an LLM delimited with ## Original Task. Your task is to split that task into two parts: (1) the context; and (2) the instructions.
The context should be split into several separate parts and returned as a JSON object where each part has a name describing its contents and the value is the contents itself.
Make sure to include all of the context contained in the original task description and do not change its meaning.
The instructions should be re-written so that they are very clear and concise. Do not change the meaning of the instructions or task, just make sure they are very direct and clear.
Return everything in a JSON object with the following structure:

{{
  "context": [{{"<description 1="" of="" part="">": "<content 1="" of="" part="">"}}, ...],
  "instructions": "<the instructions="" re-written="">"
}}

## Original Task
{original_prompt}
'''
```

```python PYTHON
print(meta_prompt)
```

```txt title="Output"
Below is a task for an LLM delimited with ## Original Task. Your task is to split that task into two parts: (1) the context; and (2) the instructions.
The context should be split into several separate parts and returned as a JSON object where each part has a name describing its contents and the value is the contents itself.
Make sure to include all of the context contained in the original task description and do not change its meaning.
The instructions should be re-written so that they are very clear and concise. Do not change the meaning of the instructions or task, just make sure they are very direct and clear.
Return everything in a JSON object with the following structure:

{
    "context": [{"<description 1="" of="" part="">": "<content 1="" of="" part="">"}, ...],
    "instructions": "<the instructions="" re-written="">"
}

## Original Task
## information
Current Job Title: Senior Software Engineer
Current Company Name: GlobalSolTech
Work Experience: Over 15 years of experience in software engineering, specializing in AI and machine learning. Proficient in Python, C++, and Java, with expertise in developing algorithms for natural language processing, computer vision, and recommendation systems.
Current Department Name: AI Research and Development
Education: B.Sc. in Physics from Trent University (2004), Ph.D. in Statistics from HEC in Paris (2010)
Hobbies: I love hiking in the mountains, free diving, and collecting and restoring vintage world war one mechanical watches.
Family: Married with 4 children and 3 grandchildren.

## instructions
Your task is to assist a user in writing a short biography for social media.
The length of the text should be no more than 100 words.
Write the summary in first person.
```

Command A returns with the following:

```python PYTHON
upgraded_prompt = co.chat(
    message=meta_prompt,
    model='command-a-03-2025',
)
```

```python PYTHON
print(upgraded_prompt.text)
```

````txt title="Output"
Here is the task delved into a JSON object as requested:
```json JSON
{
    "context": [
    {
        "Work Experience": "Over 15 years of AI and machine learning engineering experience. Proficient in Python, C++, and Java, with expertise in developing algorithms for natural language processing, computer vision, and recommendation systems."
    },
    {
        "Education": "B.Sc. in Physics (Trent University, 2004) and Ph.D. in Statistics (HEC Paris, 2010)."
    },
    {
        "Personal Life": "I’m a married senior software engineer with 4 children and 3 grandchildren. I enjoy hiking, free diving, and vintage watch restoration."
    },
    {
        "Current Position": "I work at GlobalSolTech in the AI Research and Development department as a senior software engineer."
    }
    ],
    "instructions": "Using the provided information, write a concise, first-person social media biography of no more than 100 words."
}
```
````

To extract the returned information, we will write two simple functions to post-process out the JSON and then parse it.

````python PYTHON
def get_json(text: str) -> str:
    matches = [m.group(1) for m in re.finditer("```([\w\W]*?)```", text)]
    if len(matches):
        postproced = matches[0]
        if postproced[:4] == 'json':
            return postproced[4:]
        return postproced
    return text
````

```python PYTHON
def get_prompt_and_docs(text: str) -> tuple:
    json_obj = json.loads(get_json(text))
    prompt = json_obj['instructions']
    docs = []
    for item in json_obj['context']:
        for k,v in item.items():
            docs.append({"title": k, "snippet": v})
    return prompt, docs
```

```python PYTHON
new_prompt, docs = get_prompt_and_docs(upgraded_prompt.text)
```

```python PYTHON
new_prompt, docs
```

```txt title="Output"
('Using the provided information, write a concise, first-person social media biography of no more than 100 words.',
    [{'title': 'Work Experience',
    'snippet': 'Over 15 years of AI and machine learning engineering experience. Proficient in Python, C++, and Java, with expertise in developing algorithms for natural language processing, computer vision, and recommendation systems.'},
    {'title': 'Education',
    'snippet': 'B.Sc. in Physics (Trent University, 2004) and Ph.D. in Statistics (HEC Paris, 2010).'},
    {'title': 'Personal Life',
    'snippet': 'I’m a married senior software engineer with 4 children and 3 grandchildren. I enjoy hiking, free diving, and vintage watch restoration.'},
    {'title': 'Current Position',
    'snippet': 'I work at GlobalSolTech in the AI Research and Development department as a senior software engineer.'}])
```

As we can see above, the new prompt is much more concise and gets right to the point. The context has been split into 4 "documents" that Command A can ground the information to. Now let's run the same task with the new prompt while leveraging the `documents=` parameter. Note that the `docs` variable is a list of dict objects with `title` describing the contents of a text and `snippet` containing the text itself:

```python PYTHON
response = co.chat(
    message=new_prompt,
    model='command-a-03-2025',
    documents=docs,
)
```

```python PYTHON
print(response.text)
```

```txt title="Output"
I'm a senior software engineer with a Ph.D. in Statistics and over 15 years of AI and machine learning engineering experience. My current focus at GlobalSolTech's AI R&amp;D department is developing algorithms for natural language processing, computer vision, and recommendation systems. In my free time, I enjoy hiking, freediving, and restoring vintage watches, and I'm a married father of four with three grandchildren.
```

The response is concise. More importantly, we can ensure that there is no hallucination because the text is automatically grounded in the input documents. Using the simple function below, we can add this grounding information to the text as citations:

```python PYTHON
def insert_citations(text: str, citations: list[dict], add_one: bool=False):
    """
    A helper function to pretty print citations.
    """
    offset = 0
    # Process citations in the order they were provided
    for citation in citations:
        # Adjust start/end with offset
        start, end = citation.start + offset, citation.end + offset
        if add_one:
            cited_docs = [str(int(doc[4:]) + 1) for doc in citation.document_ids]
        else:
            cited_docs = [doc[4:] for doc in citation.document_ids]
        # Shorten citations if they're too long for convenience
        if len(cited_docs) > 3:
            placeholder = "[" + ", ".join(cited_docs[:3]) + "...]"
        else:
            placeholder = "[" + ", ".join(cited_docs) + "]"
        # ^ doc[4:] removes the 'doc_' prefix, and leaves the quoted document
        modification = f'{text[start:end]} {placeholder}'
        # Replace the cited text with its bolded version + placeholder
        text = text[:start] + modification + text[end:]
        # Update the offset for subsequent replacements
        offset += len(modification) - (end - start)

    return text
```

```python PYTHON
print(insert_citations(response.text, response.citations, True))
```

```txt title="Output"
I'm a senior software engineer [3, 4] with a Ph.D. in Statistics [2] and over 15 years of AI and machine learning engineering experience. [1] My current focus at GlobalSolTech's AI R&amp;D department [4] is developing algorithms for natural language processing, computer vision, and recommendation systems. [1] In my free time, I enjoy hiking, freediving, and restoring vintage watches [3], and I'm a married father of four with three grandchildren. [3]
```

Now let's move on to an arguably more difficult problem.

## Legal Question Answering

On March 21st, the DOJ announced that it is [suing Apple](https://www.theverge.com/2024/3/21/24107659/apple-doj-lawsuit-antitrust-documents-suing) for anti-competitive practices. The [complaint](https://www.justice.gov/opa/media/1344546/dl) is 88 pages long and consists of about 230 paragraphs of text. To understand what the suit alleges, a common use case would be to ask for a summary. Because Command A has a context window of 256K, even an 88-page legal complaint fits comfortably within the window.

```python PYTHON
apple = open('data/apple_mod.txt').read()
```

```python PYTHON
tokens = co.tokenize(text=apple, model='command-a-03-2025')
len(tokens.tokens)
```

```txt title="Output"
29697
```

We can set up a prompt template that allows us to ask questions on the original text.

```python PYTHON
prompt_template = '''
{legal_text}

{question}
'''
```

```python PYTHON
question = '''Please summarize the attached legal complaint succinctly. Focus on answering the question: what does the complaint allege?'''
rendered_prompt = prompt_template.format(legal_text=apple, question=question)
```

```python PYTHON
response = co.chat(
    message=rendered_prompt,
    model='command-a-03-2025',
    temperature=0.3,
)
```

```python PYTHON
print(response.text)
```

```txt title="Output"
The complaint alleges that Apple has violated antitrust laws by engaging in a pattern of anticompetitive conduct to maintain its monopoly power over the U.S. markets for smartphones and performance smartphones. Apple is accused of using its control over app distribution and access to its operating system to impede competition and innovation. Specifically, the company is said to have restricted developers' ability to create certain apps and limited the functionality of others, making it harder for consumers to switch away from iPhones to rival smartphones. This conduct is alleged to have harmed consumers and developers by reducing choice, increasing prices, and stifling innovation. The plaintiffs seek injunctive relief and potential monetary awards to remedy these illegal practices.
```

The summary seems clear enough. But we are interested in the specific allegations that the DOJ makes. For example, skimming the full complaint, it looks like the DOJ is alleging that Apple could encrypt text messages sent to Android phones if it wanted to do so. We can amend the rendered prompt and ask:

```python PYTHON
question = '''Does the DOJ allege that Apple could encrypt text messages sent to Android phones?'''
rendered_prompt = prompt_template.format(legal_text=apple, question=question)
```

```python PYTHON
response = co.chat(
    message=rendered_prompt,
    model='command-a-03-2025',
)
```

```python PYTHON
print(response.text)
```

```txt title="Output"
Yes, the DOJ alleges that Apple could allow iPhone users to send encrypted messages to Android users while still using iMessage on their iPhones but chooses not to do so. According to the DOJ, this would instantly improve the privacy and security of iPhones and other smartphones.
```

This is a very interesting allegation that at first glance suggests that the model could be hallucinating. Because RAG has been shown to help reduce hallucinations and grounds its responses in the input text, we should convert this prompt to the RAG style paradigm to gain confidence in its response.

While previously we asked Command A to chunk the text for us, the legal complaint is highly structured with numbered paragraphs so we can use the following function to break the complaint into input docs ready for RAG:

```python PYTHON
def chunk_doc(input_doc: str) -> list:
    chunks = []
    current_para = 'Preamble'
    current_chunk = ''
    # pattern to find an integer number followed by a dot (finding the explicitly numbered paragraph numbers)
    pattern = r'^\d+\.$'

    for line in input_doc.splitlines():
        if re.match(pattern, line):
            chunks.append((current_para.replace('.', ''), current_chunk))
            current_chunk = ''
            current_para = line
        else:
            current_chunk += line + '\n'

    docs = []
    for chunk in chunks:
        docs.append({"title": chunk[0], "snippet": chunk[1]})

    return docs
```

```python PYTHON
chunks = chunk_doc(apple)
```

```python PYTHON
print(chunks[18])
```

```python title="Output"
    {'title': '18', 'snippet': '\nProtecting competition and the innovation that competition inevitably ushers in\nfor consumers, developers, publishers, content creators, and device manufacturers is why\nPlaintiffs bring this lawsuit under Section 2 of the Sherman Act to challenge Apple’s\nmaintenance of its monopoly over smartphone markets, which affect hundreds of millions of\nAmericans every day. Plaintiffs bring this case to rid smartphone markets of Apple’s\nmonopolization and exclusionary conduct and to ensure that the next generation of innovators\ncan upend the technological world as we know it with new and transformative technologies.\n\n\nII.\n\nDefendant Apple\n\n'}
```

We can now try the same question but ask it directly to Command A with the chunks as grounding information.

```python PYTHON
response = co.chat(
    message='''Does the DOJ allege that Apple could encrypt text messages sent to Android phones?''',
    model='command-a-03-2025',
    documents=chunks,
)
```

```python PYTHON
print(response.text)
```

```txt title="Output"
Yes, according to the DOJ, Apple could encrypt text messages sent from iPhones to Android phones. The DOJ claims that Apple degrades the security and privacy of its users by impeding cross-platform encryption and preventing developers from fixing the broken cross-platform messaging experience. Apple's conduct makes it harder to switch from iPhone to Android, as messages sent from iPhones to Android phones are unencrypted.
```

The responses seem similar, but we should add citations and check the citation to get confidence in the response.

```python PYTHON
print(insert_citations(response.text, response.citations))
```

```txt title="Output"
Yes, according to the DOJ, Apple could encrypt text messages sent from iPhones to Android phones. [144] The DOJ claims that Apple degrades the security and privacy [144] of its users by impeding cross-platform encryption [144] and preventing developers from fixing the broken cross-platform messaging experience. [93] Apple's conduct makes it harder to switch from iPhone to Android [144], as messages sent from iPhones to Android phones are unencrypted. [144]
```

The most important passage seems to be paragraph 144. Paragraph 93 is also cited. Let's check what they contain.

```python PYTHON
print(chunks[144]['snippet'])
```

```txt title="Output"
Apple is also willing to make the iPhone less secure and less private if that helps
maintain its monopoly power. For example, text messages sent from iPhones to Android phones
are unencrypted as a result of Apple’s conduct. If Apple wanted to, Apple could allow iPhone
users to send encrypted messages to Android users while still using iMessage on their iPhone,
which would instantly improve the privacy and security of iPhone and other smartphone users.
```

```python PYTHON
print(chunks[93]['snippet'])
```

```txt title="Output"
Recently, Apple blocked a third-party developer from fixing the broken cross-
platform messaging experience in Apple Messages and providing end-to-end encryption for
messages between Apple Messages and Android users. By rejecting solutions that would allow
for cross-platform encryption, Apple continues to make iPhone users’ less secure than they could
otherwise be.

ii.
```

Paragraph 144 indeed contains the important allegation: **If Apple wanted to, Apple could allow iPhone users to send encrypted messages to Android users**.

In this cookbook we have shown how one can easily take an existing monolithic prompt and migrate it to the RAG paradigm to get less hallucination, grounded information, and in-line citations. We also demonstrated Command A's ability to re-write an instruction prompt in a single shot to make it more concise and potentially lead to higher quality completions.