Generate

<Warning> This API is marked as "Legacy" and is no longer maintained. Follow the [migration guide](https://docs.cohere.com/docs/migrating-from-cogenerate-to-cochat) to start using the Chat API. </Warning>

Generates realistic text conditioned on a given input.


Authentication

Authorization Bearer

Bearer authentication of the form `Bearer <token>`, where `token` is your auth token.
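As a minimal sketch of how that header is formed, the snippet below builds (but does not send) a request using only the Python standard library. The endpoint URL and the `CO_API_KEY` environment variable name are assumptions that this section does not state, and `build_generate_request` is a hypothetical helper.

```python
import json
import os
import urllib.request

# Assumed endpoint URL; this section does not state it.
GENERATE_URL = "https://api.cohere.com/v1/generate"


def build_generate_request(token: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that carries a Bearer token."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        GENERATE_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # CO_API_KEY is an assumed name for wherever the token is stored.
    token = os.environ.get("CO_API_KEY", "<token>")
    request = build_generate_request(token, "Hello")
    print(request.get_header("Authorization"))
```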

Request

`prompt` string Required

The input text that serves as the starting point for generating the response. Note: The prompt will be pre-processed and modified before reaching the model.

`stream` false Required

When `true`, the response will be a JSON stream of events. Streaming is beneficial for user interfaces that render the contents of the response piece by piece, as it gets generated.

The final event will contain the complete response, and will contain an `is_finished` field set to `true`. The event will also contain a `finish_reason`, which can be one of the following:

- `COMPLETE` - the model sent back a finished reply
- `MAX_TOKENS` - the reply was cut off because the model reached the maximum number of tokens for its context length
- `ERROR` - something went wrong when generating the reply
- `ERROR_TOXIC` - the model generated a reply that was deemed toxic
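The sketch below shows one way a client might consume such a stream. It assumes the events arrive as newline-delimited JSON and that intermediate events carry a `text` fragment; neither detail is stated here. Only `is_finished` and `finish_reason` come from the description above, and `consume_stream` is a hypothetical helper.

```python
import json
from typing import Iterable, Optional, Tuple

# Finish reasons as listed in the parameter description.
FINISH_REASONS = {
    "COMPLETE": "the model sent back a finished reply",
    "MAX_TOKENS": "the reply was cut off at the token limit",
    "ERROR": "something went wrong when generating the reply",
    "ERROR_TOXIC": "the reply was deemed toxic",
}


def consume_stream(lines: Iterable[str]) -> Tuple[str, Optional[dict]]:
    """Collect text fragments until the event with is_finished == true."""
    fragments = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        event = json.loads(line)
        if event.get("is_finished"):
            return "".join(fragments), event
        # Assumed field name for the partial text of an intermediate event.
        fragments.append(event.get("text", ""))
    return "".join(fragments), None  # stream ended without a final event


if __name__ == "__main__":
    # Hypothetical events, written by hand for illustration.
    sample = [
        '{"text": "Hello", "is_finished": false}',
        '{"text": " world", "is_finished": false}',
        '{"is_finished": true, "finish_reason": "COMPLETE"}',
    ]
    text, final = consume_stream(sample)
    print(text, "|", FINISH_REASONS[final["finish_reason"]])
```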

`model` string Optional

The identifier of the model to generate with. Currently available models are `command` (default), `command-nightly` (experimental), `command-light`, and `command-light-nightly` (experimental). Smaller, "light" models are faster, while larger models will perform better. [Custom models](https://docs.cohere.com/docs/training-custom-models) can also be supplied with their full ID.

`num_generations` integer Optional

The maximum number of generations that will be returned. Defaults to 1, min value of 1, max value of 5.

`max_tokens` integer Optional

The maximum number of tokens the model will generate as part of the response. Note: Setting a low value may result in incomplete generations.

This parameter is off by default, and if it's not specified, the model will continue generating until it emits an EOS completion token. See [BPE Tokens](/bpe-tokens-wiki) for more details.

Can only be set to `0` if `return_likelihoods` is set to `ALL` to get the likelihood of the prompt.
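The limits stated for `num_generations` and `max_tokens` can be checked on the client before a request is sent. The helper below is a hypothetical sketch that encodes only the two rules given above (the 1 to 5 range and the `max_tokens=0` condition); it is not part of any SDK.

```python
from typing import Optional


def check_generate_limits(
    num_generations: int = 1,
    max_tokens: Optional[int] = None,
    return_likelihoods: str = "NONE",
) -> None:
    """Raise ValueError if the arguments break the documented limits."""
    if not 1 <= num_generations <= 5:
        raise ValueError("num_generations must be between 1 and 5")
    if max_tokens == 0 and return_likelihoods != "ALL":
        raise ValueError("max_tokens=0 requires return_likelihoods='ALL'")


if __name__ == "__main__":
    check_generate_limits(num_generations=3, max_tokens=100)  # passes silently
    try:
        check_generate_limits(max_tokens=0)
    except ValueError as err:
        print(err)
```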

`truncate` enum Optional, defaults to `END`

One of `NONE|START|END` to specify how the API will handle inputs longer than the maximum token length.

Passing `START` will discard the start of the input. `END` will discard the end of the input. In both cases, input is discarded until the remaining input is exactly the maximum input token length for the model.

If `NONE` is selected, when the input exceeds the maximum input token length an error will be returned.

Allowed values: `NONE`, `START`, `END`
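The three modes can be illustrated on a plain list of tokens. This is a sketch of the behaviour described above, not the API's implementation; `truncate_tokens` and the example limit are hypothetical.

```python
from typing import List


def truncate_tokens(tokens: List[str], max_len: int, mode: str = "END") -> List[str]:
    """Mimic the described truncate modes on an already-tokenized input."""
    if len(tokens) <= max_len:
        return list(tokens)  # nothing to discard
    if mode == "START":
        return tokens[len(tokens) - max_len:]  # discard the start, keep the tail
    if mode == "END":
        return tokens[:max_len]  # discard the end, keep the head
    if mode == "NONE":
        raise ValueError("input exceeds the maximum input token length")
    raise ValueError(f"unknown truncate mode: {mode}")


if __name__ == "__main__":
    tokens = ["a", "b", "c", "d", "e", "f"]
    print(truncate_tokens(tokens, 4, "START"))
    print(truncate_tokens(tokens, 4, "END"))
```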

`temperature` double Optional

A non-negative float that tunes the degree of randomness in generation. Lower temperatures mean less random generations. See Temperature for more details. Defaults to 0.75, min value of 0.0, max value of 5.0.
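To see why lower temperatures give less random output, the sketch below applies the common formulation in which logits are divided by the temperature before a softmax. That this is exactly what the API does is an assumption; the function and the example logits are hypothetical.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn logits into probabilities after scaling by 1 / temperature."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Limit case: all probability mass goes to the most likely token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0]
    for t in (0.25, 0.75, 5.0):
        print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

Lower temperatures concentrate probability on the top token, while higher ones flatten the distribution toward uniform.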

`seed` integer Optional, 0-18446744073709552000

If specified, the backend will make a best effort to sample tokens deterministically, such that repeated requests with the same seed and parameters should return the same result. However, determinism cannot be totally guaranteed. Compatible Deployments: Cohere Platform, Azure, AWS Sagemaker/Bedrock, Private Deployments

`preset` string Optional

Identifier of a custom preset. A preset is a combination of parameters, such as prompt and temperature. You can create presets in the [playground](https://dashboard.cohere.com/playground/generate). When a preset is specified, the `prompt` parameter becomes optional, and any included parameters will override the preset's parameters.

`end_sequences` list of strings Optional

The generated text will be cut at the beginning of the earliest occurrence of an end sequence. The sequence will be excluded from the text.

`stop_sequences` list of strings Optional

The generated text will be cut at the end of the earliest occurrence of a stop sequence. The sequence will be included in the text.
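The difference between `end_sequences` and `stop_sequences` is easiest to see side by side. The sketch below reproduces the described cuts on a finished string, reading "earliest occurrence" as the match with the smallest start index (an assumption). Both helpers are hypothetical; the API applies these rules during generation.

```python
from typing import List


def cut_at_end_sequence(text: str, end_sequences: List[str]) -> str:
    """Cut at the beginning of the earliest match; the sequence is excluded."""
    positions = [text.find(seq) for seq in end_sequences if seq and seq in text]
    return text[: min(positions)] if positions else text


def cut_at_stop_sequence(text: str, stop_sequences: List[str]) -> str:
    """Cut at the end of the earliest match; the sequence is included."""
    hits = [(text.find(seq), seq) for seq in stop_sequences if seq and seq in text]
    if not hits:
        return text
    position, seq = min(hits, key=lambda hit: hit[0])
    return text[: position + len(seq)]


if __name__ == "__main__":
    generated = "Line one.\nLine two.\nLine three."
    print(repr(cut_at_end_sequence(generated, ["\n"])))
    print(repr(cut_at_stop_sequence(generated, ["\n"])))
```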

`k` integer Optional

Ensures only the top k most likely tokens are considered for generation at each step. Defaults to 0, min value of 0, max value of 500.

`p` double Optional

Ensures that only the most likely tokens, with total probability mass of p, are considered for generation at each step. If both k and p are enabled, p acts after k. Defaults to 0.75, min value of 0.01, max value of 0.99.
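The interaction of `k` and `p` can be sketched on a toy distribution: keep the top k tokens first, then keep the smallest prefix whose mass reaches p. Renormalizing after each step is an assumption of this sketch, and `filter_top_k_top_p` is a hypothetical helper rather than the API's implementation.

```python
from typing import Dict


def filter_top_k_top_p(probs: Dict[str, float], k: int = 0, p: float = 0.75) -> Dict[str, float]:
    """Apply top-k, then top-p, and return renormalized probabilities."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    if k > 0:  # k == 0 means top-k filtering is disabled
        ranked = ranked[:k]
    total = sum(prob for _, prob in ranked)
    kept = []
    cumulative = 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob / total  # mass relative to what survived top-k
        if cumulative >= p:
            break
    kept_total = sum(prob for _, prob in kept)
    return {token: prob / kept_total for token, prob in kept}


if __name__ == "__main__":
    distribution = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
    print(filter_top_k_top_p(distribution, k=3, p=0.75))
```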

`frequency_penalty` double Optional

Used to reduce repetitiveness of generated tokens. The higher the value, the stronger a penalty is applied to previously present tokens, proportional to how many times they have already appeared in the prompt or prior generation.

Using frequency_penalty in combination with presence_penalty is not supported on newer models.

`presence_penalty` double Optional

Defaults to `0.0`, min value of `0.0`, max value of `1.0`. Can be used to reduce repetitiveness of generated tokens. Similar to `frequency_penalty`, except that this penalty is applied equally to all tokens that have already appeared, regardless of their exact frequencies. Using `frequency_penalty` in combination with `presence_penalty` is not supported on newer models.
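The contrast between the two penalties can be sketched with a common formulation in which each penalty is subtracted from a token's logit. The exact formula the API uses is not given here, so treat the arithmetic as an assumption; `apply_penalties` and the sample logits are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def apply_penalties(
    logits: Dict[str, float],
    previous_tokens: List[str],
    frequency_penalty: float = 0.0,
    presence_penalty: float = 0.0,
) -> Dict[str, float]:
    """Lower the logits of tokens that already appeared."""
    counts = Counter(previous_tokens)
    adjusted = {}
    for token, logit in logits.items():
        count = counts.get(token, 0)
        seen = 1.0 if count > 0 else 0.0
        # Frequency scales with the count; presence is a flat, one-time penalty.
        adjusted[token] = logit - frequency_penalty * count - presence_penalty * seen
    return adjusted


if __name__ == "__main__":
    logits = {"the": 2.0, "cat": 1.0, "sat": 0.5}
    history = ["the", "the", "cat"]
    print(apply_penalties(logits, history, frequency_penalty=0.5))
    print(apply_penalties(logits, history, presence_penalty=0.5))
```

The example applies each penalty on its own, in line with the note that combining them is not supported on newer models.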

`return_likelihoods` enum Optional, defaults to `NONE`

One of GENERATION|NONE to specify how and if the token likelihoods are returned with the response. Defaults to NONE.

If GENERATION is selected, the token likelihoods will only be provided for generated text.

WARNING: ALL is deprecated, and will be removed in a future release.

Allowed values: `GENERATION`, `ALL` (deprecated), `NONE`

`raw_prompting` boolean Optional

When enabled, the user’s prompt will be sent to the model without any pre-processing.

Response headers

`X-API-Warning` string

The name of the project that is making the request.

Response

`id` string

`generations` list of objects

List of generated results

`prompt` string

Prompt used for generations.

`meta` object

Errors

- 400 - Bad Request Error
- 401 - Unauthorized Error
- 403 - Forbidden Error
- 404 - Not Found Error
- 422 - Unprocessable Entity Error
- 429 - Too Many Requests Error
- 498 - Invalid Token Error
- 499 - Client Closed Request Error
- 500 - Internal Server Error
- 501 - Not Implemented Error
- 503 - Service Unavailable Error
- 504 - Gateway Timeout Error
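A client may want to translate these status codes into readable names and decide whether a retry is worthwhile. In the sketch below the names come from the list above, while the set of retryable statuses is purely an assumption of this example and is not stated by the API reference.

```python
from typing import Tuple

# Status names as listed in the Errors section.
ERROR_NAMES = {
    400: "Bad Request Error",
    401: "Unauthorized Error",
    403: "Forbidden Error",
    404: "Not Found Error",
    422: "Unprocessable Entity Error",
    429: "Too Many Requests Error",
    498: "Invalid Token Error",
    499: "Client Closed Request Error",
    500: "Internal Server Error",
    501: "Not Implemented Error",
    503: "Service Unavailable Error",
    504: "Gateway Timeout Error",
}

# Assumption: statuses that are often transient and may succeed on retry.
RETRYABLE_STATUSES = {429, 500, 503, 504}


def classify_status(status: int) -> Tuple[str, bool]:
    """Return the error name and whether this sketch would retry it."""
    name = ERROR_NAMES.get(status, "Unknown Error")
    return name, status in RETRYABLE_STATUSES


if __name__ == "__main__":
    for code in (401, 429, 504):
        print(code, classify_status(code))
```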


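import cohere

# Assumes the API key is available to the SDK, for example via the
# CO_API_KEY environment variable; otherwise pass it to Client().
co = cohere.Client()

response = co.generate(
    prompt="Please explain to me how LLMs work",
)
print(response)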

{
  "id": "6afae9c2-3375-4d0e-8d18-2e9eb7f2c3ec",
  "generations": [
    {
      "id": "8e6de35d-3007-43ab-9253-ac4f95dcb8a2",
      "text": "LLMs, or Large Language Models, are a type of neural network-based AI model that has been trained on massive amounts of text data and have become ubiquitous in the AI landscape. They possess astounding capabilities for comprehending and generating human-like language.\nThese models leverage neural networks that operate on a large scale, often involving millions or even billions of parameters. This substantial scale enables them to capture intricate patterns and connections within the vast amounts of text they have been trained on.\n\nThe training process for LLMs is fueled by colossal datasets of textual information, ranging from books and articles to websites and conversational transcripts. This extensive training enables them to develop a nuanced understanding of language patterns, grammar, and semantics.\n\nWhen posed with a new text input, LLMs employ their finely honed understanding of language to generate informed responses or undertake tasks such as language translation, text completion, or question answering. They do this by manipulating the input text through adding, removing, or altering elements to craft a desired output.\n\nOne of the underlying principles of their efficacy is the recurrent neural network (RNN) architecture they often adopt. This design enables them to process sequential data like natural language effectively. RNNs possess \"memory\" aspects via loops between layers, which allows them to retain and manipulate information gathered across long sequences, akin to the way humans process information.\n\nHowever, it's their size that arguably constitutes their most notable aspect. The sheer volume of these models – with counts of parameters often exceeding 100 million – enables them to capture correlations and patterns within language data effectively. This empowers them to generate coherent and contextually appropriate responses, posing a remarkable advancement in conversational AI.\n\nWhile LLMs have demonstrated extraordinary language prowess, it's vital to acknowledge their limitations and potential for improvement. Their biases often reflect those of the training data, and they may struggle with logical inconsistencies or factual errors. Ongoing research aims to enhance their robustness, diversity, and overall usability.\n\nIn essence, LLMs are a groundbreaking manifestation of AI's potential to simulate and even extend human language capabilities, while also serving as a testament to the ongoing journey towards refining and perfecting these technologies."
    }
  ],
  "prompt": "Please explain to me how LLMs work",
  "meta": {
    "api_version": {
      "version": "1"
    },
    "billed_units": {
      "input_tokens": 8,
      "output_tokens": 442
    }
  }
}
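A response shaped like the one above can be unpacked with the standard `json` module. The sketch below uses a shortened, hand-written copy of the sample (the `text` value is abbreviated for illustration), and `summarize_response` is a hypothetical helper.

```python
import json
from typing import List, Optional, Tuple


def summarize_response(raw: str) -> Tuple[List[str], Optional[int], Optional[int]]:
    """Return the generated texts plus billed input and output token counts."""
    data = json.loads(raw)
    texts = [generation["text"] for generation in data["generations"]]
    billed = data.get("meta", {}).get("billed_units", {})
    return texts, billed.get("input_tokens"), billed.get("output_tokens")


if __name__ == "__main__":
    # Abbreviated copy of the sample response; the text is shortened by hand.
    sample = json.dumps({
        "id": "6afae9c2-3375-4d0e-8d18-2e9eb7f2c3ec",
        "generations": [
            {"id": "8e6de35d-3007-43ab-9253-ac4f95dcb8a2", "text": "LLMs, or Large Language Models, ..."}
        ],
        "prompt": "Please explain to me how LLMs work",
        "meta": {
            "api_version": {"version": "1"},
            "billed_units": {"input_tokens": 8, "output_tokens": 442},
        },
    })
    texts, billed_in, billed_out = summarize_response(sample)
    print(texts[0], billed_in, billed_out)
```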
