Create a Dataset

POST https://api.cohere.com/v1/datasets
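```python
import cohere

# With no arguments, the client is expected to pick up the API key from the environment.
co = cohere.Client()

# Upload a dataset; the file is opened in binary mode and closed after the upload.
with open("./embed.jsonl", "rb") as data_file:
    my_dataset = co.datasets.create(
        name="embed-dataset",
        data=data_file,
        type="embed-input",
    )

# Wait for validation to complete.
response = co.wait(my_dataset)

print(response)
```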
Example response:

```json
{
  "id": "d7f8a9c2-3b4e-4f1a-9c2d-8e7f6a5b4c3d"
}
```
Create a dataset by uploading a file. See ['Dataset Creation'](https://docs.cohere.com/docs/datasets#dataset-creation) for more information.

Authentication

Authorization: Bearer

Bearer authentication of the form `Bearer <token>`, where `<token>` is your auth token.

Headers

X-Client-Name (string, optional)
The name of the project that is making the request.
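
The authentication and header rules above can be sketched as a small helper. This is a minimal illustration, not part of the Cohere SDK: `build_headers` is a hypothetical function, and `CO_API_KEY` is an assumed environment variable name for the token.

```python
import os
from typing import Dict, Optional


def build_headers(token: str, client_name: Optional[str] = None) -> Dict[str, str]:
    """Return HTTP headers carrying the bearer token (hypothetical helper)."""
    if not token:
        raise ValueError("an auth token is required")
    headers = {"Authorization": f"Bearer {token}"}
    # X-Client-Name is optional, so it is only sent when a name is given.
    if client_name:
        headers["X-Client-Name"] = client_name
    return headers


# Assumption: the token is stored in an environment variable named CO_API_KEY.
headers = build_headers(os.environ.get("CO_API_KEY", "<token>"), client_name="my-project")
```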

Query parameters

name (string, required)
The name of the uploaded dataset.

type (enum, required)
The dataset type, which is used to validate the data. The only valid type is `embed-input`, which is used in conjunction with the Embed Jobs API.

keep_original_file (boolean, optional)
Indicates whether the original file should be stored.

skip_malformed_input (boolean, optional)
Indicates whether rows with malformed input should be dropped instead of failing the validation check. Dropped rows are returned in the `warnings` field.

keep_fields (list of strings, optional)
List of names of fields that will be persisted in the Dataset. By default the Dataset will retain only the required fields indicated in the [schema for the corresponding Dataset type](https://docs.cohere.com/docs/datasets#dataset-types). For example, Datasets of type `embed-input` will drop all fields other than the required `text` field. If any of the fields in `keep_fields` are missing from the uploaded file, Dataset validation will fail.

optional_fields (list of strings, optional)
List of names of fields that will be persisted in the Dataset when they are present. By default the Dataset will retain only the required fields indicated in the [schema for the corresponding Dataset type](https://docs.cohere.com/docs/datasets#dataset-types). For example, Datasets of type `embed-input` will drop all fields other than the required `text` field. Unlike `keep_fields`, if any of the fields in `optional_fields` are missing from the uploaded file, Dataset validation will still pass.
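
The difference between `keep_fields` and `optional_fields` can be illustrated with a local simulation. The real validation runs on the server; `check_fields` below is a hypothetical function, and it assumes a field counts as "missing from the uploaded file" when no row contains it.

```python
def check_fields(rows, required=("text",), keep_fields=(), optional_fields=()):
    """Local illustration only; not the server-side validation logic.

    Returns the rows as they would be persisted, or raises ValueError when a
    required or keep_fields name does not appear anywhere in the rows.
    """
    present = set()
    for row in rows:
        present.update(row)

    # Required fields and keep_fields must exist; optional_fields may be absent.
    missing = [f for f in (*required, *keep_fields) if f not in present]
    if missing:
        raise ValueError(f"validation failed, missing fields: {missing}")

    retained = [*required, *keep_fields, *optional_fields]
    return [{k: row[k] for k in retained if k in row} for row in rows]


rows = [{"text": "hello", "title": "greeting", "extra": 1}]
print(check_fields(rows))                              # only the required field
print(check_fields(rows, keep_fields=["title"]))       # title is kept
print(check_fields(rows, optional_fields=["author"]))  # missing author is tolerated
```

Passing `keep_fields=["author"]` with the same rows raises `ValueError` in this sketch, mirroring the failed validation described above.
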
text_separator (string, optional)
Raw .txt uploads will be split into entries using the `text_separator` value.

csv_delimiter (string, optional)
The delimiter used for .csv uploads.
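
For readers calling the endpoint without the SDK, the query parameters above could be assembled as follows. This is a sketch under assumptions: `build_create_url` is a hypothetical helper, booleans are assumed to be sent as lowercase `true`/`false`, and list values are assumed to be sent as repeated keys. The API's actual encoding for these cases is not stated in this page.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.cohere.com/v1/datasets"


def build_create_url(name, dataset_type="embed-input", **options):
    """Build the request URL with query parameters (hypothetical helper)."""
    params = {"name": name, "type": dataset_type}
    for key, value in options.items():
        if value is None:
            continue  # omit optional parameters that were not set
        if isinstance(value, bool):
            # Assumption: booleans are serialized as lowercase strings.
            value = "true" if value else "false"
        params[key] = value
    # doseq=True turns a list into repeated keys (assumed list encoding).
    return f"{BASE_URL}?{urlencode(params, doseq=True)}"


url = build_create_url(
    "embed-dataset",
    skip_malformed_input=True,
    keep_fields=["title", "author"],
)
print(url)
```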

Request

This endpoint expects a multipart form with multiple files.

data (file, required)
The file to upload.

eval_data (file, optional)
An optional evaluation file to upload.
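
To make the multipart form concrete, here is a standard-library sketch of how such a body is encoded. `encode_multipart` is a hypothetical helper, and the `application/octet-stream` content type for each part is an assumption. In practice an HTTP client library or the Cohere SDK would normally build this for you.

```python
import uuid


def encode_multipart(files):
    """Encode files as a multipart/form-data body (hypothetical helper).

    `files` maps a form field name to a (filename, bytes content) pair.
    Returns the encoded body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for field, (filename, content) in files.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        )
        parts.append(header.encode("utf-8") + content + b"\r\n")
    # The closing boundary marks the end of the form.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


body, content_type = encode_multipart(
    {"data": ("embed.jsonl", b'{"text": "hello"}\n')}
)
```

An `eval_data` entry can be added to the same mapping when an evaluation file is uploaded alongside `data`.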

Response

A successful response.

id (string)
The dataset ID.
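
When reading the raw HTTP response rather than using the SDK, the dataset ID can be pulled out of the JSON body. A minimal sketch, where `extract_dataset_id` is a hypothetical helper and the sample body is the example response shown on this page:

```python
import json


def extract_dataset_id(body: str) -> str:
    """Return the `id` field from a successful response body (hypothetical helper)."""
    payload = json.loads(body)
    if not isinstance(payload, dict):
        raise ValueError("response body is not a JSON object")
    dataset_id = payload.get("id")
    if not isinstance(dataset_id, str) or not dataset_id:
        raise ValueError("response does not contain a dataset id")
    return dataset_id


sample_body = '{"id": "d7f8a9c2-3b4e-4f1a-9c2d-8e7f6a5b4c3d"}'
dataset_id = extract_dataset_id(sample_body)
```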

Errors

400 Bad Request Error
401 Unauthorized Error
403 Forbidden Error
404 Not Found Error
422 Unprocessable Entity Error
429 Too Many Requests Error
498 Invalid Token Error
499 Client Closed Request Error
500 Internal Server Error
501 Not Implemented Error
503 Service Unavailable Error
504 Gateway Timeout Error
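
A client might map these status codes to their names and decide whether a retry is worthwhile. The sketch below is illustrative only: `describe_status` is a hypothetical helper, and treating 429, 500, 503, and 504 as retryable is an assumption based on common HTTP practice, not something this page specifies.

```python
ERROR_NAMES = {
    400: "Bad Request Error",
    401: "Unauthorized Error",
    403: "Forbidden Error",
    404: "Not Found Error",
    422: "Unprocessable Entity Error",
    429: "Too Many Requests Error",
    498: "Invalid Token Error",
    499: "Client Closed Request Error",
    500: "Internal Server Error",
    501: "Not Implemented Error",
    503: "Service Unavailable Error",
    504: "Gateway Timeout Error",
}

# Assumption: only these statuses are worth retrying after a delay.
RETRYABLE = {429, 500, 503, 504}


def describe_status(status: int):
    """Return (name, should_retry) for an HTTP status code (hypothetical helper)."""
    if 200 <= status < 300:
        return ("ok", False)
    return (ERROR_NAMES.get(status, "Unknown Error"), status in RETRYABLE)


print(describe_status(429))
print(describe_status(401))
```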
