Create a Dataset

Create a dataset by uploading a file. See ‘Dataset Creation’ for more information.

Query parameters

name (string, required)

The name of the uploaded dataset.

type (enum, required)

The dataset type, which is used to validate the data. Valid types are embed-input, reranker-finetune-input, single-label-classification-finetune-input, chat-finetune-input, and multi-label-classification-finetune-input.

keep_original_file (boolean, optional)

Indicates whether the original file should be stored.

skip_malformed_input (boolean, optional)

Indicates whether rows with malformed input should be dropped (instead of failing the validation check). Dropped rows will be returned in the warnings field.

keep_fields (string, optional)

A list of field names to persist in the Dataset. By default, the Dataset retains only the required fields indicated in the schema for the corresponding Dataset type. For example, Datasets of type embed-input drop all fields other than the required text field. If any of the fields in keep_fields are missing from the uploaded file, Dataset validation will fail.

optional_fields (string, optional)

A list of field names to persist in the Dataset when they are present. By default, the Dataset retains only the required fields indicated in the schema for the corresponding Dataset type. For example, Datasets of type embed-input drop all fields other than the required text field. Unlike keep_fields, if any of the fields in optional_fields are missing from the uploaded file, Dataset validation will still pass.

text_separator (string, optional)

Raw .txt uploads will be split into entries using the text_separator value.

csv_delimiter (string, optional)

The delimiter used for .csv uploads.
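
As a rough illustration of how the query parameters above fit together, the sketch below checks `type` against the listed values and encodes the remaining parameters into a query string. The helper name `build_dataset_query`, the lowercase `true`/`false` encoding of booleans, the repetition of a key for list-valued fields, and the sample field names `source` and `language` are all assumptions made for illustration, not documented behavior.

```python
from urllib.parse import urlencode

# Dataset types listed for the `type` parameter.
VALID_TYPES = {
    "embed-input",
    "reranker-finetune-input",
    "single-label-classification-finetune-input",
    "chat-finetune-input",
    "multi-label-classification-finetune-input",
}


def build_dataset_query(name, dataset_type, **options):
    """Hypothetical helper: encode dataset-creation query parameters."""
    if dataset_type not in VALID_TYPES:
        raise ValueError(f"unsupported dataset type: {dataset_type!r}")

    params = [("name", name), ("type", dataset_type)]
    for key, value in options.items():
        if value is None:
            continue  # leave unset optional parameters out entirely
        if isinstance(value, bool):
            # Assumption: booleans are sent as lowercase "true"/"false".
            params.append((key, "true" if value else "false"))
        elif isinstance(value, (list, tuple)):
            # Assumption: list-valued fields repeat the key once per value.
            params.extend((key, item) for item in value)
        else:
            params.append((key, value))
    return urlencode(params)


query = build_dataset_query(
    "chat-dataset",
    "chat-finetune-input",
    skip_malformed_input=True,
    keep_fields=["source", "language"],  # hypothetical field names
)
print(query)
```

Validating `type` locally only catches typos early; the service still performs its own validation of the uploaded data.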

Request

This endpoint expects a multipart form containing the following file fields.

data (file, required)

The file to upload.

eval_data (file, optional)

An optional evaluation file to upload.
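
To make the multipart form concrete, here is a minimal sketch of how the `data` and `eval_data` file fields could be laid out in a request body. In practice an HTTP client library or the SDK builds this for you. The helper name `encode_multipart`, the file names, the sample rows, and the `application/octet-stream` part type are assumptions for illustration only.

```python
import uuid


def encode_multipart(files):
    """Encode {field_name: (filename, content_bytes)} as multipart/form-data.

    Returns the encoded body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    chunks = []
    for field, (filename, content) in files.items():
        chunks.append(f"--{boundary}\r\n".encode())
        chunks.append(
            f'Content-Disposition: form-data; name="{field}"; '
            f'filename="{filename}"\r\n'.encode()
        )
        # Assumption: a generic binary part type is acceptable.
        chunks.append(b"Content-Type: application/octet-stream\r\n\r\n")
        chunks.append(content)
        chunks.append(b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode())
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


# Placeholder file contents; real uploads would read bytes from disk.
body, content_type = encode_multipart(
    {
        "data": ("train.jsonl", b'{"text": "hello"}\n'),
        "eval_data": ("eval.jsonl", b'{"text": "world"}\n'),
    }
)
print(content_type)
```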

Response

A successful response.

id (string or null)

The dataset ID.
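
Because `id` may be null, callers may want to check it before using it. The sketch below assumes the response body is a JSON object; the helper name `extract_dataset_id` and the sample ID are made up for illustration.

```python
import json


def extract_dataset_id(response_body):
    """Hypothetical helper: return the dataset ID, or raise if it is null."""
    payload = json.loads(response_body)
    dataset_id = payload.get("id")
    if dataset_id is None:
        raise ValueError("response did not include a dataset ID")
    return dataset_id


# Sample body with a made-up ID.
print(extract_dataset_id('{"id": "example-dataset-id"}'))
```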

import cohere

co = cohere.Client()

# upload a dataset
my_dataset = co.datasets.create(
    name="chat-dataset",
    data=open("./chat.jsonl", "rb"),
    type="chat-finetune-input",
)

# wait for validation to complete
response = co.wait(my_dataset)

print(response)
