Dataset Splits
A dataset split defines the subset of the test set on which submissions will be evaluated. Most challenges have three splits:
train_split (Allow participants to make a large number of submissions, see how they are doing, and even overfit)
test_split (Allow only a small number of submissions so that participants cannot fit their models to the test set. Use this split to decide the winners of the challenge)
val_split (Allow participants to make submissions and evaluate on the validation split)
A dataset split has the following subfields:
id (required)
Type: integer
Description: Unique numeric identifier for the dataset split. Used internally to reference this split in phase-split mappings.
Example:
id: 1
name (required)
Type: string
Constraints: Must be unique.
Description: Human-readable name of the dataset split. This will be shown in the EvalAI UI and should clearly describe the split’s purpose.
Example:
name: Train Split
codename (required)
Type: string
Constraints: Must be unique and must match the codename used in the evaluation script.
Description: A unique identifier used to map evaluation results to the correct dataset split. This is critical for EvalAI to interpret the scores returned by your evaluation script.
Example:
codename: train_split
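To illustrate why the codename must match the evaluation script, here is a minimal Python sketch. It assumes the evaluation script returns a dictionary whose `result` entry is a list of single-key dictionaries, each keyed by a split codename; check your evaluation script template for the exact shape. The helper names `build_result` and `unknown_codenames` are hypothetical, and the metric values are made-up placeholders.

```python
def build_result(scores_by_codename):
    """Wrap per-split metrics as a list of single-key dicts keyed by codename.

    The {"result": [...]} layout is an assumption about what the
    evaluation script is expected to return.
    """
    return {
        "result": [
            {codename: metrics}
            for codename, metrics in scores_by_codename.items()
        ]
    }


def unknown_codenames(output, configured_codenames):
    """Return codenames used in the output that are not in the config."""
    used = {codename for entry in output["result"] for codename in entry}
    return used - set(configured_codenames)


# Placeholder scores, for illustration only.
output = build_result({
    "train_split": {"Accuracy": 0.91},
    "test_split": {"Accuracy": 0.87},
})

# Codenames as they would appear in the dataset split configuration.
configured = ["train_split", "test_split", "val_split"]
print(unknown_codenames(output, configured))
```

An empty set means every codename in the results has a matching dataset split. A misspelled key such as `train-split` would show up in the returned set.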
Example
Here’s how the dataset splits configuration looks in the challenge_config.yaml file of a sample challenge:
dataset_splits:
  - id: 1
    name: Train Split
    codename: train_split
  - id: 2
    name: Test Split
    codename: test_split
  - id: 3
    name: Validation Split
    codename: val_split
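The constraints described above (all three fields required, an integer id, unique values) can be checked before uploading a configuration. The following is a minimal sketch, not part of EvalAI itself: `validate_dataset_splits` is a hypothetical helper, and the splits are written as plain Python dictionaries rather than parsed from YAML so that the example has no third-party dependencies.

```python
from collections import Counter

REQUIRED_FIELDS = ("id", "name", "codename")


def validate_dataset_splits(splits):
    """Return a list of problems; an empty list means no issues were found."""
    problems = []

    # Every split needs all three fields, and id must be an integer.
    for index, split in enumerate(splits):
        for field in REQUIRED_FIELDS:
            if field not in split:
                problems.append(
                    f"split #{index}: missing required field '{field}'"
                )
        if "id" in split:
            value = split["id"]
            # bool is a subclass of int in Python, so exclude it explicitly.
            if not isinstance(value, int) or isinstance(value, bool):
                problems.append(f"split #{index}: 'id' must be an integer")

    # id, name, and codename must each be unique across splits.
    for field in REQUIRED_FIELDS:
        counts = Counter(s[field] for s in splits if field in s)
        for value, count in counts.items():
            if count > 1:
                problems.append(
                    f"duplicate {field}: {value!r} appears {count} times"
                )

    return problems


dataset_splits = [
    {"id": 1, "name": "Train Split", "codename": "train_split"},
    {"id": 2, "name": "Test Split", "codename": "test_split"},
    {"id": 3, "name": "Validation Split", "codename": "val_split"},
]
print(validate_dataset_splits(dataset_splits))  # prints []
```

Returning a list of messages instead of raising on the first error lets a challenge host see every problem in one pass.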