COLET Dataset

COLET is a cognitive load estimation dataset based primarily on eye-tracking and pupillometry, making it one of the few datasets specifically designed to train and evaluate eye-based cognitive monitoring models.

Overview

Participants completed a visual search and browsing task across 21 images of varying cognitive demand. Eye movements and pupil diameter were recorded continuously throughout, providing a dense, high-temporal-resolution record of gaze behaviour and pupillary response under varying levels of visual cognitive load.

With 47 participants, COLET provides a substantial sample for training eye-tracking-based models, particularly valuable given the general scarcity of labelled eye-tracking cognitive datasets.

Cognitive Task

Participants viewed a series of images spanning a range of complexity levels - from simple, low-information images to complex, information-dense scenes - and performed load-inducing cognitive tasks on each. The 21 images were designed to systematically elicit a range of workload levels, from low (passive viewing of simple images) to high (active search and analysis of complex visual scenes).

This visual task design is relevant to operational monitoring scenarios where continuous visual search across dynamic, information-dense displays is required.

Data Format

Gaze Data

gaze/
├── subject_XX/
│   ├── trial_YY_gaze.csv
│   │   ├── timestamp (s)
│   │   ├── x (gaze position, normalised 0–1 horizontal)
│   │   ├── y (gaze position, normalised 0–1 vertical)
│   │   └── validity (fixation quality flag)
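A minimal loading sketch for a per-trial gaze CSV, assuming the column names shown above; the parsing details (header row, numeric types) are an assumption, not documented behaviour:

```python
import csv
import io

def load_gaze(csv_file):
    """Parse a trial_YY_gaze.csv-style stream into a list of sample dicts."""
    samples = []
    for row in csv.DictReader(csv_file):
        samples.append({
            "timestamp": float(row["timestamp"]),  # seconds
            "x": float(row["x"]),                  # normalised 0-1 horizontal
            "y": float(row["y"]),                  # normalised 0-1 vertical
            "validity": int(row["validity"]),      # fixation quality flag
        })
    return samples

# Usage with an in-memory example (real files live under gaze/subject_XX/):
example = io.StringIO("timestamp,x,y,validity\n0.000,0.51,0.48,1\n0.004,0.52,0.47,1\n")
samples = load_gaze(example)
```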

Pupillometry Data

pupil/
├── subject_XX/
│   ├── trial_YY_pupil.csv
│   │   ├── timestamp (s)
│   │   ├── LPD (left pupil diameter, mm)
│   │   └── RPD (right pupil diameter, mm)
blinks/
├── subject_XX/
│   └── trial_YY_blinks.csv
│       ├── onset (s)
│       ├── offset (s)
│       └── duration (ms)
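Since left and right diameters are recorded separately, a common preprocessing step is to average the two eyes per sample while tolerating single-eye dropouts. A hedged sketch (the missing-data handling is an assumption about how gaps appear in the files):

```python
def mean_pupil_diameter(lpd, rpd):
    """Average LPD/RPD per sample (mm), skipping missing eyes, then average
    across samples to get a trial-level mean pupil diameter."""
    diameters = []
    for left, right in zip(lpd, rpd):
        eyes = [d for d in (left, right) if d is not None]
        if eyes:
            diameters.append(sum(eyes) / len(eyes))
    return sum(diameters) / len(diameters) if diameters else None

# Three samples, one with a missing right-eye reading:
m = mean_pupil_diameter([3.1, 3.2, 3.3], [3.3, None, 3.5])
```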

Annotations

annotations.csv
├── subject_id
├── trial_id
├── image_id
└── workload_label  (high / low binary, or continuous scale)
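To pair trial-level signals with their workload labels, the annotations file can be indexed by (subject_id, trial_id). A minimal sketch assuming the column names above:

```python
import csv
import io

def load_labels(csv_file):
    """Index workload_label by (subject_id, trial_id) for fast lookup."""
    return {
        (row["subject_id"], row["trial_id"]): row["workload_label"]
        for row in csv.DictReader(csv_file)
    }

# In-memory example; the real file is annotations.csv at the dataset root:
example = io.StringIO(
    "subject_id,trial_id,image_id,workload_label\n"
    "01,03,img_12,high\n"
    "01,04,img_07,low\n"
)
labels = load_labels(example)
```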

Engineered Features

From the raw gaze and pupillometry streams, the following features are typically computed for each trial:

Fixation features:

- Mean fixation duration, standard deviation of fixation duration
- Fixation count (fixations per second)
- Mean fixation dispersion

Saccade features:

- Mean saccade amplitude, saccade velocity
- Saccade frequency

Scan path features:

- Scan path length (total gaze path distance)
- Scan path entropy (irregularity measure)
- Area of interest coverage (proportion of image regions fixated)

Pupillometry features:

- Mean pupil diameter (LPD, RPD, mean)
- Maximum task-evoked pupillary response (TEPR)
- Pupil dilation velocity

Blink features:

- Blink rate, mean blink duration
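Two of the simpler features above can be sketched directly: rate features (fixation count, blink rate) are event counts per second, and scan path length is the summed Euclidean distance between consecutive gaze points. The function names are illustrative, not part of any released toolkit:

```python
def rate_feature(event_count, trial_duration_s):
    """Events per second over a trial (e.g. fixation count or blink rate)."""
    return event_count / trial_duration_s

def scan_path_length(xs, ys):
    """Total gaze path distance: sum of Euclidean steps between
    consecutive (x, y) gaze samples in normalised screen units."""
    return sum(
        ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
        for (x1, y1), (x2, y2) in zip(zip(xs, ys), zip(xs[1:], ys[1:]))
    )

# Two steps of length 0.5 and 0.4 give a path length of 0.9:
length = scan_path_length([0.0, 0.3, 0.3], [0.0, 0.4, 0.0])
```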

Relevance to Brain FM

COLET is the primary dataset for the Underrepresented Modalities research thread. Since eye tracking and pupillometry are low-resource modalities with no large pre-training corpora, COLET provides the fine-tuning and evaluation data for cross-modal alignment experiments.

The strategy aligns eye-tracking encoders with the pre-trained EEG encoder using semantic-level pairing: because COLET does not record EEG simultaneously, alignment is done at the label level, mapping high-workload eye-tracking windows to the region of the EEG latent space that contains high-workload EEG representations.

No simultaneous EEG

COLET records eye tracking only - no EEG is available. Cross-modal alignment must therefore use semantic-level (label-based) alignment rather than sample-level alignment. This is a harder alignment problem but is feasible using the label-conditioned alignment approach from Brant-X.
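One simple way to realise label-based alignment is to pull each eye-tracking embedding toward the centroid of same-label EEG embeddings. This is a hedged illustration of the idea, not the Brant-X objective itself (which the source does not specify here); the function names and the squared-distance loss are assumptions:

```python
def class_centroid(vectors):
    """Mean vector of a set of same-label embeddings."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def alignment_loss(eye_embeddings, eeg_embeddings_by_label, labels):
    """Mean squared distance from each eye-tracking embedding to the EEG
    centroid sharing its workload label (semantic-level pairing: no
    sample-level EEG/eye pairs exist in COLET)."""
    centroids = {
        lab: class_centroid(vs) for lab, vs in eeg_embeddings_by_label.items()
    }
    total = 0.0
    for emb, lab in zip(eye_embeddings, labels):
        c = centroids[lab]
        total += sum((e - ci) ** 2 for e, ci in zip(emb, c))
    return total / len(eye_embeddings)

# Toy 2-D example: embeddings already sit on their label centroids, loss = 0.
loss = alignment_loss(
    [[1.0, 0.0], [0.0, 1.0]],
    {"high": [[1.0, 0.0]], "low": [[0.0, 1.0]]},
    ["high", "low"],
)
```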