Underrepresented Modalities
Brain Foundation Models are designed to be general, modality-agnostic encoders for physiological signals. However, the modalities differ dramatically in the volume of pre-training data available:
- EEG: tens of thousands of hours across public repositories (TUH EEG Corpus alone exceeds 30,000 hours)
- ECG: moderately available via clinical archives
- PPG / BVP: small public corpora
- Eye gaze / Pupillometry: very limited
- EMG: very limited for cognitive contexts
This asymmetry means that while a strong EEG encoder can be trained from scratch via self-supervised learning, modalities like PPG and eye gaze cannot benefit from the same approach without cross-modal knowledge transfer.
The Goal
Transfer knowledge from the large, well-trained EEG encoder to encoders for underrepresented modalities, so that they reach representation quality comparable to what they would achieve with far more data. This lets the system use less invasive sensors - a wristband PPG monitor rather than an EEG headset - wherever sufficient performance can be achieved.
Why Less Invasive Modalities Matter
From an operational deployment perspective:
- EEG requires a headset - uncomfortable for long shifts, time-consuming to set up, and visually conspicuous. Operators may not accept it for continuous monitoring.
- PPG requires only a wristband - unobtrusive, comfortable for all-day wear, and already standard in consumer fitness trackers.
- Eye tracking is camera-based - requires no body-worn sensor at all.
If a PPG or eye-tracking encoder can achieve acceptable cognitive state estimation quality through cross-modal alignment with EEG, it enables deployment scenarios that would be impractical with EEG.
Approach: Cross-Modality Alignment
The approach follows Brant-X's two-level alignment framework:
- Pre-train a large EEG encoder on available large-scale EEG corpora using masked reconstruction.
- Collect paired data where both EEG and the target modality (e.g. PPG) are recorded simultaneously, along with cognitive state labels.
- Train a modality-specific encoder for the underrepresented modality using the paired data, with an alignment loss that encourages its latent representations to match the EEG encoder's representations on the same signal windows.
- Fine-tune the aligned encoder on the downstream cognitive state prediction task.
The aligned encoder inherits the EEG encoder's rich representations learned from large-scale data, even though it is trained on only small paired datasets.
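The sample-level alignment step (step 3) can be sketched as a contrastive objective over paired windows: each target-modality embedding should be closest to the EEG embedding of the same time window. This is an illustrative NumPy sketch, not the exact Brant-X loss; the batch construction and temperature value are assumptions.

```python
import numpy as np

def alignment_loss(eeg_emb, tgt_emb, temperature=0.1):
    """InfoNCE-style sample-level alignment loss.

    eeg_emb, tgt_emb: (batch, dim) embeddings of the SAME signal windows,
    produced by the frozen EEG encoder and the trainable target-modality
    encoder respectively. Row i of each matrix is a paired window.
    """
    # L2-normalise so similarity is cosine similarity
    e = eeg_emb / np.linalg.norm(eeg_emb, axis=1, keepdims=True)
    t = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    logits = (e @ t.T) / temperature  # (batch, batch) similarity matrix
    # each EEG window's positive is its own paired target window (diagonal);
    # all other windows in the batch act as negatives
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))
```

In practice the EEG encoder is kept frozen and only the target-modality encoder receives gradients, so the small paired dataset cannot degrade the representations learned from the large EEG corpora.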
Relevant Datasets
For PPG Alignment
- WAUC: 48 participants with simultaneous EEG (8 ch) + Empatica E4 BVP/PPG (64 Hz). The largest simultaneous EEG+PPG cognitive dataset available.
- UNIVERSE: 24 participants with simultaneous Muse S EEG + Empatica-compatible PPG.
For Eye-Tracking Alignment
- COLET: 47 participants with eye gaze and pupillometry during a visual search task. No simultaneous EEG, so alignment requires a semantic-level (label-based) approach rather than sample-level.
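Because COLET lacks simultaneous EEG, the alignment signal can only come from shared cognitive-state labels rather than shared time windows. One minimal way to sketch a label-based (semantic-level) objective - a hypothetical formulation, not taken from Brant-X or COLET tooling - is to pull each eye-tracking embedding toward an EEG class prototype that carries the same label:

```python
import numpy as np

def semantic_alignment_loss(tgt_emb, labels, eeg_prototypes):
    """Label-level alignment when no simultaneous EEG exists.

    tgt_emb:        (batch, dim) target-modality embeddings
    labels:         (batch,) integer cognitive-state labels
    eeg_prototypes: (n_classes, dim) per-class mean EEG embeddings,
                    computed offline from a labelled EEG dataset
    """
    t = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    p = eeg_prototypes / np.linalg.norm(eeg_prototypes, axis=1, keepdims=True)
    # cosine similarity between each embedding and its class prototype
    cos = np.sum(t * p[labels], axis=1)
    return float(np.mean(1.0 - cos))
```

This is weaker supervision than sample-level alignment - it only pulls embeddings toward class centroids - but it is the level of alignment the available data supports.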
Modality-Specific Encoding Challenges
PPG
PPG signals have a fundamentally different character from EEG: they are quasi-periodic (cardiac rhythm), low-frequency (0.5–4 Hz), and dominated by the heart rate signal. Cognitive information is encoded in the subtle variations of the beat-to-beat interval (HRV features) rather than in the waveform shape per se.
An effective PPG encoder for cognitive state monitoring must:
- Extract HRV features from beat-to-beat intervals.
- Separate cardiac rhythm components from motion artefact.
- Capture spectral HRV features (LF/HF ratio) that reflect sympathetic/parasympathetic balance.
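The HRV requirements above can be made concrete with a small sketch, assuming clean beat-to-beat intervals in seconds are already available (a real pipeline needs beat detection and motion-artefact rejection first). Band edges follow the standard LF (0.04–0.15 Hz) and HF (0.15–0.40 Hz) definitions; the 4 Hz resampling rate is a common choice, not a requirement.

```python
import numpy as np

def hrv_features(ibi_s, fs=4.0):
    """Time- and frequency-domain HRV features from beat-to-beat intervals.

    ibi_s: 1-D array of inter-beat intervals in seconds.
    fs:    resampling rate (Hz) for spectral analysis.
    """
    ibi_ms = np.asarray(ibi_s, dtype=float) * 1000.0
    # RMSSD: root mean square of successive differences (vagal-tone proxy)
    rmssd = float(np.sqrt(np.mean(np.diff(ibi_ms) ** 2)))
    # the IBI series is irregularly sampled (one value per beat), so
    # resample it onto a uniform time grid before taking a spectrum
    beat_times = np.cumsum(ibi_s)
    grid = np.arange(beat_times[0], beat_times[-1], 1.0 / fs)
    ibi_uniform = np.interp(grid, beat_times, ibi_ms)
    ibi_uniform -= ibi_uniform.mean()
    # simple periodogram; a production pipeline would use Welch's method
    psd = np.abs(np.fft.rfft(ibi_uniform)) ** 2 / len(ibi_uniform)
    freqs = np.fft.rfftfreq(len(ibi_uniform), d=1.0 / fs)
    lf = psd[(freqs >= 0.04) & (freqs < 0.15)].sum()
    hf = psd[(freqs >= 0.15) & (freqs < 0.40)].sum()
    return {"rmssd_ms": rmssd, "lf_hf": float(lf / hf)}
```

A learned PPG encoder is not required to compute these features explicitly, but they indicate the timescales and spectral bands its receptive field must cover.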
Eye Gaze and Pupillometry
Eye-tracking data encodes cognitive state differently from EEG or PPG:
- Gaze position \((x, y)\) reflects attentional allocation across the visual scene.
- Pupil diameter reflects LC-NE arousal and cognitive effort.
- Fixation and saccade patterns carry qualitatively different information from raw time-series amplitude.
An effective eye-tracking encoder must handle both the spatial structure of gaze data and the temporal dynamics of pupil dilation. Cross-modal alignment with EEG must bridge this qualitative difference.
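The fixation/saccade structure mentioned above is conventionally extracted with a velocity-threshold (I-VT) rule before or alongside encoding. The sketch below assumes gaze positions already converted to degrees of visual angle and uses a fixed 30 deg/s threshold - both simplifications of real eye-tracking pipelines.

```python
import numpy as np

def segment_ivt(x, y, fs, vel_thresh_deg=30.0):
    """I-VT segmentation: mark saccade samples by gaze velocity.

    x, y: gaze position in degrees of visual angle, sampled at fs Hz.
    Returns a boolean mask, True where the sample belongs to a saccade;
    the complementary runs of False are candidate fixations.
    """
    vx = np.gradient(x) * fs  # deg/s, central differences
    vy = np.gradient(y) * fs
    speed = np.hypot(vx, vy)
    return speed > vel_thresh_deg
```

Fixation durations, saccade amplitudes, and fixation counts per region then become the event-level features the encoder must represent, alongside the continuous pupil-diameter channel.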
Brant-X as the Reference Implementation
Zhang et al. (SIGKDD 2024) - arXiv:2409.00122
Brant-X is the primary reference for the cross-modality alignment strategy in this project. It demonstrates successful knowledge transfer from EEG to ECG, EMG, and eye movement signals, showing that two-level semantic alignment is an effective framework for bootstrapping underrepresented modality encoders.