Brain Foundation Models

The Need for Brain Foundation Models

The concept of a "brain foundation model" stems from the development of large-scale EEG foundation models, also known as large EEG models (LEMs), such as LaBraM, EEGPT and CBraMod. These are large models, largely based on the Transformer architecture, trained on vast EEG datasets including the TUH EEG Corpus, which allows them to learn generic representations of cognition.

The use of foundation models in this context allows rich signal representations to be learned from unlabelled recordings, which are far more abundant than labelled cognitive-state data in the psychophysiological domain. This mitigates the labelling bottleneck associated with collecting such data:

  • Participants must complete long experiments with carefully controlled cognitive stimuli.
  • Ground-truth cognitive state labels require validated subjective instruments (e.g. NASA-TLX) or behavioural probes (e.g. SART) administered repeatedly throughout the session, interrupting the natural flow.
  • EEG recordings require skilled setup, electrode gel application, and post-hoc quality checking.
  • A typical well-controlled study yields data from 20–50 participants, orders of magnitude fewer than the millions of samples used to train language or vision foundation models.

In contrast, unlabelled EEG recordings are increasingly available from clinical archives, sleep labs, neuroscience research repositories, and consumer-grade wearable deployments. These recordings contain rich signal structure but carry no cognitive state annotation.

However, we acknowledge that EEG alone is insufficient for a complete picture of a person's cognitive state: the signal is characterized by a low signal-to-noise ratio (SNR), apparent stochasticity, nonstationarity and nonlinearity.

Hence, we wish to extend this framework across multiple modalities, including PPG, eye tracking and fatigue-related measures.

Design Principles

We aim to create brain foundation models that follow these key design principles:

  • Task-Agnostic: generalizes across varying tasks that engage different parts of the brain (e.g. memory, spatial visualization).
  • Subject-Agnostic: generally robust to inter-subject variability, but also capable of personalised adaptation (e.g. PULSE, TERSE or PhysioPFM).
  • Hardware-Agnostic: generalizes across a wide array of devices, manufacturers and sensor generations with different sampling rates, scales and external artifacts (e.g. EEG-X).
  • Channel Topology-Agnostic: maps a variable number of channels to a latent-space representation with full channel permutation equivariance (e.g. DIVER-0 or LUNA).
  • Sequence Length-Agnostic: supports arbitrary-length recording durations, where the temporal relationship between a signal window and a cognitive event is uncertain.
  • Privacy-Preserving: resistant to memorizing subject-specific biometric identity information, ensuring that such information cannot be extracted from model weights or activations.
  • Modality-Agnostic: applicable to other psychophysiological data from sensors such as BVP (blood volume pulse), GSR (galvanic skin response) and more.
  • Multi-Modal Fusion: capable of unified signal alignment across individual modalities in order to identify modality-invariant and modality-specific characteristics (e.g. MISA, PhysioOmni and Brant-X).
  • Asymmetric Cross-Modal Representations: capable of transferring information across representations, enabling high-quality encoders to be built for underrepresented modalities (where corpus data is scarce), supported by rich supervisory signals such as EEG.
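To make the channel topology-agnostic principle concrete, here is a minimal numpy sketch (an illustration only, not the DIVER-0 or LUNA implementation): a single projection shared across channels produces one token per channel, so permuting the channels merely permutes the tokens (equivariance), and a symmetric pooling over those tokens is fully invariant to channel order and count.

```python
import numpy as np

def encode_channels(x, w):
    # x: (n_channels, n_samples) raw window; w: a projection SHARED by all
    # channels. Because every channel passes through the same weights,
    # permuting the channels only permutes the resulting tokens, and the
    # mean-pooled summary is identical for any channel ordering.
    tokens = x @ w                      # (n_channels, d_latent), one token per channel
    return tokens, tokens.mean(axis=0)  # pooled summary is permutation-invariant

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 16))     # hypothetical shared projection weights
x = rng.standard_normal((19, 256))     # e.g. a 19-channel EEG window

tokens, pooled = encode_channels(x, w)
tokens_perm, pooled_perm = encode_channels(x[::-1], w)  # reversed channel order
print(np.allclose(pooled, pooled_perm))  # True: channel order does not matter
```

The same function also accepts a different number of channels without any change, since nothing in it depends on `n_channels`.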

Our models aim to learn general-purpose representations of brain and body signals from large unlabelled corpora via self-supervised learning, which can then be specialized to downstream tasks such as cognitive state estimation with minimal labelled data. Rather than training a separate model for each sensor type, task, or population, a single pre-trained encoder forms the backbone for all downstream applications.
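The pretrain-then-specialize workflow can be sketched as follows. This is a hedged toy example with stand-in names: `pretrained_encoder` here is just a fixed deterministic projection rather than a real self-supervised model, but it shows the key point that the frozen encoder is reused and only a small linear probe is fitted on the few labelled windows.

```python
import numpy as np

rng = np.random.default_rng(42)

def pretrained_encoder(x):
    # Stand-in for a frozen, self-supervised encoder: fixed "weights" that
    # are never updated during downstream specialization.
    w = np.linspace(-1.0, 1.0, 64 * 8).reshape(64, 8)
    return np.tanh(x @ w)              # (n_windows, 8) frozen features

# Downstream, only a handful of labelled windows is typically available.
x_labelled = rng.standard_normal((30, 64))       # 30 windows, 64 samples each
y = rng.integers(0, 2, size=30).astype(float)    # toy binary state labels

feats = pretrained_encoder(x_labelled)           # encoder stays frozen
probe, *_ = np.linalg.lstsq(feats, y, rcond=None)  # fit a tiny linear probe

preds = pretrained_encoder(x_labelled) @ probe
print(preds.shape)  # (30,)
```

Swapping the task only means refitting the 8-parameter probe; the backbone encoder is shared across all downstream applications.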

Architectural Implementation

Taking heavy inspiration from prior work in torch-brain and braindecode, this library implements some of the core architectural components of the models mentioned above.

Many of these models follow the Masked Autoencoder (MAE) architecture popularized by Meta, but we also implement alternatives such as Spiking Neural Networks (SNNs) and State Space Models (SSMs), which can be used as drop-in replacements to improve performance.
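The MAE masking step can be sketched in a few lines of numpy. This is an illustration with toy sizes, not any of the implementations above: a signal window is split into non-overlapping patches, a large fraction of patches is hidden, and during training the reconstruction loss would be computed on the masked patches only.

```python
import numpy as np

rng = np.random.default_rng(0)

def patchify(signal, patch_len):
    # Split a 1-D window into non-overlapping patches (tokens).
    n = len(signal) // patch_len
    return signal[: n * patch_len].reshape(n, patch_len)

def random_mask(n_patches, mask_ratio, rng):
    # MAE-style masking: the encoder sees only the visible patches,
    # and a lightweight decoder reconstructs the masked ones.
    n_mask = int(n_patches * mask_ratio)
    perm = rng.permutation(n_patches)
    return perm[:n_mask], perm[n_mask:]   # masked ids, visible ids

signal = np.sin(np.linspace(0, 8 * np.pi, 256))    # toy EEG-like window
patches = patchify(signal, patch_len=16)            # (16, 16)
masked, visible = random_mask(len(patches), mask_ratio=0.75, rng=rng)

# Reconstruction target: only the masked patches contribute to the loss.
target = patches[masked]
print(len(visible), len(masked))  # 4 12
```

The high mask ratio (here 0.75, as in the original MAE paper) is what makes pretraining cheap: the encoder processes only a quarter of the tokens.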

Key Research Areas

Physiological Modalities

  • EEG: electrical cortical activity; relevant to workload, attention, emotion and fatigue.
  • ECG: cardiac electrical activity; relevant to stress and autonomic arousal.
  • PPG / BVP: peripheral blood volume pulse; heart rate variability as a proxy for stress and load.
  • Eye Gaze & Pupillometry: gaze position and pupil diameter; relevant to workload and situational awareness.
  • Speech: acoustic para-linguistic features; relevant to stress, arousal and cognitive load.
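As an example of the PPG/BVP pathway from pulse signal to cognitive relevance, a standard short-term heart rate variability index is RMSSD, the root mean square of successive differences between inter-beat intervals. The sketch below uses hypothetical toy interval values, as would be derived from peak detection on a PPG waveform.

```python
import math

def rmssd(ibi_ms):
    # Root mean square of successive differences of inter-beat intervals (ms).
    # A widely used short-term HRV index; reduced HRV often accompanies
    # elevated stress or cognitive load.
    diffs = [b - a for a, b in zip(ibi_ms, ibi_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Toy inter-beat intervals in milliseconds (illustrative values only).
ibi = [812, 790, 805, 820, 798, 811]
print(round(rmssd(ibi), 1))  # 17.8
```

A foundation-model encoder would consume the raw PPG waveform directly, but handcrafted indices like RMSSD remain useful baselines and sanity checks for downstream stress and load estimation.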