Brain Foundation Models

The Need for Brain Foundation Models

The concept of a "brain foundation model" stems from the development of large-scale EEG foundation models, also known as large EEG models (LEMs), such as LaBraM, EEGPT and CBraMod. These are large models, largely based on the Transformer architecture, trained on vast EEG datasets including the TUH EEG Corpus, which allows them to learn generic representations of cognition.

The use of foundation models in this context allows rich signal representations to be learned from unlabelled recordings, which are far more abundant than labelled cognitive-state data in the psychophysiological domain. This mitigates the labelling bottleneck associated with collecting such data:

  • Participants must complete long experiments with carefully controlled cognitive stimuli.
  • Ground-truth cognitive state labels require validated subjective instruments (e.g. NASA-TLX) or behavioural probes (e.g. SART) administered repeatedly throughout the session, interrupting the natural flow.
  • EEG recordings require skilled setup, electrode gel application, and post-hoc quality checking.
  • A typical well-controlled study yields data from 20–50 participants, orders of magnitude fewer than the millions of samples used to train language or vision foundation models.

In contrast, unlabelled EEG recordings are increasingly available from clinical archives, sleep labs, neuroscience research repositories, and consumer-grade wearable deployments. These recordings contain rich signal structure but carry no cognitive state annotation.

However, we acknowledge that EEG alone is insufficient for a complete picture of a person's cognitive state: the signal is characterized by a low signal-to-noise ratio (SNR), apparent stochasticity, nonstationarity and nonlinearity.

Hence, we wish to extend this framework across multiple modalities, including PPG, eye tracking and fatigue-related measures.

Design Principles

We aim to create brain foundation models that follow these key design principles:

  • Task-Agnostic: generalizes across varying tasks that engage different parts of the brain (e.g. memory, spatial visualization).
  • Subject-Agnostic: generally robust to inter-subject variability, but also capable of personalised adaptation (e.g. PULSE, TERSE or PhysioPFM).
  • Hardware-Agnostic: generalizes across a wide array of devices, manufacturers and sensor generations with different sampling rates, scales and external artifacts (e.g. EEG-X).
  • Channel Topology-Agnostic: maps a variable number of channels to a latent-space representation with full channel permutation equivariance (e.g. DIVER-0 or LUNA).
  • Sequence Length-Agnostic: supports arbitrary-length recording durations, where the temporal relationship between a signal window and a cognitive event is uncertain.
  • Privacy-Preserving: resistant to memorizing subject-specific biometric identity information, ensuring that such information cannot be extracted from model weights or activations.
  • Modality-Agnostic: applicable to other psychophysiological data from sensors such as BVP (blood volume pulse), GSR (galvanic skin response) and more.
  • Multi-Modal Fusion: capable of unified signal alignment across individual modalities in order to identify modality-invariant and modality-specific characteristics (e.g. MISA, PhysioOmni and Brant-X).
  • Asymmetric Cross-Modal Representations: capable of transferring information across representations, enabling high-quality encoders to be built for underrepresented modalities (where corpus data is scarce), supported by rich supervisory signals such as EEG.
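To make the channel topology-agnostic principle concrete, here is a minimal numpy sketch (an illustration only, not the DIVER-0 or LUNA implementation): a single projection shared across channels produces one token per channel, so permuting the channels merely permutes the tokens (equivariance), and a symmetric pooling over those tokens is fully invariant to channel order and count.

```python
import numpy as np

def encode_channels(x, w):
    # x: (n_channels, n_samples) raw window; w: a projection SHARED by all
    # channels. Because every channel passes through the same weights,
    # permuting the channels only permutes the resulting tokens, and the
    # mean-pooled summary is identical for any channel ordering.
    tokens = x @ w                      # (n_channels, d_latent), one token per channel
    return tokens, tokens.mean(axis=0)  # pooled summary is permutation-invariant

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 16))     # hypothetical shared projection weights
x = rng.standard_normal((19, 256))     # e.g. a 19-channel EEG window

tokens, pooled = encode_channels(x, w)
tokens_perm, pooled_perm = encode_channels(x[::-1], w)  # reversed channel order
print(np.allclose(pooled, pooled_perm))  # True: channel order does not matter
```

The same function also accepts a different number of channels without any change, since nothing in it depends on `n_channels`.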

Our models aim to learn general-purpose representations of brain and body signals from large unlabelled corpora via self-supervised learning, which can then be specialized to downstream tasks such as cognitive state estimation with minimal labelled data. Rather than training a separate model for each sensor type, task, or population, a single pre-trained encoder forms the backbone for all downstream applications.
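The pretrain-then-specialize workflow can be sketched as follows. This is a hedged toy example with stand-in names: `pretrained_encoder` here is just a fixed deterministic projection rather than a real self-supervised model, but it shows the key point that the frozen encoder is reused and only a small linear probe is fitted on the few labelled windows.

```python
import numpy as np

rng = np.random.default_rng(42)

def pretrained_encoder(x):
    # Stand-in for a frozen, self-supervised encoder: fixed "weights" that
    # are never updated during downstream specialization.
    w = np.linspace(-1.0, 1.0, 64 * 8).reshape(64, 8)
    return np.tanh(x @ w)              # (n_windows, 8) frozen features

# Downstream, only a handful of labelled windows is typically available.
x_labelled = rng.standard_normal((30, 64))       # 30 windows, 64 samples each
y = rng.integers(0, 2, size=30).astype(float)    # toy binary state labels

feats = pretrained_encoder(x_labelled)           # encoder stays frozen
probe, *_ = np.linalg.lstsq(feats, y, rcond=None)  # fit a tiny linear probe

preds = pretrained_encoder(x_labelled) @ probe
print(preds.shape)  # (30,)
```

Swapping the task only means refitting the 8-parameter probe; the backbone encoder is shared across all downstream applications.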

Architectural Implementation

Taking heavy inspiration from prior work in torch-brain and braindecode, this library implements some of the core architectural components of the models mentioned above.

Many of these models follow the Masked Autoencoder (MAE) architecture popularized by Meta, but we also implement alternatives such as Spiking Neural Networks (SNNs) and State Space Models (SSMs), which can be used as drop-in replacements to improve performance.
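The MAE masking step can be sketched in a few lines of numpy. This is an illustration with toy sizes, not any of the implementations above: a signal window is split into non-overlapping patches, a large fraction of patches is hidden, and during training the reconstruction loss would be computed on the masked patches only.

```python
import numpy as np

rng = np.random.default_rng(0)

def patchify(signal, patch_len):
    # Split a 1-D window into non-overlapping patches (tokens).
    n = len(signal) // patch_len
    return signal[: n * patch_len].reshape(n, patch_len)

def random_mask(n_patches, mask_ratio, rng):
    # MAE-style masking: the encoder sees only the visible patches,
    # and a lightweight decoder reconstructs the masked ones.
    n_mask = int(n_patches * mask_ratio)
    perm = rng.permutation(n_patches)
    return perm[:n_mask], perm[n_mask:]   # masked ids, visible ids

signal = np.sin(np.linspace(0, 8 * np.pi, 256))    # toy EEG-like window
patches = patchify(signal, patch_len=16)            # (16, 16)
masked, visible = random_mask(len(patches), mask_ratio=0.75, rng=rng)

# Reconstruction target: only the masked patches contribute to the loss.
target = patches[masked]
print(len(visible), len(masked))  # 4 12
```

The high mask ratio (here 0.75, as in the original MAE paper) is what makes pretraining cheap: the encoder processes only a quarter of the tokens.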

Key Research Areas

Physiological Modalities

  • EEG: electrical cortical activity; relevant to workload, attention, emotion and fatigue.
  • ECG: cardiac electrical activity; relevant to stress and autonomic arousal.
  • PPG / BVP: peripheral blood volume pulse; heart rate variability as a proxy for stress and load.
  • Eye Gaze & Pupillometry: gaze position and pupil diameter; relevant to workload and situational awareness.
  • Speech: acoustic para-linguistic features; relevant to stress, arousal and cognitive load.
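As an example of the PPG/BVP pathway from pulse signal to cognitive relevance, a standard short-term heart rate variability index is RMSSD, the root mean square of successive differences between inter-beat intervals. The sketch below uses hypothetical toy interval values, as would be derived from peak detection on a PPG waveform.

```python
import math

def rmssd(ibi_ms):
    # Root mean square of successive differences of inter-beat intervals (ms).
    # A widely used short-term HRV index; reduced HRV often accompanies
    # elevated stress or cognitive load.
    diffs = [b - a for a, b in zip(ibi_ms, ibi_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Toy inter-beat intervals in milliseconds (illustrative values only).
ibi = [812, 790, 805, 820, 798, 811]
print(round(rmssd(ibi), 1))  # 17.8
```

A foundation-model encoder would consume the raw PPG waveform directly, but handcrafted indices like RMSSD remain useful baselines and sanity checks for downstream stress and load estimation.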