Preprocessing Pipeline

Before data enters the brain foundation model, raw sensor signals are transformed into a standardised, denoised representation. The pipeline consists of three stages, each of which is modular and replaceable:

  1. Artifact Rejection and Denoising
  2. Bandpass Filtering
  3. Windowing and Normalisation
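One way to picture "modular and replaceable" is to treat every stage as a function from an array to an array and run them in sequence. The sketch below is only an illustration under that assumption: `Stage`, `run_pipeline`, and the two placeholder stages are hypothetical names, not part of the pipeline described here.

```python
from typing import Callable, Sequence

import numpy as np

# Assumption: a stage maps an array of shape (n_channels, n_samples) to a new array.
Stage = Callable[[np.ndarray], np.ndarray]


def run_pipeline(signal: np.ndarray, stages: Sequence[Stage]) -> np.ndarray:
    """Apply each stage in order; swapping a stage does not affect the others."""
    out = signal
    for stage in stages:
        out = stage(out)
    return out


# Placeholder stages (hypothetical) standing in for the real denoising,
# filtering, and normalisation steps.
def remove_mean(x: np.ndarray) -> np.ndarray:
    return x - x.mean(axis=-1, keepdims=True)


def scale_to_unit_std(x: np.ndarray) -> np.ndarray:
    return x / (x.std(axis=-1, keepdims=True) + 1e-12)


raw = np.random.default_rng(0).normal(5.0, 3.0, size=(4, 1000))
clean = run_pipeline(raw, [remove_mean, scale_to_unit_std])
```

Replacing a stage then amounts to passing a different function in the list, which is the property the stage list above relies on.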

Artifact Rejection and Denoising

Raw physiological signals are corrupted by a variety of artifacts:

EEG artifacts:

  • Ocular artifacts: eye blinks and saccades produce large slow potentials at frontal electrodes. Removed via independent component analysis (ICA) or regression-based eye-movement correction.
  • Muscle artifacts: electromyographic noise from jaw clenching or facial movements, concentrated at high frequencies. Addressed by filtering out high-frequency content and by ICA.
  • Motion artifacts: electrode movement during physical activity. Particularly relevant for WAUC.
  • Channel dropouts: individual electrodes with poor contact produce flat or saturated signals. Detected and interpolated from neighbouring channels.
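As an illustration of the channel-dropout step, the sketch below flags channels that are flat or saturated and replaces them with the mean of their good neighbours. It is a simplified stand-in, not the method used here: the thresholds, the microvolt units, and the `neighbours` map are all assumptions, and a real pipeline would usually interpolate from electrode positions rather than a plain average.

```python
import numpy as np


def find_bad_channels(eeg, flat_std=1e-3, clip_value=None, clip_fraction=0.2):
    """Return a boolean mask of channels that look flat or saturated.

    eeg: array of shape (n_channels, n_samples), assumed to be in microvolts.
    flat_std, clip_value, clip_fraction: illustrative thresholds (assumptions).
    """
    bad = eeg.std(axis=1) < flat_std
    if clip_value is not None:
        # A channel is saturated if too many samples sit at the clipping level.
        saturated = (np.abs(eeg) >= clip_value).mean(axis=1) > clip_fraction
        bad = bad | saturated
    return bad


def interpolate_from_neighbours(eeg, bad, neighbours):
    """Replace each bad channel with the mean of its good neighbours.

    neighbours: dict mapping channel index -> list of neighbouring indices
    (hypothetical; a real montage would derive this from electrode positions).
    """
    out = eeg.copy()
    for ch in np.flatnonzero(bad):
        good = [n for n in neighbours.get(int(ch), []) if not bad[n]]
        if good:  # leave the channel untouched if no good neighbour exists
            out[ch] = eeg[good].mean(axis=0)
    return out


rng = np.random.default_rng(0)
eeg = rng.normal(0.0, 10.0, size=(4, 500))
eeg[2] = 0.0  # simulate a flat (disconnected) electrode
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

bad = find_bad_channels(eeg)
repaired = interpolate_from_neighbours(eeg, bad, neighbours)
```

Here channel 2 is flagged as flat and rebuilt from channels 1 and 3; the other channels pass through unchanged.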

PPG artifacts:

  • Motion artifacts from wrist movement dominate PPG noise. Accelerometry data from the same wristband is typically used in adaptive filtering to subtract motion-correlated components.
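The adaptive-filtering idea can be sketched with a basic least-mean-squares (LMS) noise canceller: the accelerometer signal is the reference, the filter learns how that reference appears in the PPG, and the residual is kept as the cleaned signal. This is a minimal sketch under assumptions: a single accelerometer axis, a plain LMS update, and illustrative values for `n_taps` and `mu`. The source does not specify which adaptive algorithm is used.

```python
import numpy as np


def lms_motion_cancel(ppg, accel, n_taps=8, mu=0.01):
    """Subtract the accelerometer-correlated part of a PPG signal with an LMS filter.

    ppg:   1-D array, the motion-contaminated PPG signal.
    accel: 1-D array of the same length, the motion reference.
    n_taps, mu: filter length and step size (illustrative values).
    Returns the error signal, i.e. the motion-reduced PPG estimate.
    """
    w = np.zeros(n_taps)
    cleaned = np.zeros(len(ppg), dtype=float)
    for n in range(len(ppg)):
        # Most recent n_taps reference samples, newest first, zero-padded at the start.
        lo = max(0, n - n_taps + 1)
        x = np.zeros(n_taps)
        x[: n - lo + 1] = accel[lo : n + 1][::-1]
        motion_estimate = w @ x
        e = ppg[n] - motion_estimate
        w += 2 * mu * e * x  # LMS weight update
        cleaned[n] = e
    return cleaned


# Synthetic example: a 1.2 Hz pulse wave plus motion at 3 Hz.
fs = 50.0
t = np.arange(2000) / fs
pulse = np.sin(2 * np.pi * 1.2 * t)
accel = np.sin(2 * np.pi * 3.0 * t)
ppg = pulse + 0.8 * accel

cleaned = lms_motion_cancel(ppg, accel)
```

With real wristband data the three accelerometer axes would each contribute a reference, and the step size would need tuning to the signal power; a normalised LMS variant is a common choice for that reason.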

Bandpass Filtering

EEG signals are filtered to retain clinically and cognitively meaningful frequency bands:

| Band | Range | Cognitive Association |
|---|---|---|
| Delta | 0.5–4 Hz | Deep sleep, unconscious processing |
| Theta | 4–8 Hz | Cognitive load, working memory, frontal theta |
| Alpha | 8–13 Hz | Relaxed wakefulness; suppressed during tasks |
| Beta | 13–30 Hz | Active concentration, motor activity |
| Gamma | 30–100 Hz | High-level processing, binding |

For cognitive workload monitoring, theta (frontal) and alpha (occipital) bands are most informative.
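To make the band definitions concrete, the sketch below isolates one band by zeroing FFT bins outside its range. This brick-wall FFT mask is chosen only because it needs nothing beyond NumPy; it is not the filter design used here, and practical pipelines more commonly use Butterworth or FIR bandpass filters. The names `BANDS_HZ` and `fft_bandpass` are hypothetical.

```python
import numpy as np

# Band edges in Hz, as listed in the table above.
BANDS_HZ = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
    "gamma": (30.0, 100.0),
}


def fft_bandpass(x, fs, low, high):
    """Zero all FFT bins outside [low, high) Hz along the last axis."""
    spectrum = np.fft.rfft(x, axis=-1)
    freqs = np.fft.rfftfreq(x.shape[-1], d=1.0 / fs)
    keep = (freqs >= low) & (freqs < high)
    return np.fft.irfft(spectrum * keep, n=x.shape[-1], axis=-1)


# Synthetic signal: a 6 Hz (theta) component plus a 20 Hz (beta) component.
fs = 256.0
t = np.arange(int(4 * fs)) / fs
signal = np.sin(2 * np.pi * 6 * t) + np.sin(2 * np.pi * 20 * t)

theta = fft_bandpass(signal, fs, *BANDS_HZ["theta"])
```

In this example the theta output retains the 6 Hz component and drops the 20 Hz one. On real recordings a hard spectral mask causes ringing at window edges, which is one reason smoother filter designs are preferred.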

Windowing and Normalisation

Windowing

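Continuous recordings are segmented into overlapping windows. Window length governs a trade-off: longer windows give finer frequency resolution (a window of T seconds resolves roughly 1/T Hz, which matters most for the low-frequency bands), while shorter windows better satisfy the assumption that the cognitive state is stationary within a window. Typical choices are 2–10 second windows with 50% overlap.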
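The segmentation can be sketched as follows. The function name `sliding_windows`, the 4-second default, and the output layout `(n_windows, n_channels, n_samples_per_window)` are assumptions made for illustration.

```python
import numpy as np


def sliding_windows(x, fs, window_s=4.0, overlap=0.5):
    """Cut x (n_channels, n_samples) into overlapping windows along the last axis.

    Returns an array of shape (n_windows, n_channels, n_win).
    Trailing samples that do not fill a whole window are dropped.
    """
    n_win = int(round(window_s * fs))
    step = int(round(n_win * (1.0 - overlap)))
    if step < 1:
        raise ValueError("overlap too large for this window length")
    n_samples = x.shape[-1]
    if n_samples < n_win:
        raise ValueError("recording is shorter than one window")
    starts = range(0, n_samples - n_win + 1, step)
    return np.stack([x[..., s : s + n_win] for s in starts])


fs = 128.0
recording = np.random.default_rng(0).normal(size=(8, int(60 * fs)))  # 8 channels, 60 s
windows = sliding_windows(recording, fs, window_s=4.0, overlap=0.5)
```

With 50% overlap, the second half of each window is identical to the first half of the next, so every sample (apart from the edges) appears in two windows.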

Normalisation

Each window is independently normalised to remove slow drift and inter-subject amplitude differences:

  • Per-channel z-score normalisation within each window.
  • Robust normalisation using median absolute deviation to resist outlier contamination.
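  • Riemannian covariance-matrix normalisation: each window is represented by its channel covariance matrix. Under the affine-invariant Riemannian metric, distances between such matrices are unchanged by any invertible linear rescaling or remixing of the channels, which makes the representation scale-invariant and largely hardware-agnostic.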
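The first two options can be sketched in a few lines. The function names and the small `eps` guard against division by zero are assumptions; the Riemannian option is not shown because it needs matrix operations beyond a short example.

```python
import numpy as np


def zscore_window(window, eps=1e-12):
    """Per-channel z-score: zero mean and unit variance within the window."""
    mean = window.mean(axis=-1, keepdims=True)
    std = window.std(axis=-1, keepdims=True)
    return (window - mean) / (std + eps)


def robust_window(window, eps=1e-12):
    """Per-channel robust scaling using the median and median absolute deviation."""
    median = np.median(window, axis=-1, keepdims=True)
    mad = np.median(np.abs(window - median), axis=-1, keepdims=True)
    # 1.4826 scales the MAD to match the standard deviation for Gaussian data.
    return (window - median) / (1.4826 * mad + eps)


rng = np.random.default_rng(0)
window = rng.normal(20.0, 5.0, size=(3, 512))  # 3 channels, one window
window[0, 100] = 5000.0  # a single large outlier on channel 0

z = zscore_window(window)
r = robust_window(window)
```

The outlier on channel 0 inflates that channel's standard deviation and so compresses the rest of the z-scored signal, whereas the median and MAD are barely affected by it. That difference is the reason for offering the robust variant.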