B[FM]2: Brain Foundation Model via Flow Matching with SplitUNet

¹Massachusetts Institute of Technology  ²KU Leuven

Continuous-time generative pretraining for EEG. Flow matching transports Gaussian noise to multi-channel EEG along a continuous trajectory. Our SplitUNet velocity network operates directly on the unpatched waveform (no patches, tokens, or masking), and its penultimate-layer features feed a linear head for downstream clinical tasks.
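As a rough illustration of the linear-head step, the sketch below fits a logistic-regression classifier on fixed feature vectors. It assumes the backbone is frozen and that features have already been extracted; the names `train_linear_head` and `predict`, the binary setting, the optimizer settings, and the toy feature values are all hypothetical and not taken from the paper.

```python
import math

def train_linear_head(features, labels, lr=0.5, epochs=200):
    """Fit a binary logistic-regression head on frozen features by gradient descent."""
    dim = len(features[0])
    n = len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))       # predicted probability of class 1
            for j in range(dim):
                grad_w[j] += (p - y) * x[j] / n  # gradient of the mean log-loss
            grad_b += (p - y) / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Return the predicted class (0 or 1) for one feature vector."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Hypothetical 2-dimensional "penultimate-layer" features for four segments.
toy_features = [[2.0, 0.1], [1.5, -0.2], [-1.8, 0.3], [-2.2, 0.0]]
toy_labels = [1, 1, 0, 0]
w, b = train_linear_head(toy_features, toy_labels)
```

Only `w` and `b` are trained here; the feature extractor is never updated, which is what makes this a linear probe rather than fine-tuning.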

Abstract

EEG foundation models can learn generalizable representations from large-scale EEG corpora to enable single-backbone transfer across diverse clinical and brain-computer interface tasks. Existing models typically discretize the continuous multi-channel EEG waveform into patches or codebook tokens and train a transformer with masked self-supervision.

Recognizing that this discretization fragments continuous brain rhythms and obscures fine-grained temporal dynamics, we present B[FM]2 (Brain Foundation Model via Flow Matching), whose inductive bias aligns with the data by pretraining directly on the raw signal using continuous-time flow matching without patches, tokenization, or masking. However, multi-channel EEG signals pose an architectural challenge for flow matching: time is densely sampled and highly autocorrelated (thousands of timepoints), while the electrode axis is short (tens of channels) at distinct scalp positions. To address this time–electrode asymmetry, we introduce SplitUNet, a velocity network that factorizes each block into separate 1D temporal and 1D electrode convolutions and downsamples only along time, preserving electrode topology throughout the hierarchy.
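A minimal sketch of a flow-matching training objective on a raw segment is given below. It assumes the common linear interpolation path x_t = (1 - t) x0 + t x1 with x0 drawn from a standard Gaussian, for which the regression target is x1 - x0; the page does not state which path B[FM]2 uses, so treat this choice, the function name `flow_matching_loss`, and the toy segment as assumptions.

```python
import random

def flow_matching_loss(velocity, x1, rng):
    """One-sample Monte Carlo estimate of a flow-matching regression loss.

    Assumes the linear path x_t = (1 - t) * x0 + t * x1 with x0 ~ N(0, I),
    whose target velocity is x1 - x0. `x1` is one segment given as a list
    of channels, each a list of timepoints.
    """
    t = rng.random()                                        # t ~ Uniform[0, 1)
    x0 = [[rng.gauss(0.0, 1.0) for _ in row] for row in x1]  # Gaussian noise
    xt = [[(1.0 - t) * a + t * b for a, b in zip(r0, r1)]
          for r0, r1 in zip(x0, x1)]
    target = [[b - a for a, b in zip(r0, r1)] for r0, r1 in zip(x0, x1)]
    pred = velocity(xt, t)
    sq = [(p - u) ** 2 for rp, ru in zip(pred, target) for p, u in zip(rp, ru)]
    return sum(sq) / len(sq)                                # mean squared error

# Toy 2-channel, 4-sample segment and a placeholder network that predicts zero.
x1 = [[0.1, 0.2, 0.3, 0.4], [0.0, -0.1, -0.2, -0.3]]
zero_velocity = lambda xt, t: [[0.0 for _ in row] for row in xt]
loss = flow_matching_loss(zero_velocity, x1, random.Random(0))
```

Note that the loss is computed on every sample of every channel, with no patching or masking step; in practice `velocity` would be the SplitUNet and the loss would be averaged over a minibatch.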

B[FM]2 sets a new state of the art on 7 of 9 standard downstream EEG classification tasks with a pretraining budget of only 36,895 segments (≈307 h), 1–2 orders of magnitude (≈30×) less than existing EEG foundation models use. It also generates synthetic EEG that two board-certified neurologists could not reliably distinguish from real recordings (Cohen's κ = −0.096).
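The budget figures can be sanity-checked with a few lines of arithmetic. The 30-second segment length is an assumption here, taken from the 30-second held-out example described in the generation section; the page does not state the pretraining segment length explicitly.

```python
import math

segments = 36_895
segment_seconds = 30  # assumption: every pretraining segment is 30 s long

hours = segments * segment_seconds / 3600
print(round(hours, 2))  # → 307.46, consistent with the quoted ≈307 h

# A ≈30× reduction expressed in orders of magnitude.
orders = math.log10(30)
print(round(orders, 2))  # → 1.48, inside the quoted 1–2 range
```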

Highlights

  • Tokenization-free pretraining. A continuous-time flow-matching pretext that operates directly on the raw multi-channel waveform — no patches, codebooks, or masking.
  • SplitUNet. A velocity network that factorizes every spatiotemporal convolution into a 1D temporal and a 1D electrode convolution, downsampling only along time and preserving electrode topology end-to-end.
  • State of the art, efficiently. New best on 7 of 9 downstream EEG tasks while pretraining on only 36,895 segments (≈307 h) — 1–2 orders of magnitude (≈30×) less data than prior EEG foundation models.
  • Clinically indistinguishable generation. Two board-certified neurologists could not reliably tell B[FM]2 samples from real held-out clinical EEG (Cohen's κ = −0.096).

SplitUNet Architecture

Encoder stages halve only the time axis (electrode dimension preserved); a self-attention bottleneck mixes globally; the decoder mirrors the encoder with time-only upsampling. Each Conv(1+1)D factorizes a 2D convolution into a temporal Conv1D followed by an electrode Conv1D.
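The factorization and the time-only downsampling can be sketched in plain Python for a single feature channel. This is an illustration under stated assumptions, not the released architecture: the helper names, the zero padding, the average-pooling downsampler, and the kernel values are all hypothetical, and a real block would add multiple feature channels, normalization, and nonlinearities.

```python
def conv1d_same(seq, kernel):
    """Cross-correlate a 1D sequence with an odd-length kernel, zero-padded to keep its length."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(seq) + [0.0] * half
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(seq))]

def conv_1plus1d(x, k_time, k_elec):
    """Temporal Conv1D along each electrode row, then electrode Conv1D along each time column.

    x: list of electrodes, each a list of timepoints (one feature channel).
    """
    after_time = [conv1d_same(row, k_time) for row in x]
    columns = zip(*after_time)                         # iterate over timepoints
    after_elec = [conv1d_same(col, k_elec) for col in columns]
    return [list(row) for row in zip(*after_elec)]     # back to electrode-major

def downsample_time(x, factor=2):
    """Shrink the time axis by `factor` via average pooling; electrode count is unchanged."""
    return [[sum(row[i:i + factor]) / factor
             for i in range(0, len(row) - factor + 1, factor)]
            for row in x]

# Toy input: 19 electrodes x 8 timepoints.
x = [[float(e + t) for t in range(8)] for e in range(19)]
y = conv_1plus1d(x, k_time=[0.25, 0.5, 0.25], k_elec=[0.25, 0.5, 0.25])
z = downsample_time(y)
print(len(y), len(y[0]))  # → 19 8
print(len(z), len(z[0]))  # → 19 4
```

In this single-channel case the pair of kernels holds 3 + 3 weights where a full 3×3 2D kernel would hold 9, and the electrode axis stays at 19 after the encoder stage while time is halved.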

Downstream Results

Balanced accuracy (mean over 5 seeds) across the nine downstream tasks, comparing B[FM]2 against EEG foundation models. Best per task is marked with *. B[FM]2 sets a new state of the art on 7 of 9 tasks and the highest suite average, despite using far less pretraining data. Full results, including task-specific supervised baselines and companion metrics, are in the paper.

Method          Mumtaz   MAT      Siena    ISRUC    HMC      TUEV     TUAB     BCIC-IV-2a  SEED-V   Avg.
BIOT            0.936    0.688    0.735    0.753    0.686    0.528    0.796    0.475       0.384    0.664
LaBraM-Base     0.941    0.691    0.708    0.763    0.728    0.641    0.814    0.487       0.398    0.686
CBraMod         0.956    0.726    0.732    0.786    0.727    0.667    0.829    0.514       0.409    0.705
REVE            0.964    0.766    0.740    0.782    0.740    0.676    0.832*   0.640*      0.405    0.727
CSBrain         0.964    0.756    0.766    0.792    0.735    0.690    0.817    0.566       0.420    0.723
B[FM]2 (ours)   1.000*   0.840*   0.776*   0.806*   0.764*   0.715*   0.819    0.570       0.492*   0.754*
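The headline counts can be re-derived from the table above. The snippet copies the reported per-task scores and recomputes the per-task winners and the suite averages; because the per-task values are already rounded to three decimals, the recomputed averages can differ from the reported ones by about 0.001.

```python
tasks = ["Mumtaz", "MAT", "Siena", "ISRUC", "HMC",
         "TUEV", "TUAB", "BCIC-IV-2a", "SEED-V"]

# Reported balanced accuracies, in the task order above.
scores = {
    "BIOT":        [0.936, 0.688, 0.735, 0.753, 0.686, 0.528, 0.796, 0.475, 0.384],
    "LaBraM-Base": [0.941, 0.691, 0.708, 0.763, 0.728, 0.641, 0.814, 0.487, 0.398],
    "CBraMod":     [0.956, 0.726, 0.732, 0.786, 0.727, 0.667, 0.829, 0.514, 0.409],
    "REVE":        [0.964, 0.766, 0.740, 0.782, 0.740, 0.676, 0.832, 0.640, 0.405],
    "CSBrain":     [0.964, 0.756, 0.766, 0.792, 0.735, 0.690, 0.817, 0.566, 0.420],
    "B[FM]2":      [1.000, 0.840, 0.776, 0.806, 0.764, 0.715, 0.819, 0.570, 0.492],
}

# Best method per task, then the number of tasks won by B[FM]2.
best = {task: max(scores, key=lambda m: scores[m][i])
        for i, task in enumerate(tasks)}
wins = sum(1 for method in best.values() if method == "B[FM]2")
not_won = sorted(task for task, method in best.items() if method != "B[FM]2")

# Unweighted mean over the nine tasks.
averages = {m: sum(v) / len(v) for m, v in scores.items()}

print(wins)     # → 7
print(not_won)  # → ['BCIC-IV-2a', 'TUAB']
```

REVE holds the best score on both remaining tasks (TUAB and BCIC-IV-2a).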

Real vs. Generated EEG

A 30-second held-out TUEG segment (middle) and an unconditional B[FM]2 sample (right) in the 19-channel 10–20 montage. Both exhibit the spatial and temporal coherence characteristic of clinical EEG and are difficult to distinguish. In a blinded reading of 50 interleaved segments, two board-certified neurologists could not reliably separate real from generated EEG (Cohen's κ = −0.096, i.e., agreement no better than chance).
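For readers unfamiliar with the statistic, Cohen's κ compares the observed agreement between two label sequences with the agreement expected by chance from their marginal label frequencies: κ = (p_o − p_e) / (1 − p_e). The sketch below implements that formula; the ratings are hypothetical toy labels for illustration only and are not the neurologists' data.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equally long label sequences.

    Undefined (division by zero) when chance agreement p_e equals 1,
    i.e. when both raters use one and the same label throughout.
    """
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    # Observed agreement: fraction of items given the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    p_e = sum((rater_a.count(l) / n) * (rater_b.count(l) / n) for l in labels)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical ratings of four segments.
a = ["real", "real", "generated", "generated"]
b = ["real", "generated", "real", "generated"]
print(cohens_kappa(a, b))  # → 0.0 (agreement exactly at chance level)
print(cohens_kappa(a, a))  # → 1.0 (perfect agreement)
```

A value near zero, or slightly negative as reported here, means the agreement is at or marginally below what chance alone would produce.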

Physiologically Diverse Generations

Unconditional B[FM]2 samples reproduce clinically meaningful patterns — including sharp-wave transients, slow-wave activity, and posterior α rhythm — not just average spectra.
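Unconditional sampling from a flow-matching model amounts to integrating dx/dt = v(x, t) from Gaussian noise at t = 0 to t = 1. The sketch below uses fixed-step Euler integration; the solver choice, the step count, and the names `euler_sample` and `make_toy_velocity` are assumptions, and the toy velocity field is a stand-in for the trained SplitUNet, which is not reproduced here.

```python
import random

def euler_sample(velocity, shape, n_steps=50, seed=0):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler.

    `shape` is (channels, timepoints); the initial state is standard Gaussian noise.
    """
    rng = random.Random(seed)
    n_ch, n_t = shape
    x = [[rng.gauss(0.0, 1.0) for _ in range(n_t)] for _ in range(n_ch)]
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        v = velocity(x, t)
        x = [[xi + dt * vi for xi, vi in zip(row_x, row_v)]
             for row_x, row_v in zip(x, v)]
    return x

def make_toy_velocity(target):
    """Hypothetical stand-in for a trained network.

    Returns the velocity field that moves any point toward one fixed
    target along a straight line, arriving at t=1.
    """
    def velocity(x, t):
        return [[(ti - xi) / (1.0 - t) for xi, ti in zip(row_x, row_t)]
                for row_x, row_t in zip(x, target)]
    return velocity

# Toy 2-channel, 3-sample "signal" used as the single target.
target = [[0.5, -0.5, 0.25], [1.0, 0.0, -1.0]]
sample = euler_sample(make_toy_velocity(target), shape=(2, 3), n_steps=50)
```

With this toy field every noise draw lands on `target` (up to floating-point error); a trained network instead defines a field whose endpoints vary with the initial noise, which is where sample diversity comes from.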

BibTeX

@article{hwang2026bfm,
    author  = {Hwang, Jaedong and Zhang, Kathleen and Dai, Wei and Kontras, Konstantinos
               and Vanmarcke, Maarten and De Vos, Maarten and Fiete, Ila and Liang, Paul Pu},
    title   = {B[FM]$^2$: Brain Foundation Model via Flow Matching with SplitUNet},
    journal = {arXiv preprint},
    year    = {2026},
}