B[FM]²: Brain Foundation Model via Flow Matching with SplitUNet

¹Massachusetts Institute of Technology ²KU Leuven

Abstract

EEG foundation models can learn generalizable representations from large-scale EEG corpora to enable single-backbone transfer across diverse clinical and brain-computer interface tasks. Existing models typically discretize the continuous multi-channel EEG waveform into patches or codebook tokens and train a transformer with masked self-supervision.

Recognizing that this discretization fragments continuous brain rhythms and obscures fine-grained temporal dynamics, we present B[FM]² (Brain Foundation Model via Flow Matching), whose inductive bias aligns with the data by pretraining directly on the raw signal using continuous-time flow matching without patches, tokenization, or masking. However, multi-channel EEG signals pose an architectural challenge for flow matching: time is densely sampled and highly autocorrelated (thousands of timepoints), while the electrode axis is short (tens of channels) at distinct scalp positions. To address this time–electrode asymmetry, we introduce SplitUNet, a velocity network that factorizes each block into separate 1D temporal and 1D electrode convolutions and downsamples only along time, preserving electrode topology throughout the hierarchy.

B[FM]² sets a new state of the art on 7 of 9 standard downstream EEG classification tasks with a pretraining budget of only 36,895 segments (≈307 h), roughly 30× less data (1–2 orders of magnitude) than existing EEG foundation models use. It also generates synthetic EEG that two board-certified neurologists could not reliably distinguish from real recordings (Cohen's κ = −0.096).

Highlights

Tokenization-free pretraining. A continuous-time flow-matching pretext that operates directly on the raw multi-channel waveform — no patches, codebooks, or masking.
SplitUNet. A velocity network that factorizes every spatiotemporal convolution into a 1D temporal and a 1D electrode convolution, downsampling only along time and preserving electrode topology end-to-end.
State of the art, efficiently. New best on 7 of 9 downstream EEG tasks while pretraining on only 36,895 segments (≈307 h) — 1–2 orders of magnitude (≈30×) less data than prior EEG foundation models.
Clinically indistinguishable generation. Two board-certified neurologists could not reliably tell B[FM]² samples from real held-out clinical EEG (Cohen's κ = −0.096).
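The flow-matching pretext named in the first highlight can be illustrated with a minimal sketch. It assumes the common straight-line path x_t = (1 − t)·x0 + t·x1 between Gaussian noise x0 and a data segment x1, with regression target x1 − x0. This page does not state which path or source distribution B[FM]² uses, so both are assumptions, and the helper names `flow_matching_pair` and `mse` are hypothetical. A zero predictor stands in for the velocity network.

```python
import random

def flow_matching_pair(x1, t, rng):
    """Build one training example on the straight path from noise x0 to data x1.

    x1 is an [electrodes][time] nested list holding a raw EEG segment.
    Returns (x_t, target_velocity), both shaped like x1.
    Assumption: linear path and standard Gaussian source.
    """
    x0 = [[rng.gauss(0.0, 1.0) for _ in row] for row in x1]  # noise sample
    x_t = [[(1.0 - t) * n + t * d for n, d in zip(nrow, drow)]
           for nrow, drow in zip(x0, x1)]
    target = [[d - n for n, d in zip(nrow, drow)]
              for nrow, drow in zip(x0, x1)]
    return x_t, target

def mse(pred, target):
    """Mean squared error over every electrode and timepoint."""
    sq = [(p - q) ** 2
          for prow, qrow in zip(pred, target)
          for p, q in zip(prow, qrow)]
    return sum(sq) / len(sq)

rng = random.Random(0)
# Toy 3-electrode x 8-timepoint "segment"; no patching, tokens, or masks.
segment = [[0.1 * c + 0.01 * k for k in range(8)] for c in range(3)]
t = rng.random()  # flow time drawn uniformly from [0, 1)
x_t, target = flow_matching_pair(segment, t, rng)

# A real velocity network v_theta(x_t, t) would be evaluated here;
# an all-zero prediction is only a placeholder.
pred = [[0.0] * len(row) for row in x_t]
loss = mse(pred, target)
print(f"t = {t:.3f}, placeholder loss = {loss:.3f}")
```

The point of the sketch is that the input and the regression target keep the full [electrodes][time] shape of the waveform, which is what makes the objective tokenization-free.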

Downstream Results

Balanced accuracy (mean over 5 seeds) across the nine downstream tasks, comparing B[FM]² against EEG foundation models. The best score in each column is marked with an asterisk (*). B[FM]² sets a new state of the art on 7 of 9 tasks and the highest suite average, despite using far less pretraining data. Full results, including task-specific supervised baselines and companion metrics, are in the paper.

Method	Mumtaz	MAT	Siena	ISRUC	HMC	TUEV	TUAB	BCIC-IV-2a	SEED-V	Avg.
BIOT	0.936	0.688	0.735	0.753	0.686	0.528	0.796	0.475	0.384	0.664
LaBraM-Base	0.941	0.691	0.708	0.763	0.728	0.641	0.814	0.487	0.398	0.686
CBraMod	0.956	0.726	0.732	0.786	0.727	0.667	0.829	0.514	0.409	0.705
REVE	0.964	0.766	0.740	0.782	0.740	0.676	0.832*	0.640*	0.405	0.727
CSBrain	0.964	0.756	0.766	0.792	0.735	0.690	0.817	0.566	0.420	0.723
B[FM]² (ours)	1.000*	0.840*	0.776*	0.806*	0.764*	0.715*	0.819	0.570	0.492*	0.754*
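The "7 of 9" claim can be checked directly against the table. The sketch below transcribes the balanced-accuracy values exactly as printed above and counts the tasks on which B[FM]² has the top score. The names `TASKS`, `SCORES`, and `best_per_task` are introduced here for illustration only.

```python
TASKS = ["Mumtaz", "MAT", "Siena", "ISRUC", "HMC",
         "TUEV", "TUAB", "BCIC-IV-2a", "SEED-V"]

# Balanced accuracy per task, copied from the table (same column order as TASKS).
SCORES = {
    "BIOT":          [0.936, 0.688, 0.735, 0.753, 0.686, 0.528, 0.796, 0.475, 0.384],
    "LaBraM-Base":   [0.941, 0.691, 0.708, 0.763, 0.728, 0.641, 0.814, 0.487, 0.398],
    "CBraMod":       [0.956, 0.726, 0.732, 0.786, 0.727, 0.667, 0.829, 0.514, 0.409],
    "REVE":          [0.964, 0.766, 0.740, 0.782, 0.740, 0.676, 0.832, 0.640, 0.405],
    "CSBrain":       [0.964, 0.756, 0.766, 0.792, 0.735, 0.690, 0.817, 0.566, 0.420],
    "B[FM]² (ours)": [1.000, 0.840, 0.776, 0.806, 0.764, 0.715, 0.819, 0.570, 0.492],
}
OURS = "B[FM]² (ours)"

def best_per_task(scores, tasks):
    """Map each task to the list of methods that reach its top score."""
    best = {}
    for i, task in enumerate(tasks):
        top = max(row[i] for row in scores.values())
        best[task] = [m for m, row in scores.items() if row[i] == top]
    return best

best = best_per_task(SCORES, TASKS)
wins = [task for task in TASKS if OURS in best[task]]
losses = {task: best[task] for task in TASKS if OURS not in best[task]}
ours_avg = sum(SCORES[OURS]) / len(TASKS)

print(len(wins), "of", len(TASKS))   # 7 of 9
print(losses)                        # {'TUAB': ['REVE'], 'BCIC-IV-2a': ['REVE']}
print(round(ours_avg, 3))            # 0.754
```

The two tasks where B[FM]² is not best, TUAB and BCIC-IV-2a, are both led by REVE. Averages recomputed from the rounded table entries can differ from the Avg. column in the last digit.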

BibTeX

@article{hwang2026bfm,
  author  = {Hwang, Jaedong and Zhang, Kathleen and Dai, Wei and Kontras, Konstantinos and Vanmarcke, Maarten and De Vos, Maarten and Fiete, Ila and Liang, Paul Pu},
  title   = {B[FM]$^2$: Brain Foundation Model via Flow Matching with SplitUNet},
  journal = {arXiv preprint arXiv:2606.20812},
  year    = {2026},
}

SplitUNet Architecture

Encoder stages halve only the time axis (electrode dimension preserved); a self-attention bottleneck mixes globally; the decoder mirrors the encoder with time-only upsampling. Each Conv(1+1)D factorizes a 2D convolution into a temporal Conv1D followed by an electrode Conv1D.
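The factorization and the time-only downsampling described above can be sketched on a single-feature toy input. This is a simplified illustration, not the SplitUNet implementation: it omits feature channels, nonlinearities, normalization, and the attention bottleneck, and it assumes zero padding, odd kernel lengths, and pairwise averaging as the downsampling operator. All function names are hypothetical.

```python
def conv_time(x, kernel):
    """Zero-padded, same-length 1D convolution along time, per electrode row.

    Uses the cross-correlation convention common in deep learning.
    Assumption: odd kernel length.
    """
    pad = len(kernel) // 2
    out = []
    for row in x:
        padded = [0.0] * pad + list(row) + [0.0] * pad
        out.append([sum(w * padded[i + j] for j, w in enumerate(kernel))
                    for i in range(len(row))])
    return out

def conv_electrode(x, kernel):
    """The same 1D convolution along the electrode axis, via a transpose."""
    cols = [list(col) for col in zip(*x)]       # [time][electrodes]
    mixed = conv_time(cols, kernel)
    return [list(row) for row in zip(*mixed)]   # back to [electrodes][time]

def conv_1plus1d(x, k_time, k_elec):
    """Temporal Conv1D followed by electrode Conv1D."""
    return conv_electrode(conv_time(x, k_time), k_elec)

def downsample_time(x):
    """Halve the time axis by averaging adjacent pairs; electrodes untouched."""
    return [[(row[i] + row[i + 1]) / 2.0 for i in range(0, len(row) - 1, 2)]
            for row in x]

k = [0.25, 0.5, 0.25]                  # toy smoothing kernel for both axes
x = [[0.0] * 16 for _ in range(4)]     # 4 electrodes x 16 timepoints
x[1][5] = 1.0                          # unit impulse

y = conv_1plus1d(x, k, k)              # impulse spreads over a 3 x 3 neighborhood

h, shapes = x, [(len(x), len(x[0]))]
for _ in range(2):                     # two encoder stages
    h = downsample_time(conv_1plus1d(h, k, k))
    shapes.append((len(h), len(h[0])))
print(shapes)  # [(4, 16), (4, 8), (4, 4)]
```

The electrode count stays at 4 through both stages while the time axis shrinks from 16 to 4, which is the asymmetry the architecture is built around. Adjacent rows in this toy input are treated as neighboring electrodes, which is an assumption of the sketch rather than something stated on this page.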

Real vs. Generated EEG

Physiologically Diverse Generations

Unconditional B[FM]² samples reproduce clinically meaningful patterns — including sharp-wave transients, slow-wave activity, and posterior α rhythm — not just average spectra.
