The Most Important Unsolved Problem Nobody Is Funding

Everyone in neuroAI agrees that foundation models for the brain would be transformative. So why hasn’t anyone built one, and why isn’t the money following?

Last week I sat in a room with some of the most serious people working at the intersection of neuroscience and AI: researchers from Stanford, Columbia, NYU, UCSF, and Mount Sinai, alongside founders building companies in this space. The event was a symposium on foundation models for the brain, and two hours of panels crystallized something I’d been sensing for a while but hadn’t quite articulated. This field has a structural problem that goes much deeper than technical difficulty.

The thesis of the room was simple. Foundation models—the paradigm that gave us GPT, DALL-E, and AlphaFold—should, in principle, work for the brain. Train a large model on enormous quantities of neural data, let it learn general representations, and then fine-tune for downstream tasks such as seizure detection, drug biomarker discovery, BCI control, or mental health monitoring. The pitch is compelling, the analogy to language and protein modeling is obvious, and the potential applications are enormous.

And yet nobody has cracked it. The funding isn’t flowing. And the deeper you get into why, the more you realize this isn’t just a hard technical problem; it’s a problem with a particular structure that makes it resistant to the ways we normally fund and solve hard problems.

Why the Science Is Harder Than It Looks

Every successful foundation model was built on a canonical unit of representation. For language, it was the token: a discrete, standardized chunk of text that every model processes the same way. For proteins, it was the amino acid. These primitives made it possible to aggregate data across sources, scale training, and compare models meaningfully. They also defined what a self-supervised objective should look like, such as predicting the masked token or the next residue.
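To make the analogy concrete, here is a minimal sketch of a masked-token objective of the kind used to train language foundation models. The function name and mask symbol are illustrative, not any particular library's API:

```python
import random

def mask_tokens(tokens, mask_rate=0.15, mask_id="[MASK]", seed=0):
    """Corrupt a token sequence by masking a random subset.

    Returns the corrupted sequence and a dict mapping each masked
    position to the token the model must learn to recover.
    """
    rng = random.Random(seed)
    corrupted, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            corrupted.append(mask_id)
            targets[i] = tok  # prediction target for this position
        else:
            corrupted.append(tok)
    return corrupted, targets

tokens = "train a large model on enormous quantities of data".split()
corrupted, targets = mask_tokens(tokens, mask_rate=0.3)
```

The self-supervised signal lives in `targets`: the loss rewards the model for predicting each hidden token from its context. The recipe works because a token means the same thing in every document, which is exactly the property neural data lacks.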

Neural data hasn’t converged on a satisfying primitive. A spike train from a Utah array, an EEG channel, an fMRI voxel, and a calcium imaging ROI are all measurements of brain activity, but they differ fundamentally in spatial resolution (single-unit vs. centimeter-scale), temporal resolution (millisecond vs. second), and measurement modality. The field has adopted a de facto workaround: treat EEG as a multi-channel time series, patch it into fixed-length temporal segments, and feed those patches into a transformer as tokens. It’s the BPE tokenization of neuroscience: pragmatic, not principled, and arguably sufficient to get something off the ground.
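For concreteness, a minimal sketch of the patch-as-token workaround, using a synthetic (channels × time) array; the channel count and patch length are illustrative, not those of any particular published model:

```python
import numpy as np

def patch_eeg(signal, patch_len):
    """Split a (channels, time) array into fixed-length temporal patches.

    Each (channel, patch) pair becomes one "token" vector, analogous to
    a subword token in a language model.
    """
    n_ch, n_t = signal.shape
    n_patches = n_t // patch_len                  # drop the trailing remainder
    trimmed = signal[:, : n_patches * patch_len]
    return trimmed.reshape(n_ch, n_patches, patch_len).reshape(-1, patch_len)

eeg = np.random.randn(19, 1000)       # 19 scalp channels, 1000 samples
tokens = patch_eeg(eeg, patch_len=200)
# 19 channels x 5 patches -> 95 tokens of 200 samples each
```

Everything downstream (masking, attention, fine-tuning) then treats those 200-sample slices as if they were as stable and comparable as words, which is the assumption the next section questions.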

But the patch-as-token approach inherits a problem that BPE tokenization doesn’t: the underlying channels aren’t stable representational units. Volume conduction through the skull and scalp means each EEG electrode records a spatially blurred superposition of thousands of underlying neural sources. While source localization methods can partially unmix these signals, the inverse problem is fundamentally ill-posed. The same scalp potential distribution is consistent with infinitely many intracranial source configurations. Working in source space helps, but it doesn’t eliminate the ambiguity; it just moves the arbitrariness from electrode placement to the choice of forward model.
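The ill-posedness can be demonstrated in a few lines with a toy linear forward model (a random matrix standing in for a real lead field): two very different source configurations produce numerically identical scalp potentials whenever they differ only within the model's null space.

```python
import numpy as np

rng = np.random.default_rng(0)
n_electrodes, n_sources = 32, 5000            # far more sources than sensors
L = rng.standard_normal((n_electrodes, n_sources))  # toy lead-field matrix

s1 = rng.standard_normal(n_sources)           # one source configuration
# Build a perturbation lying in the null space of L by projecting out
# the component that L can "see":
v = rng.standard_normal(n_sources)
v -= L.T @ np.linalg.lstsq(L @ L.T, L @ v, rcond=None)[0]
s2 = s1 + v                                   # a very different configuration

same_scalp = np.allclose(L @ s1, L @ s2)      # identical electrode readings
```

With 32 sensors and 5,000 sources, the null space has dimension 4,968: infinitely many source patterns are indistinguishable at the scalp. That is the inverse problem in miniature, and no choice of forward model removes it.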

The deeper question is whether the representations LaBraM and EEGformer learn encode neurophysiologically meaningful structure, or whether they’re exploiting low-level statistical regularities in the training data, such as recording artifacts or demographic confounds baked into the clinical datasets. The benchmarks used, like seizure classification or sleep staging, are too coarse to distinguish between these possibilities. Current evaluation frameworks can’t tell you if a model is doing sophisticated frequency-band analysis or merely detecting the electrode impedance signatures associated with certain hospital setups.

The Funding Trap

Now layer the funding picture on top of the science. The second panel was explicitly about commercialization and real-world deployment, and what emerged was a structural “doom loop.”

Venture capital needs a product, a product needs a capable model, and a capable model needs data at scale with ground-truth labels. Ground truth in neuroscience is elusive in a way that’s genuinely different from other domains. The AlphaFold comparison is instructive but imperfect: the mapping from sequence to fold is essentially deterministic in a given environment, which almost certainly isn’t true for the mapping from neural signal to cognitive state. The same EEG recording produced by the same brain can index wildly different cognitive states depending on context.

Without rigorous benchmarks, it’s hard to demonstrate progress. Without demonstrated progress, it’s hard to raise. Without capital, you can’t acquire the proprietary clinical datasets or compute at the scale needed to make progress. The loop closes.

Where the Leverage Actually Lives

None of this means the field is hopeless. It means the leverage is in different places than people assume. The highest-value contribution right now is probably not building another EEG foundation model trained on the TUH corpus. The results from those models are real but incremental, and the papers tend to benchmark against each other rather than against an external standard of what “understanding the brain” would mean.

The higher-leverage bets are upstream. What is the right inductive bias for a neural data model? Transformers applied to EEG patches treat channels as tokens and time as sequence length, which imposes a specific relational structure. However, there are strong neurophysiological reasons to think that cortical computation involves oscillatory dynamics and traveling waves, neither of which is particularly well captured by standard attention mechanisms operating on fixed patches.
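One way to see the mismatch: with a pure 10 Hz rhythm, fixed-length patching produces identical tokens only when the patch length happens to be a multiple of the oscillation's period; otherwise the same underlying rhythm yields a stream of phase-shifted, non-identical tokens. The sampling rate and patch lengths below are illustrative.

```python
import numpy as np

fs = 250                                  # sampling rate in Hz
t = np.arange(fs) / fs                    # one second of signal
alpha = np.sin(2 * np.pi * 10 * t)        # 10 Hz rhythm: period = 25 samples

def to_patches(x, patch_len):
    """Cut a 1-D signal into consecutive fixed-length patches."""
    return x[: len(x) // patch_len * patch_len].reshape(-1, patch_len)

aligned = to_patches(alpha, 50)    # 50 = two full cycles: tokens repeat exactly
shifted = to_patches(alpha, 40)    # 40 is not a multiple of 25: tokens drift

aligned_same = np.allclose(aligned[0], aligned[1])   # True
shifted_same = np.allclose(shifted[0], shifted[1])   # False
```

The model must then spend capacity re-learning phase invariance that a rhythm-aware representation would get for free, which is one concrete argument for rethinking the inductive bias rather than scaling the current one.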

The question nobody is sure how to answer is whether the brain has analogous deep regularities that are accessible to the kinds of models we know how to build. There are good reasons to think yes, as cortical organization follows conserved principles across species and frequency-band dynamics recur across tasks. But there are also good reasons to be skeptical. Neural computation may be fundamentally context-dependent in ways that make population-level regularities shallow rather than deep.

What I keep coming back to is that this is one of the few areas in science where the potential returns for human health and our understanding of cognition are genuinely enormous, and yet the field is genuinely underfunded relative to that potential. The question is whether the right incentive structures and evaluation frameworks can emerge to channel that energy into cumulative progress before the hype cycle exhausts itself and everyone moves on.