Beyond structure-based biomolecule design

Dynamics, black-box data, and the antedisciplinary frontier of biomolecule design

Nov 15, 2025

Isomorphic Labs has been founded, AlphaFold 3 had a very different release strategy from previous iterations, and there are now multiple biotech (techbio?) startups building very similar protein structure foundation models.

It feels like an important moment for structure-based biomolecule design: the models are actually starting to work, have perceived commercial value, and the action is naturally shifting from academia to industry.

So what are the next scientific problems academia could be thinking about?

I’m excited about two interconnected directions that go beyond static structures, united by a fundamental insight: Nature performs computation through transitions between molecular states, often coupled to chemical reactions and dynamic intermolecular interactions [1].

The first direction is to directly model dynamics. The second is to bypass explicit dynamics using black-box functional data, which captures the results of dynamics.

Representing and designing conformational dynamics

Conformational changes and dynamics are fundamental to the function of almost every biologically relevant molecule, from antibodies and membrane receptors to biocatalysis in proteins and RNA.

An ideal computational representation of molecular systems must therefore account for both geometric structures and temporal dynamics. However, almost all the success stories in biomolecule design today have focussed on static representations [2]. So how can we better represent multi-state ensembles and transition dynamics of conformational changes?

First, the integration of machine learning interaction potentials (MLIPs) for molecular dynamics with property prediction and generative models. Recent ‘universal’ MLIPs have demonstrated remarkable accuracy in approximating quantum mechanical calculations, including for biomolecules [3].

An interesting question is whether representations learned by MLIPs can be predictive of functional properties of de novo designed molecules (beyond natural systems seen during training). If true, MLIPs could enable new capabilities in molecular design, including dynamics-informed generation via conditioning, or accelerating the screening of generated designs with desired ensemble properties.
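As a rough illustration of what screening on an ensemble property could look like, here is a minimal sketch. It assumes each design comes with a small, pre-aligned conformational ensemble (for instance from MLIP-driven simulation) and uses per-atom root-mean-square fluctuation (RMSF) as the property. The choice of RMSF, the function names `rmsf` and `passes_screen`, and the thresholds are all illustrative assumptions, not a method proposed in this post.

```python
import math

def rmsf(ensemble):
    """Per-atom root-mean-square fluctuation over an ensemble.

    ensemble: list of frames; each frame is a list of (x, y, z) tuples.
    Frames are assumed to be already aligned to a common reference.
    """
    n_frames = len(ensemble)
    n_atoms = len(ensemble[0])
    fluctuations = []
    for i in range(n_atoms):
        # Mean position of atom i across all frames.
        mean = [sum(f[i][k] for f in ensemble) / n_frames for k in range(3)]
        # Mean squared deviation from that mean position.
        msd = sum(
            (f[i][k] - mean[k]) ** 2 for f in ensemble for k in range(3)
        ) / n_frames
        fluctuations.append(math.sqrt(msd))
    return fluctuations

def passes_screen(ensemble, flexible_atoms, min_flexible=1.0, max_rigid=0.5):
    """Hypothetical filter: listed atoms must move, all others must stay put.

    Thresholds are in the same length unit as the coordinates and are
    placeholder values chosen for the toy example below.
    """
    flexible = set(flexible_atoms)
    for i, value in enumerate(rmsf(ensemble)):
        if i in flexible and value < min_flexible:
            return False
        if i not in flexible and value > max_rigid:
            return False
    return True

# Toy two-frame, two-atom ensemble: atom 0 is fixed, atom 1 moves along x.
toy_ensemble = [
    [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
]
print(rmsf(toy_ensemble))
print(passes_screen(toy_ensemble, flexible_atoms=[1]))
```

In practice the interesting part is where the ensemble comes from and whether a cheap learned representation can stand in for it. The filter itself stays this simple.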

Second, the use of experimental data that explicitly captures conformational flexibility and dynamics during training. Examples include cryo-EM density maps from structure determination, and high-throughput structural assays such as cross-linking mass spectrometry for proteins and chemical probing for RNA. These provide complementary information about structural flexibility, not just static 3D structures.

We must move beyond solely static structures if we want to understand the dynamic behaviour of biomolecules, or be able to reliably design functional, multi-state systems. This movement (pun intended) starts with the training data.

Alternatively, could we leap directly from sequence to dynamical function?

Black-box datasets and lab-in-the-loop design

Structure-based design cannot be a universally applicable paradigm. It’s starting to work very well when there is an established structural basis for function, and when high-quality structural data is available. But many of the most interesting biological problems may not fit this mould.

There is growing excitement about ‘black-box’ experimental datasets from high-throughput assays. Essentially, this approach aims to directly map sequence to a functional output at a very large scale, often without relying on a structural basis of function.

This is an interesting change in experimental mindset: instead of generating a few high-quality data points for human scientists to better understand a phenomenon, create large-scale, relatively noisier datasets specifically for training machine learning models (the black boxes) [4].

If we can find the right balance between scaling the experimental assays and signal-to-noise, a lab-in-the-loop setup can be established. We can iteratively generate, test and improve our AI molecular design model through feedback in the physical world.
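The generate, test and improve cycle can be sketched in a few lines. This is a toy under loud assumptions: `run_assay` is a simulated stand-in for a noisy high-throughput experiment (its hidden signal, GC fraction, is invented for the example), and the "model" is just greedy selection plus random mutation rather than a trained generative model. All names and parameters here are hypothetical.

```python
import random

ALPHABET = "ACGU"

def run_assay(seq, rng, noise=0.1):
    """Simulated black-box assay: hidden signal plus Gaussian noise.

    The hidden signal (fraction of G/C) is an arbitrary placeholder for
    whatever function the real experiment would measure.
    """
    signal = sum(c in "GC" for c in seq) / len(seq)
    return signal + rng.gauss(0.0, noise)

def mutate(seq, rng):
    """Propose a new design by resampling one random position."""
    pos = rng.randrange(len(seq))
    return seq[:pos] + rng.choice(ALPHABET) + seq[pos + 1:]

def lab_in_the_loop(n_rounds=5, batch_size=32, keep=8, seq_len=20, seed=0):
    rng = random.Random(seed)
    # Round 0 designs: random sequences.
    pool = ["".join(rng.choice(ALPHABET) for _ in range(seq_len))
            for _ in range(batch_size)]
    measured = {}   # sequence -> assay readout, each sequence measured once
    history = []    # mean readout of the current top designs, per round
    top = []
    for _ in range(n_rounds):
        # Test: send the batch to the (simulated) lab.
        for seq in pool:
            if seq not in measured:
                measured[seq] = run_assay(seq, rng)
        # Improve: a real system would retrain a model on `measured`;
        # this sketch simply ranks everything measured so far.
        top = sorted(measured, key=measured.get, reverse=True)[:keep]
        history.append(sum(measured[s] for s in top) / len(top))
        # Generate: propose the next batch around the best designs.
        pool = [mutate(rng.choice(top), rng) for _ in range(batch_size)]
    return top, history

top, history = lab_in_the_loop()
print(history)
```

Because each sequence is measured once and the ranking runs over everything seen so far, the top-design average can only stay flat or rise between rounds. With noisy readouts, part of that rise is selection on noise, which is exactly the trade-off between assay scale and signal-to-noise described above.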

While structure-based approaches currently capture most of the attention, these black-box active learning systems are comparatively under-explored. Yet, methods for driving this experimental loop can be fundamentally very general and adaptable to a range of molecular design problems.

Antedisciplinary science

The most interesting questions in molecular biology involve dynamic biological processes. And studying them requires thinking about both the experimental data generation and machine learning models, from the ground up, jointly.

I imagine this will necessitate very close collaborations between AI researchers and experimental biologists. We must learn to speak each other’s language and blur rigid disciplinary lines. This can be slow but ultimately very rewarding.

Such an antedisciplinary approach to science is perhaps going to be essential for asking the most interesting scientific questions, and unlocking the secrets of life.

[1] I felt Hashim Al-Hashimi’s article, ‘Turing, von Neumann, and the computational architecture of biological machines’ (PNAS, 2023), articulates this idea very well.

[2] Two perspectives I really liked on this note are Carugo and Djinović-Carugo’s ‘Structural biology: A golden era’, as well as Lane’s ‘Protein structure prediction has reached the single-structure frontier’.

[3] MACE-OFF, Orb, and Meta’s UMA are especially interesting for biomolecules.

[4] My views on this are shaped by Deep Screening from MRC LMB (Porebski et al., Nature Biotechnology, 2024), which has since been spun out as Sortera Bio. I also highly recommend reading Bronstein and Naef’s essay ‘The Road to Biology 2.0 Will Pass Through Black-Box Data’. In fact, these ideas are particularly promising for RNA, where next-generation sequencing can measure structural and functional properties at unprecedented scale and at relatively low cost compared to proteins.

Molecular Modeling Club
