★ VENTURE TAKES

Biohub and the Dream of a Virtual Cell

If virtual cell and protein world models work, we could move from reactive sick care to predictive medicine: detecting instability earlier, testing interventions virtually, and understanding disease as a system-level drift before it becomes visible damage.

1P · JUDY DUONG · MAY 28, 2026 · 10 MIN READ

Most medicine today is still reactive.

We wait until something looks wrong, measure the damage, name the disease, then try to treat it. But biology rarely breaks in one dramatic moment. More often, the system drifts: inflammation rises, signalling pathways lose rhythm, cells adapt to stress, and only later do we see what we call disease.

That is why Biohub’s AI biology push is so interesting.

Biohub, backed by Mark Zuckerberg and Dr. Priscilla Chan, is shifting deeper into AI-powered biology, with the long-term goal of building predictive models of human biology. In 2026, Biohub committed $500 million over five years to advance AI models of the human body, including efforts to build more accurate predictive models of cells.

The big idea is simple but wild:

What if we could simulate biology before running every experiment in the lab?

From mapping biology to understanding its logic

Over the past decade, biology has become very good at mapping. We have mapped genomes, sequenced cells, identified cell types, and built huge biological datasets.

But mapping is not the same as understanding.

A cell is not just a static object with a label. It is a living system responding to chemical, mechanical, electrical, spatial, and temporal signals. When the environment becomes noisy, stressed, or dysregulated, cells adapt. Sometimes what we call pathology may be the downstream signature of a system that has lost its regulatory rhythm.

This is where virtual cell models matter.

An AI Virtual Cell (AIVC) is a computational, AI-driven simulation that mimics the behaviour and inner workings of a biological cell. By training neural networks on massive biological datasets, these models attempt to predict how a cell will respond to perturbations, disease, or medical treatment before every experiment has to be run physically in the lab.

A virtual cell is not just a pretty digital twin. It is an attempt to model how cells behave, respond, and transition between states. Recent research describes virtual cell modelling as a way to predict cellular responses to perturbations, but also notes that existing models struggle with data quality, coverage, batch effects, interpretability, and biological consistency.

In plain English: we are no longer only asking “what is this cell?”
We are asking “what is this cell likely to do next?”
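To make "what is this cell likely to do next?" a little more concrete, here is a toy sketch of perturbation-response prediction. It is not Biohub's method or any published model: it is the simplest baseline I can think of, which learns the average per-gene change between untreated and treated cells and applies that shift to a new cell. The function names and the expression values are made up for illustration.

```python
def mean_shift(control, perturbed):
    """Average per-gene change between control cells and perturbed cells.

    Each cell is a list of expression values, one per gene (hypothetical data).
    """
    n_genes = len(control[0])
    ctrl_mean = [sum(cell[g] for cell in control) / len(control) for g in range(n_genes)]
    pert_mean = [sum(cell[g] for cell in perturbed) / len(perturbed) for g in range(n_genes)]
    return [p - c for c, p in zip(ctrl_mean, pert_mean)]


def predict_response(cell, shift):
    """Predict a cell's post-perturbation state by adding the learned shift."""
    return [value + delta for value, delta in zip(cell, shift)]


# Two genes, two untreated cells, two treated cells (all values invented).
control = [[1.0, 2.0], [3.0, 2.0]]
perturbed = [[4.0, 1.0], [4.0, 1.0]]

shift = mean_shift(control, perturbed)
predicted = predict_response([5.0, 5.0], shift)
print(shift, predicted)
```

Real virtual cell models are trying to do far better than this kind of average, for example by capturing how the same perturbation plays out differently across cell types and contexts.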

Biohub’s protein world model

Biohub’s newest work focuses on a “world model” of protein biology. The release includes a protein-structure prediction model, a protein language model, and the ESM Atlas, which maps 6.8 billion proteins and 1.1 billion predicted structures. Biohub says these models can help scientists test ideas computationally before moving into the lab.

That matters because proteins are basically the tiny machines of life. They build, signal, regulate, repair, and interact. If you can better predict how proteins behave and bind, you can potentially speed up drug discovery and therapeutic design.

Reuters reported that Biohub’s model is based on fourth-generation evolutionary scale modelling, or ESM, and has been tested in immune disease and cancer contexts, including designing protein binders that could reactivate immune cells in lab tests. The models are open-source and will be available through Biohub’s platform, AWS Bio Discovery, and SandboxAQ, with compute credits for researchers.

That open-source point is important. Biohub is not just building a closed pharma tool. It is trying to build shared scientific infrastructure.

Why early detection is the real leverage

The most exciting implication is not only faster drug discovery. It is earlier intervention.

A healthy system is not defined by the absence of failure. It is defined by the ability to detect small deviations and correct them before they compound.

This is where biology starts to look like an information system.

If AI can continuously model changes across proteins, cells, tissues, immune signals, inflammation, metabolism, and disease progression, then healthcare can move from:

“You are sick, let’s treat it.”

to:

“Your system is drifting, let’s intervene early.”

That is a huge shift.

It means early detection becomes a continuous sensing challenge, not just a one-time diagnostic test. The real power is in tracking subtle changes spatially, temporally, and contextually — before the system tips into a less reversible state.
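As a rough illustration of what "continuous sensing" could mean in practice, here is a minimal sketch of a cumulative-sum (CUSUM) drift detector, a standard technique from statistical process monitoring. It is my own toy example, not something from Biohub: the marker readings, the parameter values, and the function name are all assumptions. The idea is that no single reading has to look alarming; small excesses over a personal baseline are added up until the total crosses a threshold.

```python
from statistics import mean, stdev


def cusum_drift(readings, baseline_n=10, slack=0.5, threshold=4.0):
    """Return the index of the first reading where upward drift is flagged, or None.

    The first `baseline_n` readings define "normal". Each later reading is
    turned into a z-score against that baseline, and any excess above `slack`
    is accumulated. Drift is flagged once the running total exceeds `threshold`.
    """
    baseline = readings[:baseline_n]
    mu, sigma = mean(baseline), stdev(baseline)
    score = 0.0
    for i, value in enumerate(readings[baseline_n:], start=baseline_n):
        z = (value - mu) / sigma
        # Readings near baseline pull the score back toward zero.
        score = max(0.0, score + z - slack)
        if score > threshold:
            return i
    return None


# Hypothetical marker readings: a stable baseline, then a slow upward creep.
stable = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 1.1, 0.9, 1.0]
drifting = stable + [1.05, 1.1, 1.15, 1.2, 1.25, 1.3]
steady = stable + [1.0, 1.05, 0.95, 1.0]

print(cusum_drift(drifting), cusum_drift(steady))
```

The design choice that matters here is accumulation: a one-time diagnostic asks whether today's value is abnormal, while a running score asks whether the system has been leaning the same way for a while.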

The harder problem: not all states are viable

One challenge in biological prediction is that not every cellular state is equally stable.

Some cell states may look fine for a while but are structurally unstable. Others may be adaptive and reversible. The real breakthrough is not just predicting behaviour, but understanding which transitions are biologically admissible, which are unstable, and which suggest the system is approaching a tipping point.

That distinction matters for disease.

If AI can identify when a cell or tissue is moving from a resilient state into a fragile one, medicine can intervene much earlier. That is the dream: not just diagnosing disease, but spotting the biological pre-disease drift.
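One heuristic from dynamical-systems research gives a feel for what "approaching a tipping point" might look like in data: systems that are losing resilience tend to recover more slowly from small disturbances, which shows up as rising lag-1 autocorrelation in a time series (often called critical slowing down). The sketch below is a toy under my own assumptions, with an invented signal and hypothetical function names, and says nothing about how Biohub's models work.

```python
def lag1_autocorrelation(values):
    """Correlation between each value and the next one in the series."""
    n = len(values)
    m = sum(values) / n
    numerator = sum((values[i] - m) * (values[i + 1] - m) for i in range(n - 1))
    denominator = sum((v - m) ** 2 for v in values)
    # A flat series has no variance, so report zero rather than divide by zero.
    return numerator / denominator if denominator else 0.0


def rolling_autocorrelation(series, window):
    """Lag-1 autocorrelation over each sliding window of the series."""
    return [
        lag1_autocorrelation(series[i:i + window])
        for i in range(len(series) - window + 1)
    ]


# Invented signal: quick back-and-forth recovery first, then a sluggish one-way slide.
signal = [1, -1, 1, -1, 1, -1, 1, 2, 3, 4, 5, 6]
trend = rolling_autocorrelation(signal, window=6)
print(trend[0], trend[-1])
```

In this made-up signal the first window is negatively autocorrelated (each disturbance is quickly reversed) and the last is positively autocorrelated (each value lingers near the previous one). Real biological data would be far noisier, and whether such indicators hold up in cells or tissues is exactly the kind of thing that needs experimental validation.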

Why this is still hard

This is not magic. Biology is messy.

Cells are shaped by genes, proteins, tissue context, immune activity, mechanical forces, time, environment, and feedback loops. The same signal can mean different things in different contexts. Data is fragmented across imaging, transcriptomics, proteomics, genomics, sensors, and clinical records.

A CZI Virtual Cells workshop paper highlighted major bottlenecks for AI biology: data heterogeneity, noise, reproducibility issues, biases, fragmented public resources, and the need for better benchmarks across biological tasks and data modalities.

So the bottleneck is not just model size. It is data quality, experimental validation, interpretability, and whether predictions actually hold up in wet labs and eventually in patients.

Very humbling. Biology does not care about your GPU count.

Why Biohub is worth watching

Biohub is interesting because it sits in a rare position: it has philanthropic capital, scientific ambition, access to AI talent, and a willingness to build open research infrastructure. Chan and Zuckerberg have shifted the bulk of their philanthropy toward Biohub and AI-powered biology, with the goal of using virtual cell models to understand disease and translate research into human medicine.

They are also treating compute as core lab infrastructure. Zuckerberg and Chan have said researchers increasingly want GPUs, not just more lab space, and Biohub reportedly aims to expand from around 1,000 GPUs to 10,000 GPUs by 2028.

That tells you where biology is going.

The next generation of labs may not only be wet labs. They may be hybrid systems: cells, microscopes, robots, models, sensors, datasets, and compute loops feeding each other constantly.

My takeaway

Biohub’s work is compelling because it reframes biology.

Not as a collection of isolated diseases.
Not as a static map of genes and cell types.
But as a living information system evolving over time.

That is the real paradigm shift.

We mapped the parts.
Now we need to understand the logic.

And if Biohub can help scale that logic into usable models, AI biology may become one of the most important frontiers of the next decade.

P.S. Please go check out Biohub’s latest world model of protein biology. I’m not a biology expert and have never formally studied it, and honestly it took me a long time to unpack the terminology and the research preview. But I find scientific breakthroughs and bold research initiatives beautiful, and always worth following!

#BIOTECH #BIOHUB #EVOLUTIONARY SCALE MODELS #PROTEIN LANGUAGE MODEL #AI