Understanding GEM-1’s adaptability through the art of Vincent van Gogh
How reference conditioning enables Synthesize’s Generative Genomics Engine to make accurate predictions - even in experimental settings the model has never encountered
We are inviting partners to leverage one of GEM-1’s most powerful capabilities: predicting the effects of experimental perturbations on your own proprietary data. (If you are interested in piloting this, contact our partnerships team!)
Prior to its launch earlier this year, our GEM-1 generative genomics engine was hard at work - training on hundreds of thousands of bulk RNA-seq and tens of millions of single-cell RNA-seq profiles - to predict the effects of an experimental perturbation across a broad range of biological contexts.
Perhaps not surprisingly, the model performed remarkably well when presented with cell types and disease states it had previously encountered in training (read our preprint to learn more).
What was even more impressive was that the model accurately predicted the effects of many perturbations it hadn't seen during training! This strong performance was made possible by a technique we developed called reference conditioning, which enables GEM-1 to predict the outcome of a perturbation in a new context nearly as robustly as it does on training data. That means scientists can readily apply GEM-1 to their own area of focus, using data they already have in hand.
To understand reference conditioning, let’s start by exploring an analogous concept in image generation. Suppose you want to see how Vincent van Gogh might have painted a portrait of himself wearing a straw hat (which he actually did on multiple occasions)...

We've seen this with GEM-1 too: conditioning a query with a reference sample produces a more realistic result, capturing nuance about the impact of a drug, disease, or other perturbation that would be difficult to achieve with prompt-only generation.
Breaking new ground with reference conditioning
And that's where things get interesting. Confident in our approach, we can now apply reference conditioning to examine things that don't exist, but very well could. For example, we can create a realistic painting that Van Gogh never (as far as we know) painted himself by prompting with the word "pitchfork" and a different self-portrait.


This works because an image model has seen many portraits and many pitchforks in training, so it can make a decent guess at what this "pitchfork perturbation" would look like in a wide variety of contexts. Notice that in this example a hand is implicitly "co-expressed" - an indication that the model knows enough about pitchforks to place the object in its proper context.
A model for many contexts
Just as the image generation model can convincingly perturb a Van Gogh based on a single example, GEM-1’s experience with millions of diverse cell line and tissue samples enables it to accurately predict the effects of an experimental perturbation when given a new reference sample to work with.
In the current landscape of foundation models, customization typically requires fine-tuning (adding layers to the model and retraining it on substantial new datasets). GEM-1 offers a distinct alternative. Because reference conditioning is intrinsic to our model’s architecture, it bypasses the need for retraining entirely. This allows for immediate, ‘N=1’ personalization. You can input a single proprietary sample, such as a specific patient's tumor, and immediately simulate the effects of treatment. While fine-tuning remains a valid path for large-scale adaptation, reference conditioning offers a speed and data-efficiency advantage that is currently unique in the biological foundation model space.
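To make the contrast with fine-tuning concrete, here is a deliberately simplified sketch of the reference-conditioning idea. This is not the GEM-1 API or its architecture (see the preprint for the real implementation); the function names, the per-gene "effect vector," and the additive combination are all illustrative assumptions. The point is only the workflow: a single reference profile plus a perturbation prompt goes in, a predicted profile comes out, and no retraining happens.

```python
# Hypothetical sketch of reference conditioning - NOT the actual GEM-1 API.
# A pretrained model is assumed to associate each perturbation prompt with
# a learned per-gene effect; prediction starts from the user's own sample.
import numpy as np

rng = np.random.default_rng(0)

def predict_perturbation(reference_profile, effect_vector):
    """Toy stand-in for a conditioned generative step.

    reference_profile: expression vector for one sample (the 'N=1' input)
    effect_vector: per-gene effect the pretrained model ties to the
                   perturbation prompt (here, a fixed toy vector)
    """
    # Reference conditioning: because generation is anchored to the user's
    # sample, context-specific baseline expression is preserved - no layers
    # are added and no weights are updated.
    return reference_profile + effect_vector

# One proprietary sample (e.g., a patient's tumor profile), used as-is.
reference = rng.lognormal(size=5)
drug_effect = np.array([0.5, -1.0, 0.0, 2.0, -0.2])  # toy "drug X" effect

predicted = predict_perturbation(reference, drug_effect)
```

In this cartoon the "model" is just an additive shift; the real system is a deep generative model. What carries over is the data efficiency: the only new input at inference time is the single reference sample, which is what distinguishes this approach from fine-tuning on a substantial new dataset.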
The technical details of GEM-1's reference conditioning differ considerably from how image models implement these kinds of transformations. See the GEM-1 preprint for more on our implementation and evaluation of this feature.
Reference conditioning is not offered through our open platform and is available only via collaboration. If you’d like to see what GEM-1 can do with your own data, please get in touch with us (partnerships@synthesize.bio)!