The origins of Synthesize Bio (a founder's story)
Less than two years ago, I ran into my colleague Rob Bradley in the hall. At the time, I was newly back in Seattle at Fred Hutch after spending more than a decade on the East Coast at Johns Hopkins.
I had known Rob for years - I’d always been impressed by his careful science and his creativity in understanding the translational impacts of splicing. We had connected at a UW Genome Sciences faculty retreat a few years before I came back, and I was excited to work with him in our new leadership capacities.
As we were chatting in the hall, Rob mentioned that he had been thinking about ways to make genomic data analysis easier for people. He had been developing a platform for easy integration and analysis of gene expression data and had used our recount project as some of the test data for his new platform.
This was right at the beginning of the AI explosion of the last few years, and I’d been thinking about how LLMs were created - scraping the web for text data, processing it, cleaning it, then creating models that would allow people to generate new text from prompts.
I proposed what seemed (at the time) to be a crazy idea to Rob - what if we could assemble the world’s gene expression data and build a model that could generate new - or even impossible - experiments directly from descriptions of the experimental design?
The analogy to LLMs was very clear: the combination of comprehensive training data and rich algorithms could produce emergent behavior. If we were right, scientists could generate experiments with a simple description and a call to a model, saving hundreds or thousands of hours and unleashing scientific creativity.
Instead of laughing me out of the room, Rob suggested we build a prototype. We worked with the recount data, reformatted them as heatmap images and, with lots of kluges and hacks, fed them to an open source image diffusion model. The result was... something that looked, unbelievably, like real gene expression data from a new experiment.
Rob and I took our prototype and showed it to Matt McIlwain, Chris Picardo, and Joe Horseman from Madrona. We told them that it was very early days, but we thought we might be able to build models that would generate new expression data based solely on experimental designs for a very wide range of conditions. This idea of generative genomics, and its potential to radically transform experiments ranging from early discovery to clinical stage studies, had us all excited.
In very short order, Rob and I had founded Synthesize Bio and raised our seed round, which we announced last week, supported by Madrona, Sahsen Ventures, Inner Loop Capital, Point Field Ventures, and the AI2 Incubator.
Since then, we’ve recruited an incredible team, including leading data/computational scientists, platform engineers, and AI scientists, to build the vision of generative genomics. We just released our first preprint describing GEM-1, the first generation of our models. This model is a pretty big improvement on the prototype Rob and I hacked together (that’s an understatement!), but it is focused on the same generative genomics concept - how can we enable scientists to generate data from experiments they haven’t done yet?
Along the way we’ve built a set of tools that make these models easy to use and the results easy to contextualize. You can generate data directly on our platform or using our R and Python libraries. On our platform you can directly integrate those data with thousands of publicly available datasets processed using the same pipeline, so you can test, validate, and explore AI-generated data alongside experimental data.
When Rob and I founded Synthesize, generative genomics was just an idea. Our first generation of models shows hints of what is possible in generating new experimental data, just as the early GPT models showed what was possible from models that generated text. We are excited for you to see what we have been building, and we’ve opened the AI model generation features to everyone in the community. Try the models - on our platform or with our packages; share notebooks with friends; try them as a teaching tool - and let us know what you think, because we are just getting started.



