As observing biological systems becomes cheaper and more automated, I see an opportunity to build increasingly complex models of life. Here is an idea to accomplish that using adversarial artificial intelligence agents, a strategy based on the concept of generative adversarial networks.

The setup

Scientists would set up a sandbox experimental system with components such as a chemostat, a DNA sequencer, a microscope, and any other equipment that can be automated (perhaps by little robots). An API connected to the equipment would both control the system and measure it in different ways. The system could contain things like a strain of cells, a microbial community, or even whole multicellular organisms such as Arabidopsis.
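To make the idea of a single API over the equipment more concrete, here is a minimal sketch in Python. Every name in it (Command, SandboxAPI, ToyChemostat, the "set_temperature" action) is hypothetical and invented for illustration. A real system would need far richer commands and measurements.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


@dataclass(frozen=True)
class Command:
    """One instruction sent to the sandbox (all fields are hypothetical)."""
    instrument: str      # e.g. "chemostat", "sequencer", "microscope"
    action: str          # e.g. "set_temperature", "read_temperature"
    value: float = 0.0


class SandboxAPI(Protocol):
    """Anything that accepts perturbations and returns measurements."""

    def perturb(self, command: Command) -> None: ...

    def measure(self, command: Command) -> Dict[str, float]: ...


class ToyChemostat:
    """Stand-in backend so the sketch runs; not a model of real equipment."""

    def __init__(self) -> None:
        self.temperature_c = 30.0

    def perturb(self, command: Command) -> None:
        if command.action == "set_temperature":
            self.temperature_c = command.value

    def measure(self, command: Command) -> Dict[str, float]:
        return {"temperature_c": self.temperature_c}


def run_experiment(api: SandboxAPI) -> Dict[str, float]:
    """The caller sees only the API, never what sits behind it."""
    api.perturb(Command("chemostat", "set_temperature", 37.0))
    return api.measure(Command("chemostat", "read_temperature"))


reading = run_experiment(ToyChemostat())
```

The point of the interface is that run_experiment cannot tell what implements it, which is what allows an imitation to be swapped in behind the same API.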

An AI agent, the discriminator, can perturb and measure the biological system using this API. In a series of tests, it is presented with an API that connects either to the real system or to an imitation managed by another AI agent, the generator. After sending commands to manipulate the system and receiving measurements as feedback, the discriminator tries to determine whether it’s dealing with the real thing. Its guesses and their accuracy tell the generator how to improve its imitation (in other words, how to make a more accurate model of nature in this context).
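The blind test described above can be sketched as a toy in Python. This is only an illustration under strong assumptions: the "real system" is an invented growth curve that peaks at 37 °C, the discriminator is handed the true curve as its reference instead of learning it, and the generator improves by trying a few candidate parameters rather than by gradient-based training as in an actual generative adversarial network. All function names and numbers are hypothetical.

```python
import random


def real_system(temperature_c):
    """Invented ground truth: growth rate peaks at 37 C."""
    return max(0.0, 1.0 - ((temperature_c - 37.0) / 15.0) ** 2)


def make_imitation(peak_c):
    """Generator's model: same curve shape with a guessed peak temperature."""
    def imitation(temperature_c):
        return max(0.0, 1.0 - ((temperature_c - peak_c) / 15.0) ** 2)
    return imitation


def guess_is_real(system, probes, tolerance=0.02):
    """Discriminator: probe the system and compare with expected responses."""
    return all(abs(system(t) - real_system(t)) <= tolerance for t in probes)


def discriminator_accuracy(imitation, trials, rng):
    """Fraction of blind trials in which the discriminator guesses correctly."""
    probes = [25.0, 31.0, 37.0, 43.0]
    correct = 0
    for _ in range(trials):
        is_real = rng.random() < 0.5          # hidden coin flip per trial
        system = real_system if is_real else imitation
        correct += int(guess_is_real(system, probes) == is_real)
    return correct / trials


# Generator feedback: keep the candidate that fools the discriminator most.
candidates = [30.0, 34.0, 37.0, 40.0]
scores = {
    peak: discriminator_accuracy(make_imitation(peak), 200, random.Random(0))
    for peak in candidates
}
best_peak = min(scores, key=scores.get)
```

A poor imitation is caught every time (accuracy 1.0), while a perfect one pushes the discriminator down to roughly coin-flip accuracy, which is the signal the generator is trying to reach.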

Diagram of setup with adversarial AI

What can we model with it?

The model would be able to predict the outcome of any experiment that we could include in the sandbox system. As long as we can automate an observational technique, the discriminator could learn what it should observe when it does certain things to the system, and the generator had better make its simulation do those things too. Short of building a model up from more fundamental physical principles, which I don’t see happening any time soon, I’m not sure how we could construct a more comprehensive one.

Microscopy would teach the agents not only the shape of cells and their components, but also the location of genomic features using fluorescence probes.

DNA sequencing would teach the agents how cell populations change over time. For example, a system containing tumor cells in a large array could allow the discriminator to introduce specific mutations and observe the dynamics of variant subpopulations within the tumor.

A single type of observation would be easy for the agent to learn, but useless. For example, if the only system input was temperature control, and the only output was measuring cell growth, the agents could quickly learn the relationship curve without any insight as to why or how the two are connected. But when there are many different inputs and outputs, and the simulation must model all of them correctly, its inner workings should start to mimic those of real biological systems.

What do we do with the model?

Eventually we should have a very complicated model that a very powerful AI can’t distinguish from nature. How do we extract knowledge? One way would be to force the model into simplified knowledge structures. For example, if we wanted a protein interaction network, we would somehow query the model with all pairs of proteins and ask it whether they interact. I’m not sure whether such tools for model interpretation exist yet.
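As a rough illustration of the protein interaction example, the sketch below asks a model about every unordered pair of proteins and keeps the positive answers. The function toy_model_interacts is a hypothetical placeholder backed by a fixed lookup. How one would actually pose such a question to a trained generator is exactly the open problem mentioned above.

```python
from itertools import combinations


def toy_model_interacts(protein_a, protein_b):
    """Placeholder for a query to the trained model (here a fixed lookup)."""
    known = {frozenset(("A", "B")), frozenset(("B", "C"))}
    return frozenset((protein_a, protein_b)) in known


def interaction_network(proteins, interacts):
    """Query every unordered pair once and return the interacting pairs."""
    return [
        (a, b)
        for a, b in combinations(sorted(proteins), 2)
        if interacts(a, b)
    ]


network = interaction_network(["C", "A", "B", "D"], toy_model_interacts)
```

The number of queries grows as n(n-1)/2, so 20,000 proteins would mean roughly 2 × 10^8 pairs. That is feasible only if each query to the model is cheap.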

But even if we can’t do that and the model is a total black box, it would still be useful because we could treat it as a simulation of nature over which we have much more control. We could slow down or speed up time and do other things that would be infeasible, expensive, or time-consuming for the real system.

Diagram of uses for the trained system

Importantly, the simulations would be designed to be deterministic so that after making an observation, we could rerun the simulation and do a different measurement, perhaps earlier or later, which might be impossible in real life due to one of the measurements affecting the system in a way that prohibits the second. Imagine being able to repeatedly dissect a cell at one or many time points to see how its components change and affect each other. The simulation may only approximate what would happen in nature, but the same is true for real experiments!
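Here is a minimal sketch of what determinism buys us, assuming a toy simulation whose only source of randomness is a seeded generator. The random walk stands in for the level of some cell component. It is invented for illustration and has no biological meaning.

```python
import random


def trajectory(seed, steps):
    """Toy simulation: a seeded random walk of one component's level."""
    rng = random.Random(seed)
    level = 10.0
    levels = []
    for _ in range(steps):
        level += rng.uniform(-1.0, 1.0)
        levels.append(level)
    return levels


def dissect(seed, step):
    """Destructive measurement: run up to `step`, read the level, then stop."""
    return trajectory(seed, step + 1)[-1]


# Two mutually exclusive measurements on the "same" cell, via two reruns.
early = dissect(seed=42, step=20)
late = dissect(seed=42, step=80)

# An undisturbed run with the same seed passes through both readings.
full = trajectory(seed=42, steps=100)
```

Because the same seed always reproduces the same run, the early and late readings belong to one and the same history, even though each measurement ended the run that produced it.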

First steps

This adversarial technique would not be especially difficult to set up. The main hurdle would be to assemble all the perturbation and detection equipment into one system and have it all controllable via an API. While even single experimental techniques can currently require endless fidgeting by skilled technicians, the required automation is both desirable and necessary anyway for biology to advance. The remaining components are just the AI agents running on capable hardware, which can be remote.

We may not be able to implement this technique to the point of creating useful models yet. But it can be prototyped in stages, and today’s technology would allow basic models to be learned. Those models could be used to run very domain-specific simulations, and over time the simple models would grow and converge. With many of the -omes measured, current research often aims to collect and merge different data types to gain new knowledge, but perhaps eventually the integration will have to happen in the experiment itself.