Generative AI fills in the gaps in microscopy data to further genetic medicine
June 04, 2025
Image: Measuring distances between genes. Credit: Generated with the DDG DaVinci2 model from a prompt by Nicolas Posunko/Skoltech PR

Skoltech researchers have enlisted generative artificial intelligence to fill in missing data on the distances between pairs of genes in DNA. This makes it possible to reconstruct the 3D architecture of DNA molecules, which in turn is needed to develop treatments and diagnostic approaches for genetic diseases. Published in the journal Scientific Reports, the study is the first successful attempt to complete such data using AI or, in fact, by any means. Until now, scientists had to make do with incomplete data, which hampered progress in medical genetics and limited their understanding of the biophysics of chromatin, the material chromosomes are made of.

To do its job properly, DNA needs more than the right set of genes: it must also have the correct 3D architecture, which has traditionally been studied by statistical physics, and polymer physics in particular. The way the 46 long DNA molecules in each human cell are folded in space affects which genes are active, whether the cell divides properly, and whether it differentiates into the right specialized cell types during embryonic development. Conversely, faulty DNA architecture plays a role in the development of abnormalities and diseases such as cancer.

The more scientists learn about the physical principles that stabilize the "healthy" 3D architecture of DNA, the more opportunities open up for diagnosing and treating genetic disorders. Comparing the spatial structure of DNA in health and disease can reveal biomarkers for diagnosis and point the way to personalized treatments. It can also help scientists identify new therapeutic targets, develop drugs that restore normal gene function, and design precise gene-editing interventions.

One of the most widely used experimental techniques for examining how DNA molecules are folded in space is fluorescence microscopy, a type of optical microscopy in which a large number of specific DNA sequences are made visible by labeling them with fluorescent tags.

The problem is that such data is inevitably fragmentary. To attach a fluorescent tag, scientists synthesize a short DNA sequence, or probe, that is complementary to the sequence at the position of interest along the DNA strand. This is not possible at every position, however. If the target region consists of repeats, such as a long run of the letter A, a probe for it would also bind at many other places in the genome, so the region cannot be stained selectively. As a result, researchers have had to make do with incomplete data. Not anymore.
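The uniqueness problem can be illustrated with a toy model. The sketch below assumes that a probe binds only where its target appears as an exact match on the strand being checked; real probe design also considers the opposite strand and partial matches. The genome string and all function names are hypothetical and chosen for illustration only.

```python
# Toy model (an assumption, not a real probe-design tool): a probe binds only
# where its target appears as an exact match on the strand we are given.

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Hypothetical 50-letter stretch of DNA containing two runs of ten A's.
TOY_GENOME = (
    "GCTAGCATCG" "AAAAAAAAAA" "TTGACCGTAG" "AAAAAAAAAA" "CGATGGTCAT"
)


def reverse_complement(seq: str) -> str:
    """Return the sequence a synthetic probe would carry to pair with `seq`."""
    return seq.translate(COMPLEMENT)[::-1]


def count_occurrences(genome: str, target: str) -> int:
    """Count exact, possibly overlapping, matches of `target` in `genome`."""
    hits = 0
    start = genome.find(target)
    while start != -1:
        hits += 1
        start = genome.find(target, start + 1)
    return hits


def is_selectively_targetable(genome: str, start: int, length: int) -> bool:
    """A region can be stained selectively only if it occurs exactly once."""
    target = genome[start:start + length]
    return count_occurrences(genome, target) == 1


if __name__ == "__main__":
    for start in (20, 10):
        target = TOY_GENOME[start:start + 8]
        print(target, reverse_complement(target),
              is_selectively_targetable(TOY_GENOME, start, 8))
```

Running it reports that the 8-letter region starting at position 20 (`TTGACCGT`) is unique and can be targeted, while the run of A's starting at position 10 matches six places in this tiny sequence and cannot.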

“Once you know the distances between a sufficient number of genes, determining the remaining distances for which there is no experimental data takes the form of a mathematical problem with a specific solution,” the principal investigator of the study, Assistant Professor Kirill Polovnikov from Skoltech Neuro, commented. “We have shown for the first time that generative models are capable of solving such problems. This is an unconventional application of the kind of AI usually employed for more ‘creative’ tasks — generating images and text based on a user prompt. At the same time, this is a new approach to the study of chromatin structure, where polymer physics has historically reigned supreme.”
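The idea that missing distances follow from known ones can be made concrete with classical geometry. The sketch below is not the study's method, which relies on a generative model; it is plain trilateration under idealized assumptions: exact, noise-free distances, loci treated as points in 3D, and four reference loci that do not lie in one plane, with all of their mutual distances measured. Given each locus's distances to those four references, the distance between two loci P and Q that was never measured can be computed. All coordinates and function names are hypothetical.

```python
import math

Point = tuple[float, float, float]


def embed_references(d: list[list[float]]) -> list[Point]:
    """Place four reference loci in 3D using only their 4x4 distance table.

    The placement is unique up to rotation, translation and mirror image,
    none of which changes any distance.
    """
    a1 = (d[0][1], 0.0, 0.0)  # second reference on the x-axis
    x2 = (d[0][1] ** 2 + d[0][2] ** 2 - d[1][2] ** 2) / (2 * d[0][1])
    y2 = math.sqrt(max(d[0][2] ** 2 - x2 ** 2, 0.0))  # third in the xy-plane
    x3 = (d[0][1] ** 2 + d[0][3] ** 2 - d[1][3] ** 2) / (2 * d[0][1])
    y3 = (d[0][3] ** 2 - d[2][3] ** 2 + x2 ** 2 + y2 ** 2 - 2 * x3 * x2) / (2 * y2)
    z3 = math.sqrt(max(d[0][3] ** 2 - x3 ** 2 - y3 ** 2, 0.0))
    return [(0.0, 0.0, 0.0), a1, (x2, y2, 0.0), (x3, y3, z3)]


def det3(m: list[list[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def trilaterate(refs: list[Point], dists: list[float]) -> Point:
    """Locate a point from its distances to four non-coplanar references.

    Subtracting the sphere equation |x - a0|^2 = d0^2 from the other three
    leaves a 3x3 linear system, solved here with Cramer's rule.
    """
    a0, d0 = refs[0], dists[0]
    rows, rhs = [], []
    for a, d in zip(refs[1:], dists[1:]):
        rows.append([2 * (a[k] - a0[k]) for k in range(3)])
        rhs.append(d0 ** 2 - d ** 2 + sum(c * c for c in a) - sum(c * c for c in a0))
    det = det3(rows)
    solution = []
    for col in range(3):
        m = [row[:] for row in rows]
        for r in range(3):
            m[r][col] = rhs[r]  # replace one column with the right-hand side
        solution.append(det3(m) / det)
    return (solution[0], solution[1], solution[2])


def demo() -> tuple[float, float]:
    """Recover the unmeasured P-Q distance; return (estimate, true value)."""
    # Hypothetical "true" positions, used only to simulate measurements.
    refs_true = [(0.3, -0.2, 0.1), (1.4, 0.5, -0.3), (-0.6, 1.2, 0.4), (0.2, 0.3, 1.5)]
    p_true, q_true = (0.8, 0.9, 0.7), (-0.5, -0.4, 0.9)
    d_refs = [[math.dist(a, b) for b in refs_true] for a in refs_true]
    d_p = [math.dist(p_true, a) for a in refs_true]  # measured
    d_q = [math.dist(q_true, a) for a in refs_true]  # measured
    refs = embed_references(d_refs)
    p, q = trilaterate(refs, d_p), trilaterate(refs, d_q)
    return math.dist(p, q), math.dist(p_true, q_true)


if __name__ == "__main__":
    print(demo())
```

Both numbers printed by `demo()` agree up to floating-point rounding, even though the P to Q distance was never used as input. Real microscopy measurements are noisy and gaps are scattered irregularly, so this only illustrates the principle behind the quote.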

The implications of the research are twofold. Practically, the Skoltech team has proposed and tested a way to process fluorescence microscopy data that is expected to enable a better understanding of the spatial structure of DNA, and with it better treatments and diagnostics for genetic diseases. Fundamentally, the study demonstrates the potential of generative artificial intelligence beyond its usual scope of applications.

The study reported in this story was supported by Russian Science Foundation Grant No. 25-13-00277.