2D Multimodal Image Collection for Fluorescence Prediction from Transmitted Light Microscopy
Seeing cells without heavy preparation
Modern biology often relies on glowing dyes to reveal what is happening inside living cells, but this comes at a cost in time, money, and cell health. This article presents the Light My Cells database, a large public collection of microscope images designed to help computers learn to recreate those glowing views from gentler, label-free images. For anyone interested in how artificial intelligence can reduce the need for chemical stains while still showing the inner life of cells, this work lays the groundwork.

Why glowing cells are both useful and risky
Fluorescence microscopy lets scientists tag specific parts of a cell so they light up, making structures like the nucleus or mitochondria easy to track. However, preparing samples with fluorescent dyes is labor-intensive, can be expensive, and exposes cells to light that can fade the signal or even harm them. These problems grow in long experiments or large screening projects, where thousands of images must be taken. In contrast, simple transmitted-light techniques, such as bright field or phase contrast, are gentle and label-free, but they do not directly reveal which structures are which. The central idea behind Light My Cells is to bridge this gap by training computers to infer fluorescence-like images from these simple, non-damaging views.
A nationwide collection of diverse cell images
To make this possible, imaging experts across France joined forces to build a rich, shared dataset. The Light My Cells database gathers 56,984 two-dimensional images grouped into 2,574 matched sets, contributed by eight imaging centers and drawn from 30 independent studies. Each set shows the same field of living mammalian cells first with transmitted light and then with one or more fluorescent labels that highlight the nucleus, mitochondria, tubulin, or actin. The images were collected on a wide range of microscopes and with many sample types, capturing the variation that real laboratories encounter every day. This diversity is crucial for training deep learning models that can handle different instruments, cell lines, and acquisition conditions instead of overfitting to a single, tidy setup.
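One way to check whether a model copes with unfamiliar instruments and cell lines is to hold out entire studies when splitting the data, so that no study contributes to both training and testing. The sketch below illustrates that idea only. The set and study identifiers are hypothetical placeholders, not the database's actual naming scheme, and the function `split_by_study` is an illustrative helper rather than part of the released code.

```python
import random

def split_by_study(set_to_study, held_out_fraction=0.2, seed=0):
    """Assign whole studies to train or test so no study appears in both."""
    studies = sorted(set(set_to_study.values()))
    rng = random.Random(seed)
    rng.shuffle(studies)
    # Hold out at least one study, otherwise roughly the requested fraction.
    n_test = max(1, round(len(studies) * held_out_fraction))
    test_studies = set(studies[:n_test])
    train = [s for s, study in set_to_study.items() if study not in test_studies]
    test = [s for s, study in set_to_study.items() if study in test_studies]
    return train, test

# Hypothetical identifiers: 20 matched sets spread evenly over 5 studies.
set_to_study = {f"set_{i:03d}": f"study_{i % 5}" for i in range(20)}
train_sets, test_sets = split_by_study(set_to_study)
```

Splitting at the study level rather than the image level gives a more honest estimate of how a model behaves on data from a laboratory it has never seen.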

How the images were standardized for computers
Because the data came from many sites, the team built a careful preparation pipeline before releasing the collection. All original files, produced in many proprietary microscope formats, were converted into a common, open format called OME-TIFF that stores both the picture and detailed information about how it was taken. Contributors filled in rich metadata templates describing the sample, the light path, the objectives, and the labeling strategy, following community guidelines for reusable imaging data. For each stack of images taken at different depths, algorithms automatically chose the best-focused slice, using one method tuned for transmitted light and another tuned for fluorescent signal. While all transmitted-light slices were kept, each fluorescent channel was reduced to a single sharp plane, matching the typical learning task of predicting one well-focused fluorescent view from label-free input.
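The article does not name the focus measures the team used. As a rough illustration of how automatic focus selection can work, the sketch below scores each plane of a stack by the variance of a discrete Laplacian, a common generic sharpness measure, and picks the plane with the highest score. This is an assumed stand-in, not the authors' method, and the synthetic stack and the `blur` helper exist only for the demonstration.

```python
import numpy as np

def laplacian_variance(plane):
    """Focus score: variance of a 4-neighbour discrete Laplacian (interior pixels)."""
    p = plane.astype(np.float64)
    lap = (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
           - 4.0 * p[1:-1, 1:-1])
    return float(lap.var())

def best_focus_index(stack):
    """Return the index of the sharpest plane in a (Z, Y, X) stack."""
    scores = [laplacian_variance(plane) for plane in stack]
    return int(np.argmax(scores))

def blur(plane, passes):
    """Crude defocus stand-in: repeated 5-point averaging with wrap-around edges."""
    out = plane.astype(np.float64)
    for _ in range(passes):
        out = (out + np.roll(out, 1, 0) + np.roll(out, -1, 0)
               + np.roll(out, 1, 1) + np.roll(out, -1, 1)) / 5.0
    return out

# Synthetic stack: the middle plane is sharp, its neighbours increasingly blurred.
rng = np.random.default_rng(0)
sharp = rng.random((64, 64))
stack = np.stack([blur(sharp, 4), blur(sharp, 2), sharp,
                  blur(sharp, 2), blur(sharp, 4)])
focus_index = best_focus_index(stack)
```

A single measure like this tends to behave differently on transmitted-light and fluorescence images, which is consistent with the team's choice to tune a separate method for each.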
What the database contains and how quality was checked
The final resource includes over fifty thousand transmitted-light images, mostly bright field but also phase contrast and differential interference contrast, plus more than four thousand paired fluorescence images. The nucleus and mitochondria are well represented, while tubulin and actin appear less often, creating a natural class imbalance that users must consider when training models. Each study in the archive is documented with structured descriptions of the biological model, imaging hardware, and acquisition settings, so users can filter by context or compare conditions. The authors also performed technical checks to remove corrupted files, verify that metadata fields were complete, and confirm that the chosen focus planes matched expert judgment. Test scripts ensured that common tools such as ImageJ, napari, and standard Python libraries can easily open and process the images.
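One common way to account for this kind of imbalance is to weight each structure inversely to how often it appears, so that rare labels such as tubulin and actin count for more during training. The sketch below shows the arithmetic under that assumption. The label list is a made-up toy example, not the database's real per-structure counts, and `inverse_frequency_weights` is an illustrative helper name.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by total / (n_classes * class_count)."""
    counts = Counter(labels)
    total = sum(counts.values())
    n_classes = len(counts)
    return {name: total / (n_classes * n) for name, n in counts.items()}

# Toy labels only (hypothetical counts chosen for easy arithmetic).
toy_labels = (["nucleus"] * 6 + ["mitochondria"] * 4
              + ["tubulin"] + ["actin"])
weights = inverse_frequency_weights(toy_labels)
```

With these toy counts the frequent nucleus label receives a weight below 1 and the rare tubulin and actin labels receive weights above 1. Such weights could then scale a per-structure loss term or drive a sampling scheme.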
How researchers can use this open resource
Beyond its original use in a deep learning challenge, the Light My Cells database is intended as a general testbed for methods that translate or analyze label-free images. The paired nature of the data makes it suitable for tasks such as predicting fluorescence from transmitted light, segmenting cell structures, or profiling cell states without extra dyes. Because the transmitted-light stacks are preserved, researchers can also explore models that use depth information or focus estimation. All data and preparation code are openly available under permissive licenses, inviting others to build on the pipeline, extend the dataset, or benchmark new algorithms.
What this means for future cell imaging
For non-specialists, the key message is that Light My Cells provides the raw material needed to teach computers to see inside cells using gentler forms of microscopy. Instead of always adding glowing labels and risking damage, scientists may increasingly rely on smart software trained on collections like this to reveal where key structures lie. The database does not solve fluorescence-free imaging on its own, but it makes high-quality, well-documented examples available to everyone, speeding progress toward less invasive and more scalable ways of watching living cells in action.
Citation: Kauffmann, D., Gay, G., Mateos-Langerak, J. et al. 2D Multimodal Image Collection for Fluorescence Prediction from Transmitted Light Microscopy. Sci Data 13, 743 (2026). https://doi.org/10.1038/s41597-026-07004-w
Keywords: fluorescence microscopy, transmitted light imaging, deep learning, bioimage database, in silico labeling