Clear Sky Science
Boosting foundation models for rare eye disease diagnosis via a multimodal text-to-image generative framework
Why smart eye imaging matters
Many blinding eye diseases are rare, which makes them hard for doctors and computers to recognize early. This study introduces a new way to create lifelike eye images from simple text descriptions, helping artificial intelligence systems learn from conditions that are seldom seen in clinics. The approach aims to make automated eye screening more accurate and fair for both common and rare retinal diseases around the world.

Turning words into realistic eye images
The researchers built a system called EyeDiff that can generate detailed pictures of the back of the eye and related scans from short written prompts. These prompts describe the imaging method, such as a color photo or a cross-sectional scan, together with the disease type and its severity. EyeDiff was trained on more than 40,000 images spanning 14 kinds of eye imaging and more than 80 disease categories. By learning how each disease typically looks across different machines and views, the model can produce synthetic images that preserve key disease signs while matching the requested imaging style.
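
To make the prompt idea concrete, here is a minimal Python sketch of how the three ingredients named above (imaging method, disease type, severity) might be assembled into one short text prompt. The `PromptSpec` class, the `build_prompt` function and the comma-separated wording are illustrative assumptions; the study summarized here does not specify its exact prompt template.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PromptSpec:
    """Hypothetical container for the three prompt ingredients."""
    modality: str                   # imaging method, e.g. "color fundus photograph"
    disease: str                    # disease type, e.g. "diabetic retinopathy"
    severity: Optional[str] = None  # optional grade, e.g. "severe"


def build_prompt(spec: PromptSpec) -> str:
    """Join the non-empty ingredients into one comma-separated prompt.

    The comma-separated format is an assumption made for illustration only.
    """
    parts = [spec.modality.strip(), spec.disease.strip()]
    if spec.severity:
        parts.append(spec.severity.strip())
    return ", ".join(p for p in parts if p)


if __name__ == "__main__":
    print(build_prompt(PromptSpec("color fundus photograph", "diabetic retinopathy", "severe")))
    print(build_prompt(PromptSpec("OCT scan", "Stargardt disease")))
```

Keeping the ingredients in separate fields, rather than writing free text, makes it easy to request the same disease across several imaging styles.
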
Checking if synthetic eyes look and behave like the real thing
To test whether EyeDiff followed the text instructions, the team used an automated tool that scores how well an image matches its description. Across tasks involving common retinal diseases, diabetic changes, glaucoma and several rare disorders, the scores were high, indicating good alignment between prompts and generated pictures. Two ophthalmologists then took part in a Turing-style test in which they had to decide whether each image was real or synthetic. They correctly labeled real images most of the time, but they mistook around two thirds of the generated images for real ones, showing that the synthetic images were convincing to trained experts. When asked to rate how well fifty generated images matched their text prompts, both graders gave low error scores and showed very high agreement.
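
The headline number from a real-versus-synthetic test of this kind is the share of synthetic images that a grader judged to be real. The Python sketch below shows that arithmetic on made-up toy labels; the function name `fool_rate` and the example values are hypothetical and are not data from the study.

```python
from typing import Sequence


def fool_rate(is_synthetic: Sequence[bool], judged_real: Sequence[bool]) -> float:
    """Fraction of synthetic images that the grader labeled as real."""
    if len(is_synthetic) != len(judged_real):
        raise ValueError("both sequences must have the same length")
    # Keep only the grader's verdicts on images that were actually synthetic.
    verdicts = [judged for synthetic, judged in zip(is_synthetic, judged_real) if synthetic]
    if not verdicts:
        raise ValueError("no synthetic images in the sample")
    return sum(verdicts) / len(verdicts)


if __name__ == "__main__":
    # Toy example only: three synthetic images and one real image.
    truth = [True, True, True, False]
    grader = [True, True, False, True]
    print(f"fool rate: {fool_rate(truth, grader):.2f}")
```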

Helping computers see rare problems better
The main goal of EyeDiff is not just to create pretty pictures but to strengthen existing diagnostic models that struggle with rare findings. In many real-world datasets, some disease types are represented by only a handful of cases, which can bias a model toward common conditions. The authors added EyeDiff-generated images to these underrepresented groups in eleven separate datasets drawn from different countries and imaging devices. They then retrained several leading foundation models for eye diagnosis, including systems specialized for single scan types and others that combine images and text. Across tasks such as diabetic retinopathy grading, glaucoma staging, multiple disease classification and rare disease recognition, adding synthetic images consistently improved key performance measures compared with using real data alone or simple resampling tricks.
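
The idea of topping up underrepresented groups can be sketched in a few lines of Python. The example below counts how many real images each disease label has and reports how many synthetic images would be needed to bring the sparse labels up to a chosen target. The function name `synthetic_quota` and the fixed per-class target are assumptions for illustration; this summary does not state how many images the authors added per class.

```python
from collections import Counter
from typing import Dict, Iterable


def synthetic_quota(labels: Iterable[str], target_per_class: int) -> Dict[str, int]:
    """Return how many synthetic images each sparse class would need.

    Classes that already have at least `target_per_class` real images
    are left out, so only underrepresented classes receive a quota.
    """
    if target_per_class < 0:
        raise ValueError("target_per_class must be non-negative")
    counts = Counter(labels)
    return {
        cls: target_per_class - n
        for cls, n in counts.items()
        if n < target_per_class
    }


if __name__ == "__main__":
    # Toy label list, not real dataset statistics.
    toy_labels = ["diabetic retinopathy"] * 5 + ["Stargardt disease"] * 2 + ["normal"] * 10
    print(synthetic_quota(toy_labels, target_per_class=5))
```

Unlike simple resampling, which repeats the same few real images, a quota like this would be filled with newly generated images for each sparse class.
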
Benefits and safeguards for clinical use
EyeDiff showed particular value for specific rare diseases such as Stargardt disease, retinopathy of prematurity and retinoblastoma, where boosting the number of training examples led to sizable gains in detection accuracy. The authors note that all generated images were used without cherry-picking, yet still delivered benefits, suggesting that the method is robust in practice. At the same time, they stress the need for caution. Synthetic images can contain subtle artifacts or reflect biases in the training data, so they should be clearly labeled, carefully monitored and protected against misuse. Expanding the diversity of source data and designing tools to spot or quantify artifacts are important next steps.
What this means for future eye care
In simple terms, EyeDiff acts as a smart image factory that can quickly supply realistic examples of both common and very rare eye diseases on demand. By filling in the gaps where real patient data are scarce, it helps diagnostic algorithms become more sensitive and balanced without exposing additional private information. While further work is needed to improve image fidelity and ensure safe deployment, this study shows that text-driven synthetic imaging could become a powerful ally in building reliable tools for early detection of sight-threatening retinal disease.
Citation: Chen, R., Zhang, W., Liu, B. et al. Boosting foundation models for rare eye disease diagnosis via a multimodal text-to-image generative framework. npj Digit. Med. 9, 371 (2026). https://doi.org/10.1038/s41746-026-02560-2
Keywords: retinal imaging, generative AI, rare eye disease, medical data augmentation, ophthalmology