Clear Sky Science
Improving rare-class detection in deep-sea imagery via generative augmentation with stable diffusion
Why rare deep-sea life is hard to spot
Far below the ocean surface, large animals that live on the seafloor help keep deep-sea ecosystems healthy. These animals also live in areas of growing interest for deep-sea mining. Scientists want automated camera systems that can reliably find and count them, but there is a catch: many species are seen only a handful of times. This study explores how modern image-generating artificial intelligence can create realistic extra examples of rare species, helping detection software become more accurate without sending more ships to sea.
Taking pictures in a hard-to-reach world
The team worked with two large collections of seafloor photographs taken in a polymetallic nodule field in the western Pacific. One set came from a tethered camera system towed a few meters above the bottom, and the other from a free-swimming robot. Together, the images covered 16 types of animals, including sea cucumbers, sponges, corals, sea stars, brittle stars, and octopus. Like many wildlife datasets, the counts were highly uneven: a few common animals appeared often, while several groups had fewer than 50 labeled examples. Because most animals occupy less than a tenth of a percent of each image and the cameras are costly to operate at depths beyond 4,000 meters, simply collecting more balanced data is not practical.
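To make the imbalance concrete, here is a minimal Python sketch of the two checks described above: flagging classes with fewer than 50 labels, and measuring how little of a frame one animal covers. The function names, label counts, and image size are illustrative assumptions, not values from the study.

```python
from collections import Counter

RARE_THRESHOLD = 50  # from the text: several groups had fewer than 50 labeled examples


def find_rare_classes(labels, threshold=RARE_THRESHOLD):
    """Return {class_name: count} for classes with fewer than `threshold` labels."""
    counts = Counter(labels)
    return {name: n for name, n in counts.items() if n < threshold}


def box_area_fraction(box, image_width, image_height):
    """Fraction of the image covered by a box (x_min, y_min, x_max, y_max) in pixels."""
    x_min, y_min, x_max, y_max = box
    return ((x_max - x_min) * (y_max - y_min)) / (image_width * image_height)


# Hypothetical annotation list: one class name per labeled animal.
labels = ["sea cucumber"] * 400 + ["sponge"] * 120 + ["octopus"] * 12
rare = find_rare_classes(labels)

# A 60 x 50 pixel animal in a hypothetical 4000 x 3000 pixel frame.
fraction = box_area_fraction((100, 100, 160, 150), 4000, 3000)
```

In this made-up example only the octopus class falls below the threshold, and the animal covers 0.025 percent of the frame, well under a tenth of a percent.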

Teaching an image generator about rare seafloor life
To tackle this imbalance, the researchers turned to a popular image-generation method known as a diffusion model. They started from a powerful general-purpose version trained on everyday scenes, then gently adapted it to deep-sea imagery using a lightweight tuning method. First, they cropped out 175 clear examples of seven rare animal groups, such as bryozoans, certain corals, and octopus, and used these to train the model to draw convincing new foreground cutouts of each type. Simple text prompts were varied to encourage changes in pose, color, lighting, and viewing angle, so the model would not just copy the few original photos but instead explore realistic new combinations.
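The prompt variation described above can be sketched as a simple combination of attribute lists. This is only an illustration: the attribute values, the prompt template, and the `build_prompts` helper are assumptions, and the study's actual prompts are not given here.

```python
import itertools
import random

# All attribute values below are illustrative assumptions, not the study's prompts.
POSES = ["curled", "extended", "partly buried"]
COLORS = ["pale white", "reddish brown"]
LIGHTING = ["bright strobe lighting", "dim side lighting"]
ANGLES = ["seen from directly above", "seen at an oblique angle"]


def build_prompts(class_name, n, seed=0):
    """Sample up to n distinct prompts for one class by combining attribute values."""
    combos = list(itertools.product(POSES, COLORS, LIGHTING, ANGLES))
    rng = random.Random(seed)  # fixed seed so the same prompts are drawn each run
    chosen = rng.sample(combos, k=min(n, len(combos)))
    return [
        f"a deep-sea photo of a {class_name}, {pose}, {color}, {light}, {angle}"
        for pose, color, light, angle in chosen
    ]


prompts = build_prompts("bryozoan", n=5)
```

Sampling without replacement keeps every prompt distinct, which matches the stated goal of pushing the model toward new combinations instead of copies of the few original photos.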
Blending synthetic animals into realistic seafloor scenes
Because object detectors need both animals and their surroundings, a second stage focused on backgrounds and layout. Here, the team used a companion control system that guides the diffusion model using simple mask images. These masks specified where and how large each synthetic animal should appear, based on size ranges seen in real data. The model then generated seafloor backgrounds with matching sediment, rock, and nodule patterns, blending the foreground animals smoothly into place while keeping lighting and color consistent. Crucially, each mask also supplied an automatic bounding box, providing ready-made labels. After filtering out flawed results, the final synthetic set contained 200 high-quality examples for each rare class, which were mixed with the original training photos.
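The reason a mask doubles as a label is that its filled region already marks where the animal sits. The following minimal sketch, assuming a binary mask stored as a list of rows, shows how a bounding box can be read off directly. The helper names and the rectangular mask shape are hypothetical and may differ from the study's masks.

```python
def mask_to_bbox(mask):
    """Return (x_min, y_min, x_max, y_max) of nonzero pixels, inclusive, or None if empty."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [x for row in mask for x, value in enumerate(row) if value]
    return (min(cols), rows[0], max(cols), rows[-1])


def rectangle_mask(width, height, x, y, w, h):
    """Build a binary mask with a filled w x h rectangle whose top-left corner is (x, y)."""
    return [
        [1 if x <= col < x + w and y <= row < y + h else 0 for col in range(width)]
        for row in range(height)
    ]


# Place a hypothetical 4 x 3 pixel animal region in a 12 x 8 pixel mask.
mask = rectangle_mask(width=12, height=8, x=3, y=2, w=4, h=3)
bbox = mask_to_bbox(mask)
```

Because the box comes from the same mask that steers generation, no manual annotation step is needed for the synthetic images.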

How much did the extra images help?
The augmented dataset was used to train a modern detection network that spots and labels animals in each frame. On both the towed and the free-swimming camera datasets, adding synthetic images raised the main accuracy scores compared with training on real photos alone. Gains were most striking for the rarest groups: for example, performance for octopus and bryozoans improved by more than 20 percentage points on one dataset, and similar boosts appeared for bryozoans and hydrozoans on the other. The method also outperformed standard tricks like random crops, color shifts, and cut-and-paste composites. Detailed error analysis showed that the biggest improvement came from fewer mistakes in telling species apart, rather than from sharper box placement.
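The distinction between telling species apart and placing boxes precisely can be illustrated with a small sketch that sorts one prediction into an error type by its overlap with the true box. The thresholds (0.5 and 0.1) and the category names are assumptions in the spirit of common detection error breakdowns, not the study's exact procedure.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def error_type(pred_box, pred_label, true_box, true_label, hit=0.5, near=0.1):
    """Classify one prediction against one ground-truth animal (assumed thresholds)."""
    overlap = iou(pred_box, true_box)
    if overlap >= hit:
        # Box is good enough; only the species label can be wrong.
        return "correct" if pred_label == true_label else "classification"
    if overlap >= near and pred_label == true_label:
        # Right species, but the box is too far off to count as a hit.
        return "localization"
    return "other"


truth = (10, 10, 50, 50)
wrong_species = error_type((10, 10, 50, 50), "sponge", truth, "octopus")
loose_box = error_type((30, 30, 70, 70), "octopus", truth, "octopus")
```

Here `wrong_species` is a classification error and `loose_box` is a localization error. A drop in the first kind with little change in the second is the pattern the article reports.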
Limits, trade-offs, and future directions
The benefits were not uniform. Synthetic data helped less on the blurrier, more distant images from the free-swimming robot, where even real animals are harder to see. When models trained on one camera system were tested on the other, performance dropped sharply, showing that differences in lighting and viewing distance still pose a major challenge. The authors also found that more synthetic data is not always better: performance improved up to a point and then leveled off, suggesting that once diversity is saturated, extra images mainly add redundancy. They propose future work on sharper localization, better handling of very small, fuzzy targets, and more efficient generative models that cover many species at once.
What this means for watching the deep sea
In plain terms, the study shows that carefully generated fake images can make automated systems noticeably better at finding rare deep-sea animals in real survey photos. By teaching detectors what unusual species might look like under many realistic conditions, this approach reduces missed sightings without harming performance on common animals. While it does not remove the need for real expeditions or expert checks, it offers a practical way to stretch limited data further, supporting more reliable monitoring of fragile deep-sea habitats as industrial activity moves into deeper waters.
Citation: Deng, J., Duan, M., Wei, D. et al. Improving rare-class detection in deep-sea imagery via generative augmentation with stable diffusion. Sci Rep 16, 15910 (2026). https://doi.org/10.1038/s41598-026-45732-6
Keywords: deep-sea imagery, data augmentation, stable diffusion, rare species detection, underwater robotics