Clear Sky Science
InfoColon: A dataset for consecutive informative frames in Colonoscopy
Why clearer colon videos matter
Colonoscopy is one of the main tools doctors use to spot early signs of colorectal cancer, but the videos it produces are often messy. Many frames are blurred, blocked by bubbles or tools, or simply show a blank wall of tissue. These unhelpful moments slow doctors down and confuse computer programs that aim to assist them. This study introduces InfoColon, a new shared collection of colonoscopy videos designed to separate useful views from useless ones and to help build smarter, more reliable medical AI systems.
Cleaning up a noisy medical video stream
During a colonoscopy, the camera moves through a twisting, moist, and moving organ. As the doctor advances and withdraws the scope, the picture can shake, fog, or fill with glare from the light. The authors point out that such uninformative frames make it harder to find polyps, increase fatigue for clinicians, and lengthen procedures for patients. They argue that being able to quickly pick out the informative frames, where the inner tunnel of the colon and its structures are clearly seen, would improve diagnosis, allow automatic quality checks, and support new tools such as 3D maps of the colon and navigation aids. Yet until now, there has been no large public dataset to train and compare such methods.

A new shared library of colon views
The researchers built InfoColon by combining real colonoscopy videos from two hospitals with several well-known public image collections. From hospital examinations, they gathered more than 119,000 frames sampled once per second, and then added tens of thousands of frames from existing research datasets. Expert endoscopists labeled every frame as either informative or as one of six uninformative types: plain wall, bubble, blurry, bad light, tool in the way, or other obstacles such as stool. Checks on a sample of frames showed strong agreement between experts, giving confidence that the labels are reliable. Alongside the videos, the team provides summary reports that show how informative frames are spread over time in each procedure.
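To make the idea of "agreement between experts" concrete, here is a minimal sketch of one common agreement statistic, Cohen's kappa, applied to two annotators' labels. This summary does not say which statistic the authors used, so the choice of kappa is an assumption, and the label strings in `LABELS` are hypothetical spellings of the seven categories described above.

```python
from collections import Counter

# Hypothetical label names for the one informative and six uninformative classes.
LABELS = [
    "informative", "plain_wall", "bubble", "blurry",
    "bad_light", "tool", "other_obstacle",
]

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators on the same frames."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("both annotators must label the same non-empty set of frames")
    # Fraction of frames on which the two annotators gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[l] * counts_b[l] for l in set(counts_a) | set(counts_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)
```

A value near 1 indicates agreement well beyond chance, while a value near 0 indicates agreement no better than chance.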
Teaching computers to focus on what matters
Labeling such a large number of frames by hand would be costly and slow, so the team tested learning strategies that make the most of a smaller set of labeled examples. They compared standard supervised learning with semi-supervised and active learning approaches that ask experts to label only the most helpful new samples. Their new method, called Accuracy Driven Adaptive Threshold BALD, chooses frames for expert review based on how much the model’s performance is changing, rather than only on how uncertain the model is. Using a modern vision transformer model, they showed that this approach can reach high accuracy in telling informative from uninformative frames across several label setups, while using far fewer expert-labeled images than traditional training.
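For readers who want to see the mechanics, the sketch below computes the standard BALD score (the mutual information between a frame's predicted label and the model parameters, estimated from several stochastic forward passes) and pairs it with a threshold that moves with the change in accuracy. The BALD formula is the standard one; the threshold rule in `update_threshold`, its `step` and `tol` parameters, and the `select_frames` helper are hypothetical illustrations, since this summary does not give the authors' exact rule.

```python
import numpy as np

def entropy(p, axis=-1):
    """Shannon entropy in nats; clipping avoids log(0)."""
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=axis)

def bald_scores(mc_probs):
    """mc_probs has shape (T, N, C): class probabilities for N frames
    from T stochastic forward passes (for example, with dropout active).
    Returns one BALD score per frame, shape (N,)."""
    predictive_entropy = entropy(mc_probs.mean(axis=0))  # entropy of the averaged prediction
    expected_entropy = entropy(mc_probs).mean(axis=0)    # average entropy of each pass
    return predictive_entropy - expected_entropy

def update_threshold(threshold, prev_acc, curr_acc, step=0.1, tol=0.005):
    """Hypothetical rule (an assumption, not the authors' method):
    be more selective while accuracy is still improving,
    and admit more frames once accuracy stalls."""
    if curr_acc - prev_acc > tol:
        return threshold * (1.0 + step)
    return threshold * (1.0 - step)

def select_frames(scores, threshold):
    """Indices of frames whose score meets the threshold, highest score first."""
    idx = np.flatnonzero(scores >= threshold)
    return idx[np.argsort(-scores[idx])]
```

Frames on which the passes disagree score high and are sent to experts; frames on which every pass gives the same answer score near zero, even if that answer is itself uncertain.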
From flat video frames to 3D maps
InfoColon does more than list which frames are clear. The dataset also includes camera calibration videos and parameters that correct for the wide-angle distortion of the colonoscope lens. With these in hand, the authors used only informative frames to test 3D reconstruction methods that turn 2D images into a 3D point cloud of the colon’s surface. In example clips, the resulting 3D models captured important shapes such as folds, bends, and texture, and showed smooth transitions from frame to frame. This suggests that a well-filtered stream of frames can support future tools that guide the scope, estimate coverage, or help spot missed areas.
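As a rough illustration of how calibration parameters turn pixels into 3D points, the sketch below removes lens distortion from pixel coordinates and then back-projects them using a per-pixel depth. It assumes a pinhole camera with a simple two-coefficient radial distortion model; the parameter names (`fx`, `fy`, `cx`, `cy`, `k1`, `k2`) are illustrative, and the calibration model and reconstruction methods actually used with InfoColon may differ.

```python
import numpy as np

def undistort_normalized(xd, yd, k1, k2, iters=20):
    """Invert x_d = x_u * (1 + k1*r^2 + k2*r^4) by fixed-point iteration,
    where r is the radius of the undistorted point. Suitable for mild distortion."""
    xu, yu = xd.copy(), yd.copy()
    for _ in range(iters):
        r2 = xu**2 + yu**2
        factor = 1.0 + k1 * r2 + k2 * r2**2
        xu, yu = xd / factor, yd / factor
    return xu, yu

def backproject(pixels, depth, fx, fy, cx, cy, k1=0.0, k2=0.0):
    """pixels: (N, 2) array of (u, v); depth: (N,) distance along the optical axis.
    Returns an (N, 3) point cloud in the camera coordinate frame."""
    pixels = np.asarray(pixels, dtype=float)
    depth = np.asarray(depth, dtype=float)
    # Convert pixel coordinates to normalized (distorted) image coordinates.
    xd = (pixels[:, 0] - cx) / fx
    yd = (pixels[:, 1] - cy) / fy
    xu, yu = undistort_normalized(xd, yd, k1, k2)
    # Scale the undistorted viewing ray by depth to obtain the 3D point.
    return np.stack([xu * depth, yu * depth, depth], axis=1)
```

In a full pipeline the depth would come from a reconstruction method rather than being given, and point clouds from consecutive informative frames would be aligned into one model.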

What this means for patients and researchers
To a layperson, InfoColon can be seen as a carefully organized library that keeps the clear pictures and tags the useless ones, while also recording how the camera behaves. This shared resource should make it easier for researchers worldwide to build and fairly compare computer programs that clean, analyze, or reconstruct colonoscopy videos. In the long run, such progress could support doctors with better quality checks and more informative views of the colon, without changing the procedure itself for patients.
Citation: Choi, T., Moon, H.S., Jang, S. et al. InfoColon: A dataset for consecutive informative frames in Colonoscopy. Sci Data 13, 748 (2026). https://doi.org/10.1038/s41597-026-07060-2
Keywords: colonoscopy, medical imaging, video analysis, dataset, active learning