Clear Sky Science
A multi-focus oral panoramic x-ray image dataset based on pixel-level annotations
Why a new look at dental X-rays matters
Many of us only think about dental X-rays when we sit in the dentist’s chair, but those shadowy images hold vital clues about gum disease, cavities, and other problems that affect billions of people worldwide. This article describes ToothPix, a new large-scale collection of dental panoramic X-ray images that has been carefully prepared to help computers learn to read these scans. By turning routine dental pictures into a rich shared resource, the work aims to make future checkups faster, more accurate, and more widely available.

The global burden lurking in our mouths
Oral diseases, from tooth decay to advanced gum disease, are among the most common health problems on the planet. Dentists usually rely on a mix of visual inspection and experience to interpret an X-ray, a process that can miss subtle warning signs and that varies from one clinician to another. At the same time, deep learning systems have begun to match or even surpass human experts in reading some types of medical images. To bring similar progress to dentistry, researchers need large numbers of real-world X-rays that are clearly and consistently marked to show where diseased areas actually are. Until now, such datasets have been scarce, small, and often poorly structured.
Building a rich picture of the mouth
The ToothPix project set out to fill this gap by collecting 8,655 panoramic dental X-rays from patients aged 4 to 80 at a single hospital in China. A panoramic X-ray sweeps a narrow beam in an arc around the head to create a broad two-dimensional picture of all the teeth, jawbones, and surrounding structures in one shot. The images in ToothPix were captured at high resolution and kept at their original size to preserve tiny features that matter for diagnosis. They also cover a wide range of real-world imaging conditions, such as different brightness and contrast levels, so that computer models trained on them are less likely to be thrown off by variation in equipment or patient positioning.
Turning raw scans into teaching material for computers
Collecting images was only the first step. The team carefully removed all personal details stored in the files and converted the scans into widely used image formats so they can be opened and analyzed outside hospital systems without exposing private data. Next came a rigorous quality check: experts screened the images for problems like missing structures, duplicated records, or poor exposure, and scored them using a standard scale that balances clarity against radiation dose. Remarkably, all images met or exceeded the threshold for acceptable quality, so none had to be discarded. This means the final dataset offers a consistently clear view of patients’ mouths, an essential foundation for trustworthy computer analysis.
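The article does not say how duplicated records were detected. As a rough illustration only, one simple check is to hash each file and group files with identical contents. The sketch below assumes the scans sit in a local folder; the function name `find_duplicate_files` is hypothetical, and the check catches only byte-for-byte copies, not re-exported or re-scanned versions of the same image.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicate_files(folder):
    """Group files under `folder` whose bytes are identical.

    Hypothetical helper for illustration; not the authors' procedure.
    """
    groups = defaultdict(list)
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            # Identical content always yields the same SHA-256 digest.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path)
    # Keep only digests shared by two or more files.
    return [paths for paths in groups.values() if len(paths) > 1]
```

A reviewer could then inspect each returned group by hand before deciding which record to keep.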
Drawing disease boundaries by hand
To teach an algorithm what to look for, the researchers needed more than raw pictures; they needed detailed maps of where disease appears. Twenty specialists in dental imaging, each with several years of clinical experience, manually traced the outlines of teeth and problem areas on every image using dedicated labeling software. These painstaking, pixel-level drawings highlight multiple common conditions on a single scan, from cavities to impacted teeth. The outlines were then converted into color-coded mask images that pair exactly with each X-ray, and the files were organized into a simple folder system so that other researchers can plug them directly into their own programs without extra cleanup.
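To give a sense of how such paired files might be consumed, here is a minimal sketch under stated assumptions. The article does not give the actual folder names, file naming scheme, or color palette, so the folder arguments, the matching-by-file-stem rule, and the `PALETTE` values below are all hypothetical.

```python
from pathlib import Path

# Hypothetical palette: the real ToothPix colors and classes are not
# listed in this article.
PALETTE = {
    (0, 0, 0): 0,      # background
    (255, 0, 0): 1,    # e.g. a cavity
    (0, 255, 0): 2,    # e.g. an impacted tooth
}


def pair_by_stem(image_dir, mask_dir):
    """Match each image to the mask sharing its file stem (assumed rule)."""
    masks = {p.stem: p for p in Path(mask_dir).iterdir() if p.is_file()}
    pairs, missing = [], []
    for image in sorted(Path(image_dir).iterdir()):
        if not image.is_file():
            continue
        mask = masks.get(image.stem)
        if mask is None:
            missing.append(image)   # image with no matching mask
        else:
            pairs.append((image, mask))
    return pairs, missing


def decode_mask(rgb_rows, palette=PALETTE):
    """Turn rows of (R, G, B) pixels into rows of integer class ids.

    Raises KeyError if a pixel color is not in the palette.
    """
    return [[palette[tuple(pixel)] for pixel in row] for row in rgb_rows]
```

Decoding colors into integer class ids is a common first step because most segmentation models are trained on label maps rather than on colored pictures.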

Putting the dataset to the test
To see whether ToothPix is truly useful for artificial intelligence, the authors evaluated both the pictures and the hand-drawn labels. A five-part scoring system examined contrast between teeth and background, image sharpness, distracting artifacts, and how completely and precisely the annotations captured tooth boundaries. Across these measures, the dataset scored very close to the maximum, indicating that both images and markings are clear and reliable. The team then trained several popular image-segmentation models on ToothPix and measured how well they could automatically outline diseased regions. While performance varied by specific condition, the results showed that the dataset can support modern deep learning methods and yields promising accuracy for key tasks like identifying impacted teeth.
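This summary does not name the metrics the authors reported. Two measures widely used to score segmentation are the Dice coefficient and intersection over union (IoU), both of which compare a predicted outline with the hand-drawn one pixel by pixel. The sketch below is an illustration of those two formulas, not the authors' evaluation code; the function name and the convention for an absent class are assumptions.

```python
def dice_and_iou(pred, truth, class_id):
    """Dice and IoU for one class, given two equally sized 2-D label maps."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            in_pred, in_truth = p == class_id, t == class_id
            if in_pred and in_truth:
                tp += 1          # pixel marked in both maps
            elif in_pred:
                fp += 1          # marked only by the model
            elif in_truth:
                fn += 1          # marked only by the annotator
    if tp + fp + fn == 0:
        # Class absent from both maps: scored as perfect agreement here
        # (an assumed convention; other choices exist).
        return 1.0, 1.0
    dice = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return dice, iou


# Toy 2x3 label maps: two pixels agree, one is extra, one is missed.
predicted = [[0, 1, 1], [0, 1, 0]]
annotated = [[0, 1, 0], [0, 1, 1]]
dice, iou = dice_and_iou(predicted, annotated, class_id=1)
```

Scoring each condition separately, as here, is one way to see why performance can differ between, say, impacted teeth and rarer findings.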
What this means for future dental visits
In everyday terms, ToothPix is like a well-organized library of expertly marked dental X-rays that any qualified researcher can use to teach computers how to read scans. There are still limitations: some rare diseases remain underrepresented, the images come from a single hospital, and only one type of scan is included. Even so, the work lays a strong foundation. As similar datasets grow and expand to more clinics and imaging methods, they could help bring quicker, more consistent, and earlier detection of dental problems to patients around the world. The goal is to support dentists rather than replace them, making that familiar X-ray image a more powerful tool for protecting our health.
Citation: Cui, J., Gu, J., Guan, Y. et al. A multi-focus oral panoramic x-ray image dataset based on pixel-level annotations. Sci Data 13, 693 (2026). https://doi.org/10.1038/s41597-026-07021-9
Keywords: dental X-ray, medical imaging dataset, artificial intelligence in dentistry, lesion segmentation, computer-aided diagnosis