Clear Sky Science · en

Dental Odontogenic Lesion CBCT and Histopathology Integrated Dataset for Benchmarking Deep Learning Algorithms


Why jaw lesions matter to more than dentists

Strange growths in the jaw may sound like a niche dental problem, but they can mean very different things for a person’s health and treatment. Some are simple cysts that can be removed with a small procedure, while others demand extensive surgery to stop them coming back. Today, doctors rely on detailed X-ray-style scans and microscope studies of tissue to tell these look-alike problems apart. This article describes a new shared data resource that could help artificial intelligence learn to make these tricky calls faster and more consistently.

Seeing the whole jaw in three dimensions

When dentists or surgeons suspect a jaw lesion, they often turn to cone-beam computed tomography, or CBCT. This scan creates a three-dimensional picture of the head that shows bones and teeth in fine detail at relatively low radiation doses. It helps surgeons plan how much tissue to remove and how to avoid nerves and tooth roots. The problem is that many common jaw lesions have very similar shapes and positions on these scans. Even experienced specialists can disagree about what they are seeing, which can affect both diagnosis and treatment choices. Automated computer systems could assist, but they need large, well-labelled image collections to learn from.

Figure 1. Paired jaw scans and tissue images feed an AI system to better sort similar-looking jaw lesions.

Looking at tissue under the microscope

The most reliable way to identify these jaw lesions still comes from the microscope. After surgery, the removed tissue is cut into ultra-thin slices, stained with colored dyes, and scanned into high-resolution images. Pathologists then study the cellular patterns to decide whether a lesion is a benign cyst or a tumor, and which subtype it belongs to. Four types dominate in the jaws: dentigerous cysts, radicular cysts, odontogenic keratocysts, and ameloblastomas. Each has its own typical appearance and behavior. However, this gold-standard answer only arrives after the operation, so patients and doctors cannot use it to refine surgery that has already taken place. That is why better pre-surgical tools are so important.

Bringing scans and slides together

The new DOLCHID dataset tackles this gap by carefully pairing CBCT scans with matching microscope images from the very same jaw lesions. The creators collected data from 262 patients whose lesions had clear diagnoses and good-quality images from both methods. For every case, radiologists marked the exact region of the lesion on the scan, and pathologists outlined the most informative area on the stained tissue slide. The dataset is balanced across the four main lesion types and includes challenging, borderline examples to mirror real clinical complexity. All personal information was removed, and the images were stored in standard formats so that research teams around the world can work with them.

Figure 2. Combined features from 3D jaw scans and cell-level images guide AI to distinguish four jaw lesion types.

Testing how well computers learn from the data

To show that the dataset is useful, the team ran a series of tests with leading deep learning methods. First, they trained computer models to outline lesion boundaries on both CBCT and microscope images. Several different model designs all learned to find the lesions with solid accuracy, which suggests that the images and expert markings are consistent and informative. Next, they trained separate models to classify lesion type using only the scans or only the tissue images. As expected, models that saw microscope images performed very strongly, while those using CBCT alone did reasonably well given how similar the lesions can look.
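
For readers curious how "finding the lesion" can be scored, one widely used measure of overlap between a model's outline and an expert's outline is the Dice score. This summary does not say which metric the authors used, so the sketch below is only an illustration: the function name and the tiny toy masks are invented for this example and are not taken from the dataset.

```python
from typing import Sequence


def dice_score(predicted: Sequence[Sequence[int]],
               expert: Sequence[Sequence[int]]) -> float:
    """Dice overlap between two binary masks of equal shape (1 = lesion pixel).

    Returns a value from 0.0 (no overlap) to 1.0 (identical outlines).
    """
    if len(predicted) != len(expert):
        raise ValueError("masks must have the same number of rows")

    overlap = 0          # pixels both masks mark as lesion
    predicted_area = 0   # pixels the model marks as lesion
    expert_area = 0      # pixels the expert marks as lesion
    for pred_row, expert_row in zip(predicted, expert):
        if len(pred_row) != len(expert_row):
            raise ValueError("masks must have the same number of columns")
        for p, e in zip(pred_row, expert_row):
            overlap += 1 if (p and e) else 0
            predicted_area += 1 if p else 0
            expert_area += 1 if e else 0

    total = predicted_area + expert_area
    if total == 0:
        # Both masks are empty, so they agree completely.
        return 1.0
    return 2.0 * overlap / total


# Toy 3x4 masks, invented for illustration only.
expert_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
model_mask = [
    [0, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

score = dice_score(model_mask, expert_mask)
print(f"Dice overlap: {score:.2f}")
```

Here the two outlines share three of the four pixels each one marks, so the score lands between the extremes rather than at a perfect 1.0.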

What happens when both views are combined

The most forward-looking experiments used both kinds of images together. The researchers built methods that first learned separate features from CBCT scans and from tissue slides, then fused these signals into one shared representation. Even when only the scan would be available during real-life diagnosis, the training process could still benefit from what the microscope images had taught the model. These multimodal systems reached higher performance than CBCT-only models, showing that paired data can sharpen the computer’s sense of subtle differences between lesion types.
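
The idea of fusing two sets of features into one shared representation can be sketched in a few lines. What follows is a minimal illustration of one simple approach, concatenation fusion followed by a linear classifier. It is not the authors' architecture, which this summary does not detail. Every feature value, weight, and function name is hypothetical, and in a real system the features and weights would be learned from data rather than written by hand.

```python
import math
from typing import List

# The four lesion types named in the article.
LESION_TYPES = [
    "dentigerous cyst",
    "radicular cyst",
    "odontogenic keratocyst",
    "ameloblastoma",
]


def fuse(cbct_features: List[float], slide_features: List[float]) -> List[float]:
    """Concatenation fusion: place the two feature vectors side by side."""
    return list(cbct_features) + list(slide_features)


def softmax(scores: List[float]) -> List[float]:
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(fused: List[float],
             weights: List[List[float]],
             biases: List[float]) -> List[float]:
    """Linear classifier over the fused vector: one weight row per lesion type."""
    for row in weights:
        if len(row) != len(fused):
            raise ValueError("each weight row must match the fused vector length")
    scores = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)


# Hypothetical feature vectors standing in for the outputs of two encoders.
cbct_features = [0.2, 0.9]          # assumed: from a CBCT scan encoder
slide_features = [0.1, 0.8, 0.3]    # assumed: from a tissue slide encoder

# Hypothetical hand-written weights; a trained model would learn these.
weights = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]
biases = [0.0, 0.0, 0.0, 0.0]

fused = fuse(cbct_features, slide_features)
probabilities = classify(fused, weights, biases)
predicted = LESION_TYPES[probabilities.index(max(probabilities))]
print(predicted)
```

In this toy setup the third weight row reads from both the scan features and the slide features, which is the point of fusion: a decision can draw on evidence that neither image type provides on its own.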

How this work can help future patients

For a lay reader, the core message is that this study introduces not a new medical device but a carefully curated teaching set for smart algorithms. By linking three-dimensional jaw scans with matching microscope views and expert labels, the DOLCHID dataset gives researchers the raw material needed to build and fairly compare AI tools for jaw lesion diagnosis. Over time, such tools could help clinicians spot aggressive tumors earlier, choose the right level of surgery, and reduce uncertainty for patients facing complex dental and facial procedures.

Citation: Huang, Z., Xia, T., Wu, T. et al. Dental Odontogenic Lesion CBCT and Histopathology Integrated Dataset for Benchmarking Deep Learning Algorithms. Sci Data 13, 758 (2026). https://doi.org/10.1038/s41597-026-07112-7

Keywords: odontogenic lesions, dental imaging, cone beam CT, histopathology, deep learning dataset