Clear Sky Science

Establishing dermatopathology encyclopedia DermpathNet with Artificial Intelligence-Based Workflow

Why a New Skin Image Library Matters

Skin cancers and other growths are often diagnosed by examining thin slices of tissue under a microscope, a field known as dermatopathology. Yet the images used to train doctors and test artificial intelligence (AI) tools are usually locked away behind paywalls or privacy rules. This paper introduces DermpathNet, a freely available, carefully reviewed collection of thousands of skin biopsy images built with the help of AI. It is designed to make learning, cross-checking diagnoses, and developing new computer tools easier and more reliable for clinicians and researchers worldwide.

Figure 1.

The Problem of Hidden Teaching Slides

Most medical trainees learn from glass slides or digital files controlled by a single hospital. These materials can contain patient identifiers or be licensed in ways that prevent sharing. Existing online resources require paid subscriptions, offer only a handful of example cases, or are not consistently reviewed by experts. As a result, students and clinicians lack a broad, trusted, open collection of microscopic skin images that shows both common and rare tumors. Without such a resource, it is difficult to compare cases, standardize teaching, or fairly judge how well computer vision systems actually perform.

Finding Quality Images in a Sea of Articles

The authors turned to PubMed Central’s Open Access collection, a vast library of full-text biomedical articles whose contents can legally be reused. They began with a structured list, or lexicon, of 12 groups of benign and malignant skin tumors and nearly 200 specific diagnoses, built from expert input and standardized medical vocabularies. Using this lexicon, they queried PubMed Central for articles whose titles or abstracts mentioned these diseases, downloaded the full texts, and extracted every figure along with its caption. This first pass yielded more than 200,000 figures from over 43,000 articles. That was far too many to review by hand, and most were not actually microscopic images of skin.
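As a rough illustration of the lexicon-driven search, the sketch below turns each diagnosis into a title-or-abstract query string. It is not the authors' code: the group names and diagnoses are placeholder examples, `build_query` and `build_all_queries` are hypothetical helpers, and the field tags and open-access filter are assumed to follow NCBI's Entrez search syntax, which should be checked against the PubMed Central documentation before use.

```python
from typing import Dict, List

# Hypothetical two-group excerpt. The lexicon described in the paper has
# 12 groups and nearly 200 diagnoses; these entries are placeholders.
LEXICON: Dict[str, List[str]] = {
    "melanocytic tumors": ["melanoma", "Spitz nevus"],
    "keratinocytic tumors": ["basal cell carcinoma", "squamous cell carcinoma"],
}


def build_query(diagnosis: str) -> str:
    """Return a search string matching one diagnosis in a title or abstract.

    The [Title], [Abstract] and "open access[filter]" tags are assumptions
    about the search syntax, not details taken from the paper.
    """
    term = diagnosis.strip()
    if not term:
        raise ValueError("diagnosis must be non-empty")
    phrase = f'"{term}"'
    return f"({phrase}[Title] OR {phrase}[Abstract]) AND open access[filter]"


def build_all_queries(lexicon: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every diagnosis in the lexicon to its query string."""
    return {dx: build_query(dx) for terms in lexicon.values() for dx in terms}


if __name__ == "__main__":
    for dx, query in build_all_queries(LEXICON).items():
        print(f"{dx}: {query}")
```

Keeping one query per diagnosis means every downloaded article can be traced back to the lexicon entry that retrieved it.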

How AI and Keywords Worked Together

To sort useful images from irrelevant ones, the team created a hybrid filtering system. One part was a deep learning model trained on a separate medical image collection to decide whether a given picture looked like a pathology slide or not. The other part scanned the figure captions for tell-tale phrases such as magnification levels or staining terms that usually accompany microscope images. For very common diagnoses, only images that passed both tests were kept, improving purity; for rare diagnoses, images that passed either test were accepted to avoid missing scarce examples. When this hybrid method was checked against a human “gold standard” of 651 manually labeled images, it achieved an F-score over 90%, outperforming both the AI model alone and the keyword search alone.
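The both-tests versus either-test rule can be written in a few lines. The sketch below is an illustration under stated assumptions, not the authors' implementation: the deep learning model is stood in for by a probability passed in by the caller, the 0.5 threshold and the caption cue list are invented for the example, and the F-score is assumed to be the balanced F1 measure.

```python
import re
from typing import Iterable

# Assumed caption cues: magnification marks and common staining terms.
CAPTION_CUES = re.compile(
    r"(?<!\w)[x×]\s?\d{1,4}\b"   # x200, × 400
    r"|\b\d{1,4}\s?[x×](?!\w)"   # 200x, 40 ×
    r"|\bH&E\b|ha?ematoxylin|eosin|immunohistochem|\bstain",
    re.IGNORECASE,
)


def caption_vote(caption: str) -> bool:
    """True if the caption contains a microscopy-related cue."""
    return bool(CAPTION_CUES.search(caption))


def model_vote(pathology_prob: float, threshold: float = 0.5) -> bool:
    """True if the (externally computed) classifier score clears the threshold."""
    return pathology_prob >= threshold


def keep_figure(caption: str, pathology_prob: float, diagnosis_is_common: bool) -> bool:
    """Common diagnoses need both votes; rare diagnoses need only one."""
    m, c = model_vote(pathology_prob), caption_vote(caption)
    return (m and c) if diagnosis_is_common else (m or c)


def f_score(predicted: Iterable[bool], gold: Iterable[bool]) -> float:
    """Balanced F1 of predicted keep/discard decisions against gold labels."""
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p and g)
    fp = sum(1 for p, g in pairs if p and not g)
    fn = sum(1 for p, g in pairs if g and not p)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    caption = "Clinical photograph of the lesion"
    print(keep_figure(caption, 0.9, diagnosis_is_common=True))
    print(keep_figure(caption, 0.9, diagnosis_is_common=False))
```

Requiring both votes trades recall for precision, which is affordable when a diagnosis has many candidate images. Accepting either vote does the reverse, which suits rare diagnoses where each missed image is costly.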

Figure 2.

What DermpathNet Contains and How It Is Used

After processing, the workflow produced 7,772 images covering 166 different skin tumor diagnoses. Every image was reviewed by board-certified dermatopathologists, and each is linked to rich metadata describing the source article, disease type, and standardized medical codes. The dataset is organized so users can explore by disease category, specific diagnosis, or original publication, while tracking licensing information. Beyond education, the authors used DermpathNet to probe the limits of a modern vision–language model, GPT-4V. When asked to identify specific skin tumors in these challenging images under true/false, open-ended, and multiple-choice formats, the model performed poorly, often failing to recognize the correct diagnosis even when given a short list of options.
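To make the three question formats concrete, the sketch below builds a true/false, an open-ended, and a multiple-choice prompt from one image record. Everything here is hypothetical: the `ImageRecord` fields only loosely mirror the metadata described above, the prompt wording is invented, and the example values are placeholders rather than entries from DermpathNet. Sending the image and prompt to a model is left out.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ImageRecord:
    """Hypothetical metadata record; field names are assumptions."""
    image_id: str
    diagnosis: str
    category: str
    source_article: str
    license: str


def true_false_prompt(candidate: str) -> str:
    """Ask the model to confirm or reject one candidate diagnosis."""
    return f"True or false: this image shows {candidate}."


def open_ended_prompt() -> str:
    """Ask for a diagnosis with no options supplied."""
    return "What is the most likely diagnosis shown in this image?"


def multiple_choice_prompt(
    record: ImageRecord, distractors: List[str], rng: random.Random
) -> Tuple[str, str]:
    """Return the prompt text and the letter of the correct option."""
    options = [record.diagnosis, *distractors]
    rng.shuffle(options)  # avoid always placing the answer first
    letters = "ABCDEFGH"
    if len(options) > len(letters):
        raise ValueError("too many options")
    lines = [f"{letters[i]}. {opt}" for i, opt in enumerate(options)]
    answer = letters[options.index(record.diagnosis)]
    return "Which diagnosis does this image show?\n" + "\n".join(lines), answer


if __name__ == "__main__":
    # Placeholder values for illustration only.
    rec = ImageRecord("img-0001", "basal cell carcinoma",
                      "keratinocytic tumors", "PMC0000000", "CC BY")
    prompt, answer = multiple_choice_prompt(
        rec, ["melanoma", "dermatofibroma"], random.Random(0))
    print(true_false_prompt(rec.diagnosis))
    print(open_ended_prompt())
    print(prompt)
    print("Correct option:", answer)
```

Passing in a seeded `random.Random` keeps the option order reproducible, so the same questions can be replayed when comparing models.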

What This Means for Doctors and Machines

For non-specialists, DermpathNet can be thought of as a high-quality, openly shared atlas of microscopic skin tumors, built with a smart sorting system that lets human experts focus on final checks instead of manual browsing. It lowers barriers to training and comparison across institutions and exposes the difficulty of the visual task: even a cutting-edge AI system struggled on these images. The authors conclude that while AI can help assemble such resources, today’s general-purpose models are not yet ready to replace specialist judgment in dermatopathology. Instead, DermpathNet offers a solid foundation for teaching and for building the next generation of dedicated medical AI tools that can truly aid in diagnosing skin disease.

Citation: Xu, Z., Lin, M., Zhou, Y. et al. Establishing dermatopathology encyclopedia DermpathNet with Artificial Intelligence-Based Workflow. Sci Data 13, 368 (2026). https://doi.org/10.1038/s41597-026-06715-4

Keywords: dermatopathology, medical image dataset, artificial intelligence, skin cancer, digital pathology