Clear Sky Science · en

Hypergraph-based contrastive embedding and attention fusion for detection of skin cancer

Why smarter skin checks matter

Skin cancer is one of the most common cancers, and melanoma, though relatively rare, is especially deadly if caught late. Doctors can use magnified photos of moles and spots, called dermoscopic images, to look for trouble, but many lesions look confusingly alike. Some dangerous cancers are rare in real life and therefore scarce in training data for artificial intelligence systems. This paper introduces a new computer vision framework called C2G‑HFMTA that is designed to spot skin cancers more reliably, especially the uncommon but critical cases, while also providing explanations a clinician can interpret.

Figure 1.

Balancing common and rare skin spots

A major obstacle in automated skin cancer screening is imbalance: some benign lesions appear thousands of times in datasets, while serious cancers or unusual lesions may appear only a few dozen times. Standard deep learning models tend to focus on the majority and quietly ignore the rare classes, exactly the opposite of what doctors want. The authors tackle this by first reorganizing the large HAM10000 dermoscopy dataset, which contains more than ten thousand images across seven types of skin lesions. Their strategy, called Clustered Class‑Based Segmentation, groups images into three clusters—very common, moderately common, and rare lesions—and ensures that, during training, the algorithm pays structured attention to each group instead of being overwhelmed by the majority cases.
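The grouping idea can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the class counts are the commonly reported HAM10000 totals (not given in this article), and the thresholds `common_min` and `rare_max`, along with both function names, are hypothetical choices made for the example.

```python
import random

# Commonly reported HAM10000 class counts (assumption: not stated in this article).
CLASS_COUNTS = {
    "nv": 6705, "mel": 1113, "bkl": 1099, "bcc": 514,
    "akiec": 327, "vasc": 142, "df": 115,
}

def assign_clusters(counts, common_min=2000, rare_max=400):
    """Split classes into three frequency clusters (thresholds are hypothetical)."""
    clusters = {"common": [], "moderate": [], "rare": []}
    for name, n in counts.items():
        if n >= common_min:
            clusters["common"].append(name)
        elif n <= rare_max:
            clusters["rare"].append(name)
        else:
            clusters["moderate"].append(name)
    return clusters

def sample_balanced_batch(clusters, per_cluster, rng):
    """Draw the same number of class labels from every cluster,
    so rare classes are not drowned out by the majority class."""
    batch = []
    for members in clusters.values():
        batch.extend(rng.choice(members) for _ in range(per_cluster))
    return batch

clusters = assign_clusters(CLASS_COUNTS)
batch = sample_balanced_batch(clusters, per_cluster=4, rng=random.Random(0))
print(clusters)
print(batch)
```

In a real training loop each sampled label would be followed by drawing an image of that class; the sketch stops at the label level to keep the balancing step visible.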

Teaching the system how cases relate

Rather than simply feeding images into a neural network and asking it to memorize patterns, the framework builds an abstract map of relationships among images. Using a powerful feature extractor (DenseNet201), each lesion image is converted into a numerical fingerprint. These fingerprints become nodes in a graph where connections show how similar two lesions look. The authors go further and use a “hypergraph,” which can connect multiple images at once, capturing richer group patterns. On top of this structure, they apply a supervised contrastive learning scheme: images from the same diagnosis are pulled closer together in this abstract space, while images from different diagnoses are pushed apart. Crucially, this process is guided directly by the true lesion labels, not by heavy image distortions, so subtle colors and textures important for diagnosis are preserved.
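To make the "pull together, push apart" idea concrete, here is a minimal sketch under stated assumptions. It uses tiny hand-made vectors in place of DenseNet201 features, builds hyperedges by a simple nearest-neighbour rule (the paper's actual construction may differ), and uses the standard supervised contrastive loss formulation rather than the authors' exact objective. All function names and the `temperature` value are illustrative.

```python
import math

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def knn_hyperedges(embeddings, k):
    """One hyperedge per image: the image plus its k most similar neighbours.
    Unlike an ordinary edge, a hyperedge can link more than two images."""
    z = [normalize(v) for v in embeddings]
    edges = []
    for i in range(len(z)):
        others = sorted((j for j in range(len(z)) if j != i),
                        key=lambda j: dot(z[i], z[j]), reverse=True)
        edges.append(frozenset([i] + others[:k]))
    return edges

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Low when same-label embeddings are close and different-label ones are far."""
    z = [normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-class partner contributes nothing
        logits = {j: dot(z[i], z[j]) / temperature for j in range(n) if j != i}
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0

# Toy fingerprints: images 0 and 1 look alike, as do images 2 and 3.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(knn_hyperedges(features, k=1))
print(supervised_contrastive_loss(features, labels=[0, 0, 1, 1]))
print(supervised_contrastive_loss(features, labels=[0, 1, 0, 1]))
```

The loss is much smaller when the labels agree with the visual grouping than when they cut across it, which is the signal that training would use to reshape the embedding space.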

Figure 2.

Letting meaning guide attention

The second major ingredient is an attention‑based fusion module that combines what the graph has learned with the raw visual details from the images. The graph‑derived representations, which encode how each lesion relates to others across the dataset, act like a high‑level “question” about class identity. The pixel‑level features from the original images serve as the “evidence.” Inside the multimodal attention block, these two streams interact: the semantic cues from the graph steer the model to focus its attention on regions and patterns in the image that matter most for distinguishing hard‑to‑tell‑apart lesions. Residual connections and multi‑scale processing help preserve fine details, such as slight changes in pigment, border irregularities, or small blood vessels, that often separate a dangerous lesion from a harmless one.
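The "question" and "evidence" interaction resembles cross-attention, which can be sketched as follows. This is a simplified illustration, not the paper's module: it omits the learned projection matrices and multi-scale processing, treats the image as a short list of patch vectors, and the function name `graph_guided_attention` is hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def graph_guided_attention(graph_query, patch_features):
    """Use the graph embedding as the query and image patches as keys and values.
    Returns the fused vector and the attention weight given to each patch."""
    d = len(graph_query)
    scores = [
        sum(q * k for q, k in zip(graph_query, patch)) / math.sqrt(d)
        for patch in patch_features
    ]
    weights = softmax(scores)
    attended = [
        sum(w * patch[c] for w, patch in zip(weights, patch_features))
        for c in range(d)
    ]
    # Residual connection: keep the original query and add the attended evidence.
    fused = [q + a for q, a in zip(graph_query, attended)]
    return fused, weights

query = [1.0, 0.0]                                  # toy graph-derived embedding
patches = [[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0]]     # toy patch features
fused, weights = graph_guided_attention(query, patches)
print(weights)
print(fused)
```

The patch that best matches the graph-derived query receives the largest weight, which is the mechanism by which dataset-level relationships can steer where the model looks within a single image.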

How well the model performs

The researchers evaluated their framework on the HAM10000 dataset using five‑fold cross‑validation and compared it against more than 30 widely used convolutional and transformer‑based models. Their method reached about 93% overall accuracy and a similar F1‑score, well ahead of every baseline in the comparison. Importantly, the gains were strongest for the rare lesion types that most systems struggle with. Ablation tests, in which one component at a time was removed, showed that the class‑based clustering, the hypergraph contrastive embedding, and the attention fusion each contributed measurably to performance. Visual tools such as t‑SNE, UMAP, and Grad‑CAM heatmaps revealed that the new method produces clearer clusters of lesion types and focuses attention on medically meaningful image regions, such as irregular borders in melanoma or dense keratin areas in certain precancerous lesions.
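The sketch below shows why reporting an F1‑score alongside accuracy matters when classes are imbalanced, and how a five‑fold split can keep rare classes present in every fold. It is an illustration only: the article does not say whether the folds were stratified or which F1 averaging was used, so the stratified split and macro averaging here are assumptions, and the function names are hypothetical.

```python
from collections import defaultdict

def stratified_folds(labels, n_folds=5):
    """Deal the indices of each class round-robin into folds,
    so every fold sees roughly the same class mix."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        for pos, idx in enumerate(indices):
            folds[pos % n_folds].append(idx)
    return folds

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Average the per-class F1 so that rare classes count as much as common ones."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

labels = ["nv"] * 10 + ["df"] * 5
for fold in stratified_folds(labels):
    print(sorted(labels[i] for i in fold))

# A model that always predicts the majority class looks fine on accuracy alone.
y_true = ["nv", "nv", "nv", "df"]
y_pred = ["nv", "nv", "nv", "nv"]
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```

In the last example the always-majority predictor scores well on accuracy but poorly on macro F1, because the missed rare class drags the average down.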

What this means for future skin checks

In plain terms, this study presents an AI framework that is both fairer and more discerning when examining skin lesions. By explicitly balancing common and rare cases, mapping out relationships among images, and letting those relationships guide where the model “looks” in each picture, C2G‑HFMTA substantially improves computer‑based diagnosis of skin cancer on the dataset tested. While the system still needs validation on larger and more diverse clinical collections, it points toward future tools that could help dermatologists, and even home‑based screening apps, catch dangerous skin cancers earlier and with greater confidence, without losing sight of the rare cases that matter most.

Citation: Banerjee, T., Chhabra, P., Kumar, M. et al. Hypergraph-based contrastive embedding and attention fusion for detection of skin cancer. Sci Rep 16, 12808 (2026). https://doi.org/10.1038/s41598-026-43351-9

Keywords: skin cancer detection, dermoscopy AI, contrastive learning, class imbalance, medical image analysis