Clear Sky Science

Benchmarking MedMNIST dataset on real quantum hardware

Why Quantum Computers Care About Medical Images

Hospitals generate vast collections of medical images—X-rays, scans, and microscope slides—that doctors increasingly analyze with artificial intelligence. This study asks a bold question: can today’s early quantum computers start to share that workload? The authors put a large, diverse set of medical images through real IBM quantum hardware to see how far quantum machine learning can go right now, and where it still falls short.

Figure 1.

Teaching Quantum Chips to See Medical Patterns

The researchers focus on quantum machine learning, where information is processed using quantum bits (qubits) that can exist in superpositions of states and become entangled with one another, which ordinary bits cannot do. Instead of mixing quantum components with familiar deep neural networks, they deliberately use only quantum models to test their standalone capabilities. As a test bed, they adopt MedMNIST, a standardized collection of lightweight medical imaging datasets spanning chest X-rays, retinal scans, skin lesions, blood cells, colon tissue, and abdominal CT slices. Each dataset poses a different classification task, from simple yes/no questions (such as pneumonia or not) to harder multi-category problems with many classes and strongly imbalanced label distributions.

Squeezing Big Images into Small Quantum Devices

Because present-day quantum processors are small and noisy, the team cannot feed full clinical images directly into the quantum circuits. Instead, they reduce each image to a coarse grid of either 7×7 or 8×8 pixels using average pooling, and then translate each pixel into a rotation applied to a quantum bit. This creates a compact quantum representation of the image that the circuit can work with. To make the most of limited hardware, they generate "device-aware" circuits using an automated design tool called Élivágar. It samples many candidate circuits that respect the actual wiring and error characteristics of IBM’s 127-qubit Cleveland processor, scores them for both noise resilience and their ability to separate image classes, and selects the most promising layouts for further testing.
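The downsampling and encoding step can be illustrated with a minimal sketch. It assumes a 28×28 grayscale input with intensities from 0 to 255 and a linear mapping of pixel values onto rotation angles between 0 and π. The input size, the scaling, and the function names `average_pool` and `pixels_to_angles` are illustrative assumptions, not details taken from the study. The sketch also only handles grid sizes that divide the image side evenly, so it covers the 7×7 case for a 28×28 image but not 8×8.

```python
import numpy as np

def average_pool(image: np.ndarray, out_size: int) -> np.ndarray:
    """Downsample a square image to out_size x out_size by averaging equal blocks."""
    side = image.shape[0]
    if image.shape != (side, side) or side % out_size != 0:
        raise ValueError("expected a square image whose side is a multiple of out_size")
    block = side // out_size
    # Split rows and columns into (out_size, block) pairs, then average each block.
    return image.reshape(out_size, block, out_size, block).mean(axis=(1, 3))

def pixels_to_angles(pooled: np.ndarray, max_value: float = 255.0) -> np.ndarray:
    """Map intensities in [0, max_value] linearly to angles in [0, pi] (assumed scaling)."""
    return (pooled.astype(float) / max_value * np.pi).ravel()

# Synthetic stand-in for one image; real data would come from a MedMNIST dataset.
image = (np.arange(28 * 28, dtype=float).reshape(28, 28)) % 256
angles = pixels_to_angles(average_pool(image, 7))
print(angles.shape)  # one rotation angle per pooled pixel
```

Each entry of `angles` would then set the rotation of one encoding gate in the circuit; how pixels are assigned to qubits and gates is a design choice this sketch leaves open.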

Training in Silico, Testing on a Real Quantum Chip

The quantum models are first trained in a noiseless software simulator running on powerful classical GPUs. Here, the parameters of the circuit’s rotation gates are tuned with standard optimization methods until the simulated circuit best distinguishes the training images. Once good parameter settings are found, the team freezes them and moves only the inference step onto the real IBM device. On hardware, they layer in three error-handling strategies: patterns of extra pulses meant to shield idle qubits from the environment (an approach commonly called dynamical decoupling), randomization tricks to average out coherent errors (commonly called twirling), and a measurement clean-up technique that statistically corrects for readout mistakes (readout error mitigation). An ablation study on one of the most noise-sensitive datasets shows that combining all three strategies recovers much of the accuracy and class-separation quality that is lost when the same circuit runs bare on the device.
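To give a feel for the measurement clean-up idea, here is a minimal single-qubit sketch of one common approach: estimate how often each prepared state is misread, collect those rates in a confusion matrix, and invert it to correct measured probabilities. The summary does not say which method the authors used, so this should be read as an illustration of the general principle only. The flip rates and the function names `confusion_matrix` and `mitigate` are hypothetical.

```python
import numpy as np

def confusion_matrix(p_flip_0to1: float, p_flip_1to0: float) -> np.ndarray:
    """Column j is the distribution of measured outcomes when state j was prepared."""
    return np.array([[1 - p_flip_0to1, p_flip_1to0],
                     [p_flip_0to1, 1 - p_flip_1to0]])

def mitigate(measured: np.ndarray, confusion: np.ndarray) -> np.ndarray:
    """Estimate the true outcome probabilities by solving confusion @ true = measured."""
    estimate = np.linalg.solve(confusion, measured)
    # Sampling noise can push entries slightly negative, so clip and renormalize.
    estimate = np.clip(estimate, 0.0, None)
    return estimate / estimate.sum()

# Hypothetical calibration: 2% of 0s are read as 1, and 5% of 1s are read as 0.
A = confusion_matrix(0.02, 0.05)
true_probs = np.array([0.8, 0.2])
noisy = A @ true_probs           # what the device would report on average
recovered = mitigate(noisy, A)   # corrected estimate
print(noisy, recovered)
```

On real hardware the confusion matrix is estimated from calibration circuits and the measured probabilities come from a finite number of shots, so the correction is approximate rather than exact as it is here.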

Figure 2.

How Quantum Models Stack Up Against Classical AI

Across eight MedMNIST datasets, the purely quantum models achieve solid performance despite using drastically fewer features and parameters than state-of-the-art deep networks. On chest X-rays for pneumonia detection, for example, the quantum model reaches about 85% accuracy—essentially matching popular residual networks that operate on much higher-resolution images with millions of adjustable weights. For more complex, multi-class problems such as retinal disease and skin lesion categorization, the quantum models trail the strongest classical systems but remain surprisingly competitive. When compared to lightweight classical methods trained on the same low-resolution inputs, the quantum circuits achieve similar accuracy with far fewer tunable parameters, suggesting a favorable "accuracy per parameter" trade-off for quantum designs.

What This Means for Future Medical AI

For a lay reader, the key message is that quantum computers, even in their noisy, small-scale infancy, can already tackle realistic medical imaging benchmarks in a meaningful way—though they do not yet beat the best classical AI. This work establishes a careful, apples-to-apples benchmark: a family of quantum-only models, trained in simulation and run on a 127-qubit device, evaluated across many different medical image types and rigorously compared with established classical approaches. The results show that quantum models can come close to classical performance while using far less information per image, and that smart circuit design plus error-handling techniques are crucial. As quantum hardware grows larger and cleaner, these same ideas could help push medical image analysis into a regime where quantum processors offer not just parity, but genuine advantages over today’s AI tools.

Citation: Singh, G., Jin, H. & Merz Jr., K.M. Benchmarking MedMNIST dataset on real quantum hardware. Sci Rep 16, 9017 (2026). https://doi.org/10.1038/s41598-026-35605-3

Keywords: quantum machine learning, medical imaging, MedMNIST, IBM quantum hardware, error mitigation