Clear Sky Science

Comprehensive performance assessment of the BMIA-12A system for bone marrow cell quantification in normal and hematological malignancy samples

Why Counting Bone Marrow Cells Matters

When doctors diagnose blood cancers such as leukemia or multiple myeloma, they look closely at bone marrow smears under a microscope and count thousands of cells by hand. This slow, painstaking work affects life‑changing decisions about diagnosis, treatment, and prognosis. The paper introduces and rigorously tests a new artificial‑intelligence system, BMIA‑12A, designed to automate much of this counting—potentially making results faster, more consistent, and less dependent on the individual expert reading the slide.

Figure 1.

A New Digital Helper for the Microscope

The BMIA‑12A system takes digitized images of bone marrow smears and uses deep‑learning algorithms to recognize and categorize cells into 16 major types, including early “blast” cells that help define leukemias and plasma cells that are central in multiple myeloma. In this study, researchers analyzed 298 bone marrow smears from 149 people, spanning normal samples, plasma cell disorders, and several forms of acute leukemia. For each smear, they compared three approaches: fully automated AI counts, AI counts reviewed and corrected by specialists, and traditional manual counting with a light microscope. They also examined two common slide preparation techniques, called wedge and squash smears, to see how slide quality influences AI performance.

How Well the System Recognizes Normal Cells

In bone marrow from people without malignancy, the AI system performed impressively. It correctly classified about 95% of nearly 38,000 cells in both wedge and squash preparations, with 14 of 16 cell types showing recall above 90%. Wedge slides—where the sample is smoothly spread across the glass—gave slightly better precision for key diagnostic cells such as plasma cells, blasts, and rare basophils. Most of the AI’s mistakes occurred between cell types that look very similar, such as neighboring stages of white‑blood‑cell maturation or reactive lymphocytes that resemble blasts. When researchers compared how often each cell type appeared across whole samples, AI and expert‑reviewed results matched closely, while traditional manual counts were noticeably more variable, reflecting the subjectivity and limited sampling of human counting.
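The recall and precision figures quoted above are per-cell-type measures. As a minimal sketch of how they are usually computed, assuming each cell has an expert label and an AI label, the following uses made-up toy counts (not data from the study) and a hypothetical helper name, `precision_recall`:

```python
def precision_recall(pairs, label):
    """Return (precision, recall) for one cell type.

    pairs: list of (expert_label, ai_label) tuples, one per cell.
    """
    tp = sum(1 for t, p in pairs if t == label and p == label)  # correctly found
    fp = sum(1 for t, p in pairs if t != label and p == label)  # wrongly called `label`
    fn = sum(1 for t, p in pairs if t == label and p != label)  # missed
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall


# Toy data, invented for illustration: 10 true blasts and 10 true lymphocytes.
pairs = (
    [("blast", "blast")] * 9
    + [("blast", "monocyte")] * 1          # one blast missed
    + [("lymphocyte", "blast")] * 1        # one reactive lymphocyte called a blast
    + [("lymphocyte", "lymphocyte")] * 9
)

blast_precision, blast_recall = precision_recall(pairs, "blast")
print(f"blast precision={blast_precision:.2f} recall={blast_recall:.2f}")
```

Recall answers "of the true blasts, how many did the AI find?", while precision answers "of the cells the AI called blasts, how many really were?". A cell type can score well on one and poorly on the other.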

Figure 2.

What Happens in Myeloma and Leukemia

The system’s performance in disease states was more mixed. In plasma cell disorders, the AI was very precise at identifying plasma cells but missed roughly a quarter of them, especially in multiple myeloma where the marrow is packed with abnormal plasma cells that differ in shape from the textbook examples used for training. As a result, the AI tended to underestimate plasma‑cell percentages compared with manual and expert‑corrected counts, particularly when the tumor burden was high. A similar pattern appeared in acute leukemias: the AI was quite good at spotting blasts overall, especially on wedge slides, but it often assigned atypical blasts to look‑alike categories such as monocytes or early myeloid cells. Manual counts consistently produced higher blast percentages than either automated or expert‑reviewed digital results, with the biggest gaps seen in certain genetic subtypes like AML with NPM1 mutation and B‑cell ALL with the BCR::ABL1 fusion, where blast morphology is especially unusual.
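A short arithmetic sketch may help show why high precision combined with missed cells leads to underestimation. It assumes a recall of 0.75 (from "missed roughly a quarter") and, for simplicity, a precision of 1.0. The true percentages are invented for illustration and the function name `reported_percentage` is hypothetical, not part of the BMIA‑12A system:

```python
def reported_percentage(true_pct, recall, precision=1.0):
    """Percentage of all cells that the classifier labels as the target type.

    Cells labelled as the target = true positives + false positives
                                 = (true cells * recall) / precision.
    Every cell receives some label, so the denominator is unchanged.
    """
    return true_pct * recall / precision


# Assumed values: recall 0.75, precision 1.0; true percentages are made up.
for true_pct in (10.0, 30.0, 60.0):
    shown = reported_percentage(true_pct, recall=0.75)
    print(f"true {true_pct:.0f}% -> reported {shown:.1f}% (gap {true_pct - shown:.1f} points)")
```

Under these assumptions the absolute gap grows with the true percentage, which is one simple way to see why the shortfall would be most visible when tumor burden is high. The study itself points to atypical cell morphology as a driver, so real recall is unlikely to be constant across samples.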

Why Slide Preparation and Genetics Matter

The study showed that how the smear is made and the underlying genetics of the disease both shape AI performance. Squash smears, where marrow fragments are gently compressed between slides, introduced distortions that blurred fine nuclear details, increasing confusion between adjacent maturation stages and between blasts and other young cells. Wedge smears preserved structure better, yielding higher recall and precision, so the authors recommend them as the standard format for AI‑assisted analysis. On the biological side, blasts from specific genetic subtypes often have distinctive, sometimes distorted nuclear shapes or other atypical features. Because current AI systems are usually trained mainly on normal cells, these neoplastic variants may be forced into the “closest” normal category, leading to systematic underestimation of disease burden in precisely the patients for whom accurate thresholds matter most.

How This Changes the Lab Today

Taken together, the findings suggest that BMIA‑12A is already reliable enough to serve as a screening and triage tool, especially for normal bone marrow samples and routine differential counts. It can rapidly examine tens of thousands of cells per slide and delivers stable, reproducible results that align well with expert review. However, the clear and sometimes large discrepancies with manual counts in leukemias and plasma cell cancers show that human specialists remain essential for final interpretation, particularly near diagnostic cutoffs and in genetically defined high‑risk subtypes. The authors argue that laboratories adopting such AI tools must validate them carefully against their own slide preparation methods and build workflows in which AI provides an objective baseline that experts refine, rather than acting as a replacement for expert judgment.

Citation: Kim, H.N., Lee, J.H., Jung, Y. et al. Comprehensive performance assessment of the BMIA-12A system for bone marrow cell quantification in normal and hematological malignancy samples. Sci Rep 16, 8798 (2026). https://doi.org/10.1038/s41598-026-39443-1

Keywords: artificial intelligence in hematology, bone marrow cytology, leukemia diagnosis, multiple myeloma, digital microscopy