Clear Sky Science

LASSO–HHO two-stage hybrid gene selection framework for accurate Alzheimer’s disease diagnosis


Why this research matters for brain health

Alzheimer’s disease robs people of memory and independence, and we still lack simple, widely available tools to catch it early. Modern lab techniques can measure the activity of tens of thousands of genes in a small sample of brain tissue or blood, but that flood of data is hard to turn into clear yes-or-no answers for doctors. This paper introduces a two-step method that sifts through that genetic information and picks out a small set of genes able to distinguish Alzheimer’s samples from healthy ones with up to 100% accuracy on the datasets tested, while staying fast and practical enough for real-world use.

Turning a haystack of genes into a handful of clues

Each Alzheimer’s gene-expression dataset used in this study contains over 20,000 genes but only a few hundred patients. That imbalance is like asking more than 20,000 questions of only a few hundred volunteers: with so many questions and so few respondents, some answers will appear to separate the groups purely by chance. The authors tackle this by first applying a technique called LASSO, which acts like a powerful filter. It shrinks most gene signals down to exactly zero and keeps only those that contribute most to distinguishing people with Alzheimer’s from healthy controls. On its own, this first pass often cuts the gene list by more than 99%, dramatically reducing complexity and the chance of overfitting, while preserving enough information to predict the disease.
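
The filtering idea can be illustrated with a minimal sketch of LASSO solved by coordinate descent. It is a toy under stated assumptions, not the authors' implementation: the labels are coded as +1 (Alzheimer's) and -1 (control), a squared-error loss is used, and the function names, the penalty value 0.3, and the 8-sample, 4-gene dataset are all invented for illustration.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; anything within [-lam, lam] becomes exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=50):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    X is a list of rows (samples x genes); y is a list of numeric labels.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta, since beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a gene with no variation carries no information
            # Agreement between gene j and what the other genes leave unexplained
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                beta[j] = new_bj
    return beta


# Toy data (hypothetical): 8 samples x 4 genes, each value +1 (above average
# expression) or -1 (below average). Gene 0 tracks the label in 7 of 8 samples;
# the other three genes agree with it only weakly.
X = [
    [1, 1, 1, 1],
    [-1, 1, 1, -1],
    [1, -1, 1, -1],
    [-1, -1, 1, 1],
    [1, 1, -1, 1],
    [-1, 1, -1, -1],
    [1, -1, -1, -1],
    [-1, -1, -1, 1],
]
y = [1, -1, 1, -1, 1, -1, 1, 1]  # +1 = Alzheimer's, -1 = control (assumed coding)

beta = lasso_coordinate_descent(X, y, lam=0.3)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print(selected)  # only gene 0 keeps a nonzero weight
```

In this toy, the weakly related genes fall below the penalty and are set to exactly zero, which is the behaviour that lets LASSO act as a gene filter. Real datasets would need proper scaling and a penalty chosen by cross-validation.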

A second intelligent sweep when needed

After this initial pruning, the framework conditionally launches a second step based on a nature-inspired search strategy called Harris Hawks Optimization. Here, each “hawk” represents a possible subset of genes, and the hawks repeatedly adjust their positions to hunt for combinations that lead to better diagnosis. Crucially, this step is not always used. If LASSO alone already reaches at least 99% accuracy and the selected set contains fewer than 40 genes, the process stops there. Otherwise, the hawk-based search further refines the surviving genes, guided by a scoring rule that strongly rewards high diagnostic accuracy but still prefers fewer genes. This adaptive design avoids wasting computer time when the simpler solution is already good enough.
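
The stopping rule and the scoring rule described above can be sketched in a few lines. The 99% and 40-gene thresholds come from the text; the function names and the weight `alpha=0.99` are assumptions, since this summary does not give the exact formula the hawks optimise.

```python
def needs_hho_refinement(accuracy, n_genes, acc_threshold=0.99, max_genes=40):
    """Return True when the LASSO result alone is not already good enough."""
    lasso_is_sufficient = accuracy >= acc_threshold and n_genes < max_genes
    return not lasso_is_sufficient


def fitness(accuracy, n_selected, n_candidates, alpha=0.99):
    """Score a gene subset; higher is better.

    alpha is a hypothetical weight: most of the score comes from accuracy,
    and a small remainder rewards using fewer of the candidate genes.
    """
    compactness = 1.0 - n_selected / n_candidates
    return alpha * accuracy + (1.0 - alpha) * compactness


# LASSO alone: perfect accuracy with 3 genes, so the second stage is skipped.
print(needs_hho_refinement(accuracy=1.0, n_genes=3))
# LASSO alone: 97% accuracy, so the hawk-based search would run.
print(needs_hho_refinement(accuracy=0.97, n_genes=25))
```

With a weighting like this, a subset that loses even a little accuracy scores worse than a larger subset that keeps it, while two equally accurate subsets are ranked by size.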

Figure 1.

Putting the method to the test

The authors evaluated their framework—called LHGS—on four public Alzheimer’s datasets drawn from different brain regions and research groups. They trained a standard machine-learning classifier called a support vector machine using only the selected genes, and judged performance with common measures such as accuracy, precision, and recall. In some datasets, LASSO on its own was enough to reach perfect or near-perfect accuracy: one dataset needed only three genes to correctly separate all Alzheimer’s and healthy samples. In tougher datasets, adding the hawk-based search improved accuracy to 100% while still keeping the final set between about 11 and 37 genes. Compared with a range of other popular optimization methods, the two-stage approach was both more accurate and far faster, because the heavy search happens only in the drastically reduced space created by LASSO.
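
For readers unfamiliar with the measures named above, here is a small sketch of how accuracy, precision, and recall are computed from a classifier's predictions. The function name and the eight example labels are invented for illustration and are not results from the study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # correctly flagged
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false alarms
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # missed cases
    tn = len(pairs) - tp - fp - fn                                    # correctly cleared
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical labels: 1 = Alzheimer's, 0 = healthy control.
truth = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 1, 1]
metrics = binary_metrics(truth, predicted)
print(metrics)  # accuracy 0.625, precision 0.6, recall 0.75
```

Precision asks how many flagged samples were truly Alzheimer's, while recall asks how many true Alzheimer's samples were caught; a score of 100% on all three means every sample was classified correctly.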

Discovering promising gene markers

Beyond building a good predictor, the study also highlights concrete genes that may be especially important in Alzheimer’s biology. By looking at how strongly each gene contributed in the LASSO step, the authors identified a short list of consistently influential genes in each dataset. Some, such as TRPM7 and genes involved in stress signaling, inflammation control, and synaptic communication, are already linked to brain health and neurodegeneration. Others are less well understood, suggesting new directions for laboratory studies. The fact that reliable diagnosis can be achieved with only a few dozen or even a few genes hints that future tests might focus on small, targeted panels rather than broad, expensive arrays.
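
One simple way to rank genes by how strongly they contributed in the LASSO step is to sort them by the absolute size of their weights. The sketch below assumes that approach; the gene names and weights are placeholders, not values from the paper.

```python
def rank_genes_by_weight(coefficients, top_k=3):
    """Return the top_k gene names with the largest absolute nonzero weights."""
    nonzero = {gene: w for gene, w in coefficients.items() if w != 0.0}
    ranked = sorted(nonzero, key=lambda gene: abs(nonzero[gene]), reverse=True)
    return ranked[:top_k]


# Placeholder weights: the sign shows the direction of the association,
# the magnitude shows its strength, and 0.0 means LASSO discarded the gene.
example_weights = {"GENE_A": 0.12, "GENE_B": -0.85, "GENE_C": 0.0, "GENE_D": 0.40}
top_genes = rank_genes_by_weight(example_weights, top_k=2)
print(top_genes)  # ['GENE_B', 'GENE_D']
```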

Figure 2.

What this means for future Alzheimer’s diagnosis

For a general reader, the main message is that it is becoming possible to read the molecular “signature” of Alzheimer’s from a surprisingly small number of genes, chosen from tens of thousands by a careful two-step process. The LHGS framework shows that a fast statistical filter can be combined with a selective second pass to get both accuracy and speed, making the approach more suitable for eventual clinical tools. The authors caution that their results need confirmation in larger and more varied patient groups, and that earlier experiments may have slightly overestimated performance. Even so, the work points toward blood- or tissue-based genetic tests that could flag Alzheimer’s disease early using a compact, well-chosen set of gene markers.

Citation: Asiry, O., El-Gawady, A., Eltoukhy, M.M. et al. LASSO–HHO two-stage hybrid gene selection framework for accurate Alzheimer’s disease diagnosis. Sci Rep 16, 13393 (2026). https://doi.org/10.1038/s41598-026-48742-6

Keywords: Alzheimer’s diagnosis, gene expression, feature selection, machine learning, biomarkers