Clear Sky Science · en

Regularized regression models for accurate prediction of HIV progression in ART patients: a comparative study


Why this study matters for people living with HIV

For many people on antiretroviral therapy (ART), a pressing question is how their health is likely to change over time. Doctors have access to rich information—age, body measurements, blood tests, and social circumstances—but turning all of that into a reliable forecast of who will stay well and who may progress to severe illness is not straightforward. This study tests advanced statistical tools to see which ones best predict HIV progression, with the goal of helping clinicians focus on the patients who most need extra attention.

Figure 1

Following patients over time

The researchers analyzed records from 482 adults with HIV who started ART at a teaching hospital in Osun State, Nigeria, between 2020 and 2023. They tracked how long it took for patients to move into the World Health Organization’s more serious HIV disease stages (III or IV) after beginning treatment. Alongside this, they examined a broad set of information: age, sex, body mass index, height and weight, viral load in the blood, education level, marital and job status, and where patients lived. Because the exact date of HIV infection is usually unknown, the study measured survival time from the first day of ART, a standard approach in this kind of research.
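As a minimal sketch of how a survival time can be measured from the first day of ART, the snippet below turns dates into a duration and an event flag. The function name, its arguments, and the example dates are hypothetical, and the handling of patients who never progress (treated here as censored at their last visit) is a standard convention assumed for illustration, not a detail taken from the study.

```python
from datetime import date

def time_to_progression(art_start, progression_date, last_visit):
    """Return (days_observed, event) measured from the first day of ART.

    event = 1 if the patient reached WHO stage III/IV during follow-up,
    event = 0 if the patient was still progression-free at the last visit
    (assumed here to be treated as censored).
    """
    if progression_date is not None and progression_date <= last_visit:
        return (progression_date - art_start).days, 1
    return (last_visit - art_start).days, 0

# Hypothetical patients, not records from the study.
progressed = time_to_progression(date(2020, 3, 1), date(2021, 3, 1), date(2023, 6, 30))
censored = time_to_progression(date(2021, 1, 1), None, date(2023, 1, 1))

print(progressed)  # (365, 1)
print(censored)    # (730, 0)
```

Each patient then contributes one duration and one flag, which is the input format most survival models expect.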

When many risk factors overlap

Modern HIV care generates a great deal of overlapping information. For instance, weight, height, and body mass index are closely linked; if they are all used at once, standard statistical methods can become unstable and give misleading results. The team confirmed this problem, known as multicollinearity (strong dependence between variables), by calculating variance inflation factors, which measure how much the uncertainty in a predictor's estimated effect is inflated by its overlap with the other predictors. These showed that some measurements were heavily intertwined. Multicollinearity can make it hard to tell which factors truly matter and can cause models that appear accurate in one group of patients to fail in another.
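To make the idea of a variance inflation factor concrete, here is a small sketch on synthetic data. The numbers are simulated, not the study's; the variable names and distributions are assumptions chosen only so that body mass index is computed from weight and height, as it is in practice. The calculation uses the identity that the variance inflation factors are the diagonal entries of the inverse correlation matrix of the predictors.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF of each column of X, taken from the diagonal of the inverse
    correlation matrix (equivalent to 1 / (1 - R^2) from regressing each
    column on all the others)."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Synthetic cohort (assumed distributions, for illustration only).
rng = np.random.default_rng(0)
n = 482
age = rng.normal(40.0, 11.0, n)
height_m = rng.normal(1.65, 0.08, n)
weight_kg = rng.normal(62.0, 10.0, n)
bmi = weight_kg / height_m**2  # BMI is built from weight and height

X = np.column_stack([age, height_m, weight_kg, bmi])
vifs = variance_inflation_factors(X)

for name, vif in zip(["age", "height", "weight", "bmi"], vifs):
    print(f"{name:>6}: VIF = {vif:.1f}")
```

In this toy setup age, which is unrelated to the body measurements, has a factor close to 1, while height, weight, and body mass index have much larger values. A common rule of thumb treats values above roughly 5 to 10 as a warning sign, although the threshold used in the study is not stated here.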

New tools to tame complex data

To overcome these issues, the study compared four “regularized” regression methods—Ridge, LASSO, Adaptive LASSO, and Elastic Net. These techniques deliberately shrink the influence of less important variables, and some can even drop them entirely, helping the model stay stable when predictors are highly related. The researchers first tested what happened when they removed the most overlapping variable (weight) and then when they left all variables in. They judged each model using several measures: how well it ranked patients by risk, how accurate its probability forecasts were, and how well it balanced goodness of fit with simplicity.
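The difference between shrinking a variable's influence and dropping it entirely can be sketched with the textbook closed-form solutions for a simplified case. The sketch assumes a least-squares problem with uncorrelated, standardized predictors, which is not the survival setting or the fitting procedure used in the study; the function names, the starting coefficients, and the penalty strength `lam` are all hypothetical.

```python
import numpy as np

def ridge_shrink(beta, lam):
    # Ridge: every coefficient is scaled toward zero but never reaches it.
    return beta / (1.0 + lam)

def lasso_shrink(beta, lam):
    # LASSO: soft-thresholding; coefficients smaller than lam become exactly 0.
    return np.sign(beta) * np.maximum(np.abs(beta) - lam, 0.0)

def adaptive_lasso_shrink(beta, lam, gamma=1.0):
    # Adaptive LASSO: each coefficient gets its own threshold, weighted by
    # 1 / |initial estimate|^gamma, so strong predictors are penalized less.
    weights = 1.0 / np.maximum(np.abs(beta), 1e-12) ** gamma
    return np.sign(beta) * np.maximum(np.abs(beta) - lam * weights, 0.0)

def elastic_net_shrink(beta, lam, alpha):
    # Elastic Net: a LASSO step with strength alpha * lam followed by a
    # Ridge step with strength (1 - alpha) * lam.
    return lasso_shrink(beta, alpha * lam) / (1.0 + (1.0 - alpha) * lam)

# Hypothetical unpenalized coefficients: strong, moderate, and weak.
beta = np.array([2.0, 0.5, -0.1])
lam = 0.5

ridge = ridge_shrink(beta, lam)
lasso = lasso_shrink(beta, lam)
adaptive = adaptive_lasso_shrink(beta, lam)
enet = elastic_net_shrink(beta, lam, alpha=0.5)

print("ridge:         ", ridge)
print("lasso:         ", lasso)
print("adaptive lasso:", adaptive)
print("elastic net:   ", enet)
```

With these inputs Ridge keeps all three coefficients at reduced size, LASSO and Adaptive LASSO keep only the strongest one (Adaptive LASSO shrinking it less), and Elastic Net lands in between by keeping the two larger coefficients and zeroing the weakest.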

Figure 2

What the models revealed about risk

Across the different methods, a consistent picture emerged about which factors were tied to a higher chance of progressing to advanced disease. Older age and higher viral load tended to be linked with worse outcomes, while being male, having more education, being employed, and having a healthier body mass index pointed toward better survival. Some models also highlighted body size measures and marital status, though the exact role of these factors depended on how the overlapping variables were handled. Importantly, the regularized approaches greatly reduced the instability that had appeared in a standard survival model, confirming that they can provide clearer, more trustworthy estimates in complex HIV data.

Choosing the right tool for the job

The study showed that no single method is best for every goal. When the main aim was to pick out the few most important predictors and keep the model easy to interpret, Adaptive LASSO performed best after trimming away the most overlapping variable. It gave the most accurate and well-calibrated forecasts while highlighting key risk factors. However, when all variables—including highly related ones—were kept in the model, Elastic Net delivered the strongest predictions overall and remained stable in the face of heavy overlap. Ridge regression also did well at preserving prediction accuracy while keeping all predictors. In everyday terms, these results suggest doctors and health planners can use different tools depending on whether they most need a clear list of risk drivers or the most powerful possible forecast of future HIV progression for patients on ART.
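One common way to score how well a model ranks patients by risk is the concordance index: among pairs of patients where it is known who progressed first, the share of pairs in which the model gave the higher risk score to the patient who progressed earlier. This summary does not name the exact metrics the study used, so the sketch below is an assumption about what such a measure could look like, and the follow-up times and risk scores are invented.

```python
def concordance_index(times, events, risk_scores):
    """Harrell-style concordance index.

    A pair (i, j) is comparable when patient i had the event and did so
    strictly earlier than patient j's observed time. The pair is concordant
    when patient i also has the higher risk score; ties count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Invented example: months of follow-up, event flags (1 = progressed),
# and risk scores from two hypothetical models.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
model_a = [0.9, 0.6, 0.7, 0.2]
model_b = [4.0, 3.0, 2.0, 1.0]

c_a = concordance_index(times, events, model_a)
c_b = concordance_index(times, events, model_b)
print(c_a)  # 0.8
print(c_b)  # 1.0
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to a perfect ranking, so scores like these give one basis for comparing candidate models alongside calibration and model simplicity.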

Citation: Owoade, G.O., Okewole, D.M., Nziku, C.K. et al. Regularized regression models for accurate prediction of HIV progression in ART patients: a comparative study. Sci Rep 16, 10251 (2026). https://doi.org/10.1038/s41598-026-41445-y

Keywords: HIV progression, antiretroviral therapy, survival prediction, regularized regression, Elastic Net