Clear Sky Science

Machine learning-based prediction and identification of determinants of teenage pregnancy in ten East African countries

Why This Study Matters

Teenage pregnancy shapes the health, education, and future earnings of millions of young people, especially in low- and middle-income countries. In East Africa, girls are far more likely to become pregnant in their teens than girls in many other parts of the world, with consequences that ripple through families and communities. This study asks a timely question: can machine learning help identify which girls are most at risk, and which social and economic conditions matter most, so that limited resources can be directed where they will do the most good?

Figure 1.

Taking a Fresh Look with Smart Computers

The researchers analyzed data from more than 32,000 girls aged 15 to 19 across ten East African countries, using large, standardized health surveys that already guide many public health decisions. Instead of relying only on traditional statistics, they turned to supervised machine learning, a family of methods that learn patterns from labeled examples. Several models were tested, including logistic regression, decision trees, and more advanced tools such as Random Forests and XGBoost. Before training these models, the team carefully cleaned and prepared the data: they filled in missing values, converted survey answers into numeric formats, scaled numerical values so no single factor dominated, and engineered new variables, such as a single media exposure measure that combines access to radio, TV, and newspapers.
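To make those preparation steps concrete, here is a minimal sketch in plain Python. It is not the authors' pipeline: the records, the field names, the choice of mean imputation, and the rule that "any one of radio, TV, or newspaper" counts as media exposure are all assumptions made for illustration. In practice a library such as scikit-learn would normally handle these steps.

```python
from statistics import mean, pstdev

# Hypothetical survey records; field names are illustrative, not real survey variables.
records = [
    {"age": 15, "residence": "rural", "radio": 1, "tv": 0, "newspaper": 0, "household_size": 6},
    {"age": 17, "residence": "urban", "radio": 1, "tv": 1, "newspaper": 1, "household_size": None},
    {"age": 19, "residence": "rural", "radio": 0, "tv": 0, "newspaper": 0, "household_size": 8},
    {"age": 16, "residence": "urban", "radio": 0, "tv": 1, "newspaper": 0, "household_size": 4},
]

def impute_mean(rows, key):
    """Fill missing numeric values with the mean of the observed ones (assumed strategy)."""
    observed = [r[key] for r in rows if r[key] is not None]
    fill = mean(observed)
    for r in rows:
        if r[key] is None:
            r[key] = fill

def standardize(rows, key):
    """Rescale a numeric column to zero mean and unit variance."""
    values = [r[key] for r in rows]
    mu, sigma = mean(values), pstdev(values)
    for r in rows:
        r[key] = (r[key] - mu) / sigma if sigma else 0.0

def preprocess(rows):
    rows = [dict(r) for r in rows]  # work on copies so the raw data stays intact
    impute_mean(rows, "household_size")
    for r in rows:
        # Engineered feature: 1 if the girl has access to any of the three media (assumed rule).
        r["media_exposure"] = int(bool(r["radio"] or r["tv"] or r["newspaper"]))
        # Encode the categorical answer as a 0/1 indicator.
        r["rural"] = int(r.pop("residence") == "rural")
    for key in ("age", "household_size"):
        standardize(rows, key)
    return rows

clean = preprocess(records)
```

Each cleaned record now holds only numbers, which is the form most learning algorithms expect.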

Balancing the Data and Training the Models

One challenge was that most surveyed teenagers had not been pregnant. This imbalance between “pregnant” and “not pregnant” cases can bias models toward the larger group. To address this, the team used resampling techniques that both remove ambiguous borderline examples and generate realistic additional cases for the smaller group, producing a more even and informative dataset. They then split the data so that 80% was used to train the models and 20% was held back to test how well the models would perform on girls they had not seen. Across multiple evaluation measures, such as overall correctness, how often the model caught true cases, and how well it avoided false alarms, the Random Forest model stood out as the most reliable.
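The sketch below illustrates the three ideas in this section on invented data: an 80/20 split that preserves the class ratio, an interpolation-based oversampler in the spirit of SMOTE, and evaluation measures computed from prediction counts. The article does not name the exact resampling method or metrics, so the function names, the choice of `k`, and the mapping of "avoiding false alarms" to specificity are assumptions. The step that removes borderline examples is left out for brevity.

```python
import math
import random

def stratified_split(y, test_frac=0.2, seed=0):
    """Return (train, test) index lists that keep each class's share."""
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        cut = round(len(idx) * test_frac)
        test.extend(idx[:cut])
        train.extend(idx[cut:])
    return train, test

def smote_like(X, y, minority=1, k=3, seed=0):
    """Add synthetic minority rows until both classes are the same size.
    Each new row lies on the line between a minority row and one of its
    k nearest minority neighbours."""
    rng = random.Random(seed)
    pts = [x for x, v in zip(X, y) if v == minority]
    need = (len(y) - len(pts)) - len(pts)
    X_new, y_new = list(X), list(y)
    for _ in range(max(need, 0)):
        a = rng.choice(pts)
        neighbours = sorted((p for p in pts if p is not a),
                            key=lambda p: math.dist(a, p))[:k]
        b = rng.choice(neighbours)
        t = rng.random()
        X_new.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
        y_new.append(minority)
    return X_new, y_new

def evaluate(y_true, y_pred):
    """Accuracy, recall (true cases caught), and specificity (false alarms avoided)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Invented data: 80 "not pregnant" (0) and 20 "pregnant" (1) rows with two features.
rng = random.Random(42)
X = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(80)]
     + [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(20)])
y = [0] * 80 + [1] * 20

train_idx, test_idx = stratified_split(y)
X_train = [X[i] for i in train_idx]
y_train = [y[i] for i in train_idx]

# Balance only the training portion so the held-out test set stays untouched.
X_bal, y_bal = smote_like(X_train, y_train)
```

Resampling after the split, and only on the training portion, is a deliberate choice here: synthetic rows in the test set would make the reported scores look better than they are.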

Figure 2.

What Drives Teenage Pregnancy Risk

With a strong-performing model in hand, the authors focused on interpretability: which factors were the most influential in predicting teenage pregnancy? Using feature selection and an explanation tool called SHAP, they consistently found a core set of social and economic conditions. These included being unmarried, starting sexual activity at a younger age, low levels of maternal education, living in poorer households, larger family size, living in rural areas, and reporting that distance to a health facility was a big problem. Limited exposure to mass media and digital information sources also appeared to raise risk. By contrast, current use of modern family planning methods was linked to a lower chance of teenage pregnancy, suggesting that access to and acceptance of contraception can be protective.
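SHAP is built on Shapley values, which split a prediction among the input factors by averaging each factor's contribution over every order in which the factors could be "switched on". The toy example below computes them exactly for a made-up three-factor risk score. The score, its weights, and the factor names are invented for illustration and are not the study's model. For real models, SHAP libraries use faster algorithms than this brute-force loop.

```python
from itertools import permutations

def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average each feature's marginal contribution
    over every order in which features move from baseline to instance value."""
    names = list(instance)
    totals = {n: 0.0 for n in names}
    orders = list(permutations(names))
    for order in orders:
        current = dict(baseline)
        prev = predict(current)
        for name in order:
            current[name] = instance[name]
            new = predict(current)
            totals[name] += new - prev
            prev = new
    return {n: totals[n] / len(orders) for n in names}

def toy_risk(f):
    """Hypothetical risk score with invented weights, not the study's model."""
    score = 0.10
    score += 0.20 * f["rural"]
    score += 0.15 * f["far_from_clinic"]
    score -= 0.10 * f["uses_contraception"]
    score += 0.10 * f["rural"] * f["far_from_clinic"]  # interaction between two factors
    return score

girl = {"rural": 1, "far_from_clinic": 1, "uses_contraception": 1}
reference = {"rural": 0, "far_from_clinic": 0, "uses_contraception": 0}

phi = shapley_values(toy_risk, girl, reference)
```

In this toy case the contributions add up to the gap between the girl's score and the reference score, and the interaction term is shared equally between the two factors involved. A negative value, as for contraception use here, marks a factor that pulls the predicted risk down.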

Differences Across Countries and Model Strength

The data revealed that teenage pregnancy is not evenly distributed across East Africa. Kenya showed the highest rate, at about one in five teenage girls, whereas Malawi had the lowest rate in this dataset. Still, the same broad risk factors appeared across the region. The Random Forest model captured these patterns with high accuracy (close to 90%) and a strong ability to distinguish high-risk from low-risk teens. Because the model was repeatedly tested on different subsets of the data, the authors argue that its performance is likely to hold up in similar real-world settings, even though the analysis cannot prove cause-and-effect relationships.
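The article does not say how the repeated testing was organized. One common scheme is k-fold cross-validation, sketched below as an assumption rather than a description of the authors' procedure: the data are cut into k parts, and each part takes one turn as the test set while the rest are used for training.

```python
import random

def k_fold_indices(n, k=5, seed=0):
    """Yield (train, test) index lists; every row is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

# Ten rows split into five folds of two rows each.
splits = list(k_fold_indices(n=10, k=5))
```

Averaging a model's scores over all folds gives a steadier estimate than a single 80/20 split, because no one lucky or unlucky test set decides the result.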

What This Means for Girls and Communities

Put simply, the study concludes that teenage pregnancy in East Africa is closely tied to poverty, limited schooling, early sexual debut, rural residence, poor access to health services, and limited access to information through the media, while use of modern contraception is associated with lower risk. By showing that computer models can reliably flag these patterns in large national surveys, the work suggests a practical path forward. Governments and health organizations could use similar tools to identify where teenage girls are most vulnerable, expand youth-friendly reproductive health services in rural areas, strengthen school-based education on sexual health, and harness radio, TV, and mobile media to share accurate, stigma-free information. Together, these steps could help more adolescents avoid unintended pregnancies and keep control over their health and futures.

Citation: Baykemagn, N.D., Gebiru, A.M., Getnet, M. et al. Machine learning-based prediction and identification of determinants of teenage pregnancy in ten East African countries. Sci Rep 16, 13128 (2026). https://doi.org/10.1038/s41598-026-43004-x

Keywords: teenage pregnancy, East Africa, machine learning, reproductive health, social determinants