Clear Sky Science

Machine learning based classification of female genital mutilation in 11 Sub-Saharan African countries using demographic and health survey data

2026-02-19

Why this research matters

Across parts of Africa and the world, millions of girls still face female genital mutilation, a deeply harmful practice with lifelong physical and emotional consequences. Governments and communities want to stop it, but resources are limited and reliable data on where girls are most at risk can be hard to obtain. This study shows how modern pattern‑finding tools, known as machine learning, can sift through large health surveys to highlight which mothers, families, and communities are most likely to continue the practice—and where prevention efforts could save the most girls from harm.

Understanding a hidden practice

Female genital mutilation (FGM) involves intentionally injuring or removing parts of the external female genitals for non‑medical reasons. It is recognized as a violation of human rights and is linked to severe short‑term problems such as pain, heavy bleeding, infection, and even death, as well as long‑term complications like childbirth difficulties, infertility, and psychological trauma. Although many countries have laws against FGM, it remains common in parts of Sub‑Saharan Africa, where social pressure, tradition, and beliefs about religion and marriage can override official rules. To design smarter prevention programs, decision‑makers need tools that can spot patterns in who is most at risk, going beyond simple national averages.

Big data from everyday households

The researchers drew on Demographic and Health Surveys, large nationally representative studies that visit thousands of households to ask women about their lives and health. They combined recent survey data (2015–2023) from 62,249 women in 11 countries across East and West Africa. All were aged 15–49 and had at least one daughter. Each mother was asked whether any of her daughters had undergone FGM. The team also assembled information on the mother’s age, whether she lived in a rural or urban area, her education and household wealth, who headed the household, her own circumcision status, access to media, country of residence, and her attitudes and beliefs about FGM, including whether she saw it as required by her religion or thought it should continue or stop. Together, these variables formed the raw material for computer models that would learn to distinguish families where daughters had been cut from those where they had not.
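To make the sample definition concrete, here is a minimal Python sketch of how pooled survey records might be filtered to women aged 15–49 with at least one daughter and given a binary outcome (1 if any daughter has undergone FGM). The `Respondent` record, its field names, and the toy rows are hypothetical; real DHS files use coded variable names and survey weights that this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class Respondent:
    # Hypothetical, simplified record; real DHS files use coded variables instead.
    country: str
    age: int
    daughters_cut: list[bool]  # one entry per daughter, True if she has undergone FGM


def build_analytic_sample(records: list[Respondent]) -> list[tuple[Respondent, int]]:
    """Keep women aged 15-49 with at least one daughter and attach a binary outcome."""
    sample = []
    for r in records:
        if 15 <= r.age <= 49 and len(r.daughters_cut) > 0:
            outcome = 1 if any(r.daughters_cut) else 0  # 1 = at least one daughter cut
            sample.append((r, outcome))
    return sample


if __name__ == "__main__":
    pooled = [
        Respondent("Country A", 32, [False, True]),
        Respondent("Country A", 51, [True]),  # excluded: older than 49
        Respondent("Country B", 24, []),  # excluded: no daughters
        Respondent("Country B", 40, [False, False]),
    ]
    for r, y in build_analytic_sample(pooled):
        print(r.country, r.age, y)
```

Collapsing all daughters into a single "any daughter cut" flag mirrors the question as summarized above, with one row per mother.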

Teaching machines to recognize risk

To turn this rich but messy dataset into something a computer could learn from, the team cleaned, standardized, and encoded the answers so that both numeric and categorical information could be used by the algorithms. Because families where daughters had not been cut outnumbered those where they had, the team used a technique called SMOTE (Synthetic Minority Over‑sampling Technique), which creates synthetic examples of the smaller group, to make sure the models did not simply learn to favor the larger group. They then tested seven types of classification models, ranging from simpler approaches such as logistic regression and Naive Bayes to more flexible ones such as decision trees, random forests, support vector machines, k‑nearest neighbors, and XGBoost. Each model was trained on 80% of the data and evaluated on the remaining 20%, using several performance scores that measure how often the model is right (accuracy), how well it avoids missing true cases (recall, also called sensitivity), and how clearly it separates higher‑risk from lower‑risk families.
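The splitting and balancing steps can be illustrated with a short, dependency-free Python sketch. It assumes two classes with numeric features, holds out 20% of each class for testing (stratification is an assumption here), and applies a basic version of SMOTE only to the training portion so that no synthetic rows leak into the evaluation set. This summary does not say exactly where the study applied SMOTE. The helpers `stratified_split` and `smote` are hypothetical names, not a library API; in practice the `imbalanced-learn` package provides a `SMOTE` class.

```python
import math
import random


def stratified_split(X, y, test_frac=0.2, seed=0):
    """Split rows so each class keeps roughly the same share in train and test."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]

    def pick(ids):
        return [X[i] for i in ids], [y[i] for i in ids]

    return pick(train_idx), pick(test_idx)


def smote(X, y, minority=1, k=3, seed=0):
    """Add synthetic minority rows until the two classes are the same size.

    Each synthetic row lies on the line segment between a minority row and one
    of its k nearest minority neighbours (Euclidean distance). Assumes a binary
    label and at least two minority rows.
    """
    rng = random.Random(seed)
    minor = [x for x, v in zip(X, y) if v == minority]
    n_needed = sum(1 for v in y if v != minority) - len(minor)
    synthetic = []
    for _ in range(max(n_needed, 0)):
        b = rng.randrange(len(minor))
        base = minor[b]
        # k nearest other minority rows, excluding the base row itself
        others = [m for j, m in enumerate(minor) if j != b]
        neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([bv + gap * (nv - bv) for bv, nv in zip(base, nb)])
    return X + synthetic, y + [minority] * len(synthetic)


if __name__ == "__main__":
    # Toy data: two numeric features per mother; label 1 is the rarer class here.
    X = [[0.1 * i, 1.0 - 0.05 * i] for i in range(20)]
    y = [1 if i % 4 == 0 else 0 for i in range(20)]
    (X_tr, y_tr), (X_te, y_te) = stratified_split(X, y)
    X_bal, y_bal = smote(X_tr, y_tr)
    print(len(y_tr), sum(y_tr), "->", len(y_bal), sum(y_bal))
```

Interpolating between nearby minority rows, rather than copying them, is what distinguishes SMOTE from simple random oversampling; the untouched test split keeps the reported scores honest.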

The stand‑out model and what drives its choices

Among all the tested approaches, the random forest model, which combines many decision trees into a single, more stable predictor, performed best. It correctly classified mothers in about 85% of cases, was especially strong at identifying those whose daughters had been subjected to FGM, and separated higher‑risk from lower‑risk mothers well. But accuracy alone is not enough; public health officials also need to understand why the model makes its predictions. To open this black box, the authors used an interpretability method called SHAP (SHapley Additive exPlanations), which estimates how much each factor pushes an individual prediction up or down. Four factors stood out: a mother’s opinion on whether FGM should continue, the country she lives in, whether she herself has undergone FGM, and whether she believes the practice is required by religion. Mothers who supported the continuation of FGM, lived in high‑prevalence countries, were themselves circumcised, or saw FGM as religiously required were far more likely to report that their daughters had been cut.
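SHAP builds on Shapley values from game theory: a factor's contribution is its average marginal effect on the prediction across all orders in which factors could be added. The Python sketch below computes these values exactly for a hypothetical four-factor scoring function that stands in for the random forest. The feature names echo the four factors above, but the weights are invented for illustration and do not reflect the study's estimates. It uses a single all-zero reference profile as the baseline, a simplification of the background data that SHAP tools average over, and exact enumeration is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial

FEATURES = [
    "supports_continuation",
    "high_prevalence_country",
    "mother_cut",
    "religion_requires",
]


def toy_risk(x):
    """Hypothetical scoring function standing in for a trained classifier."""
    score = 0.10
    score += 0.30 * x["supports_continuation"]
    score += 0.20 * x["high_prevalence_country"]
    score += 0.15 * x["mother_cut"]
    score += 0.10 * x["religion_requires"]
    # Invented interaction: religious belief matters more where FGM is common.
    score += 0.05 * x["religion_requires"] * x["high_prevalence_country"]
    return score


def shapley_values(model, x, baseline, features):
    """Exact Shapley values: each feature's weighted average marginal contribution
    over all coalitions, with 'absent' features set to their baseline value."""
    n = len(features)

    def value(coalition):
        z = {f: (x[f] if f in coalition else baseline[f]) for f in features}
        return model(z)

    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Share of orderings in which exactly this coalition precedes f
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


if __name__ == "__main__":
    mother = dict.fromkeys(FEATURES, 1)  # all four factors present
    reference = dict.fromkeys(FEATURES, 0)  # none present
    phi = shapley_values(toy_risk, mother, reference, FEATURES)
    for name, v in sorted(phi.items(), key=lambda kv: -kv[1]):
        print(f"{name:>24s} {v:+.3f}")
    # Contributions add up to the gap between the two predicted scores.
    print(round(sum(phi.values()), 6), round(toy_risk(mother) - toy_risk(reference), 6))
```

Because the contributions always sum to the difference between a mother's predicted score and the reference score, SHAP output can be read as a breakdown of one prediction into per-factor parts; here the invented interaction term is split equally between the two factors involved.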

From numbers to action

These findings translate into clear guidance for those working to end FGM. The model suggests that changing attitudes among mothers—especially those who have been cut themselves and those who feel religious pressure to continue the practice—could have a powerful effect on protecting girls. It also highlights that risk differs sharply between countries, underscoring the need for tailored, country‑specific strategies rather than one‑size‑fits‑all campaigns. While the authors caution that their cross‑sectional data cannot prove cause and effect, and that any risk classifications must be used carefully to avoid stigmatizing communities, their work shows how machine learning can help pinpoint where education, community engagement, and faith‑based outreach are most urgently needed. In this way, advanced data tools may become quiet but important allies in the global effort to end FGM and safeguard the health and rights of girls.

Citation: Gebrehana, A.K., Demoze, L., Yitageasu, G. et al. Machine learning based classification of female genital mutilation in 11 Sub-Saharan African countries using demographic and health survey data. Sci Rep 16, 9944 (2026). https://doi.org/10.1038/s41598-026-40723-z

Keywords: female genital mutilation, machine learning, Sub-Saharan Africa, public health data, women’s rights