Clear Sky Science
Advancing data science research education in Africa through datathon-driven innovations
Why this matters for health in Africa
Across Africa, researchers are collecting huge amounts of health information, from clinic visits to mosquito counts and satellite images. Yet without people trained to make sense of these data, many life‑saving insights remain locked away. This paper describes a new way to quickly train young scientists across West Africa using “datathons” – intensive, team‑based events where participants analyze real malaria data and turn their work into publishable research. The approach shows how short, focused programs can boost local expertise and help ensure African data are used to solve African health problems.
A new kind of learning event
The authors designed a two‑phase training model under the Data Science for Health Discovery and Innovation in Africa initiative. First came a hybrid “foundation week,” open to about 50 participants from 14 countries, many joining remotely. During this phase, trainees learned coding, data management, and basic analytical skills using free or widely available tools such as R, Python, and mapping software. The focus was hands‑on practice rather than lectures, with exercises that walked participants step‑by‑step through real analytical tasks. Those who completed most of the sessions earned a certificate and became eligible for the second, in‑person phase.

Turning statistics into smarter tools
Most attendees already knew some traditional statistics, so the instructors used that familiarity as a bridge into newer methods often grouped under artificial intelligence and machine learning. Instead of treating these as mysterious “black boxes,” the training showed how they grow out of familiar ideas. For example, one case study compared ordinary linear regression – a staple of statistics – with a machine‑learning‑style regression that splits the data into training and test sets and uses cross‑validation to check how well the model predicts data it has not seen. Another exercise compared manually tracing objects such as house roofs on satellite images with automated image‑classification methods that can pick them out far more quickly and accurately. These side‑by‑side demonstrations helped participants see when to use classic techniques and when machine learning adds real value.
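The train/test split and cross‑validation idea described above can be sketched in a few lines. This is a minimal, hypothetical illustration in plain Python, not the course's actual case study. It fits an ordinary least‑squares line (the classical step) and then scores the same model the machine‑learning way: on a held‑out test set and with k‑fold cross‑validation. The function names, the synthetic data, the 25% test fraction and k = 5 are all assumptions chosen for illustration.

```python
import random


def fit_ols(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mse(model, xs, ys):
    """Mean squared prediction error of a fitted line on (xs, ys)."""
    intercept, slope = model
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train_test_split(n, test_fraction=0.25, seed=0):
    """Shuffle row indices and hold out a fraction of them as a test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n * test_fraction))
    return idx[n_test:], idx[:n_test]  # (train indices, test indices)


def k_fold_cv(xs, ys, k=5, seed=0):
    """Average held-out MSE over k folds; every row is tested exactly once."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_ols([xs[i] for i in train], [ys[i] for i in train])
        scores.append(mse(model, [xs[i] for i in fold], [ys[i] for i in fold]))
    return sum(scores) / k


if __name__ == "__main__":
    # Synthetic, made-up data (hypothetical): y = 2 + 0.5 * x plus Gaussian noise.
    rng = random.Random(42)
    xs = [rng.uniform(0, 10) for _ in range(100)]
    ys = [2 + 0.5 * x + rng.gauss(0, 1) for x in xs]

    # Classical view: fit once on all the data and read off the coefficients.
    print("OLS fit on all data:", fit_ols(xs, ys))

    # Machine-learning view: judge the same model on data it was not fitted to.
    train, test = train_test_split(len(xs))
    model = fit_ols([xs[i] for i in train], [ys[i] for i in train])
    print("Test-set MSE:", mse(model, [xs[i] for i in test], [ys[i] for i in test]))
    print("5-fold CV MSE:", k_fold_cv(xs, ys, k=5))
```

The fitting function is the same in both views. Only the way performance is judged changes, which is the bridge from classical statistics to machine learning that the exercise aims to show.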
Inside the datathon
After the foundation phase, 15 trainees traveled to Bamako, Mali, for a five‑day in‑person datathon held at a specialized bioinformatics center. They worked with a rich malaria data warehouse built from a long‑running study in Mali, Senegal, and The Gambia that had tracked thousands of people, households, mosquitoes, and clinic visits over several years. Participants were placed into five small teams, each mixing skills in programming, epidemiology, and clinical work. Guided by mentors, each group chose its own research question – such as why some children carry the malaria parasite without symptoms, or how malaria risk shifts over seasons and locations – and then cleaned, linked, and analyzed the relevant data layers.
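Linking data layers usually means joining tables on shared identifiers, so that a clinic visit can be traced back to a person and that person to a household and village. The sketch below is a hypothetical, plain‑Python illustration of that step. The table layouts, the field names (`household_id`, `person_id`, `parasite_positive`, `fever`), the age cut‑off of 15 and the working definition of an asymptomatic carrier are all assumptions, not the structure of the actual Mali, Senegal and The Gambia data warehouse.

```python
from collections import defaultdict

# Hypothetical toy tables; a real project would load these from files or a database.
households = [
    {"household_id": "H1", "village": "A"},
    {"household_id": "H2", "village": "B"},
]
people = [
    {"person_id": "P1", "household_id": "H1", "age": 6},
    {"person_id": "P2", "household_id": "H1", "age": 34},
    {"person_id": "P3", "household_id": "H2", "age": 9},
    {"person_id": "P4", "household_id": "H9", "age": 4},  # orphan: no such household
]
visits = [
    {"person_id": "P1", "parasite_positive": True, "fever": False},
    {"person_id": "P2", "parasite_positive": False, "fever": False},
    {"person_id": "P3", "parasite_positive": True, "fever": True},
    {"person_id": "P4", "parasite_positive": True, "fever": False},
]


def link_layers(households, people, visits):
    """Inner-join visits -> people -> households on their ID columns.

    Visits whose IDs cannot be matched (e.g. data-entry errors) are set aside
    and returned separately, which doubles as a basic data-cleaning check.
    """
    hh_by_id = {h["household_id"]: h for h in households}
    person_by_id = {p["person_id"]: p for p in people}
    linked, unmatched = [], []
    for v in visits:
        person = person_by_id.get(v["person_id"])
        household = hh_by_id.get(person["household_id"]) if person else None
        if household is None:
            unmatched.append(v)
            continue
        linked.append({**household, **person, **v})
    return linked, unmatched


def asymptomatic_children_by_village(linked, max_age=15):
    """Count children (age < max_age, an assumed cut-off) who are
    parasite-positive without fever, grouped by village."""
    counts = defaultdict(int)
    for row in linked:
        if row["age"] < max_age and row["parasite_positive"] and not row["fever"]:
            counts[row["village"]] += 1
    return dict(counts)


if __name__ == "__main__":
    linked, unmatched = link_layers(households, people, visits)
    print(len(linked), "linked visits;", len(unmatched), "unmatched")
    print(asymptomatic_children_by_village(linked))
```

Reporting unmatched rows, rather than silently dropping them, keeps the cleaning step visible, which matters when several data layers collected at different times are combined.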

From intense week to lasting impact
Throughout the week, teams presented daily progress to judges who scored projects on scientific quality and methods. At the end, groups delivered final talks and written reports, and prizes recognized top performances. Crucially, the datathon did not end when the event did. Each team was paired with a senior mentor and joined a rotating schedule of online meetings to turn its project into a full scientific paper within about a year. The program also highlighted real‑world challenges: coordinating international travel, coping with language barriers between English and French speakers, and giving women scientists equal chances to apply, attend, and lead teams. Despite these hurdles, participants reported high engagement and enjoyment, and the authors note strong networking and collaboration across countries.
What this means for the future
This study shows that carefully planned datathons can do much more than offer a brief coding crash course. By combining structured preparation, access to high‑quality local health data, and sustained mentoring, the model helps young African researchers learn by doing real science that matters for their communities. The authors argue that similar programs could be adapted to other diseases and regions, especially where universities or hospitals already have basic computing facilities. In the long run, such efforts can turn underused data into evidence for better health policies, while building a new generation of data‑savvy scientists across the continent.
Citation: Doumbia, S., Kane, F., Diabate, O. et al. Advancing data science research education in Africa through datathon-driven innovations. Sci Rep 16, 11527 (2026). https://doi.org/10.1038/s41598-026-41474-7
Keywords: datathon training, health data science, malaria research, Africa capacity building, machine learning education