Clear Sky Science

From data to decisions: the use of explainable AI to forecast soybean yield in major producing countries

2026-01-13

Why smarter crop forecasts matter

From supermarket prices to global trade, the humble soybean plays a surprisingly big role in daily life. Governments, traders, and farmers all need to know how big the harvest will be months before combines roll into the fields. Today, powerful artificial intelligence (AI) tools can sift through mountains of weather and satellite data to make those forecasts—but many of these models act like “black boxes,” offering little insight into why they give a particular answer. This study explores a new kind of explainable AI that not only predicts soybean yields in the world’s main producing countries, but also shows clearly which factors drive those predictions.

Three countries that feed the world

The researchers focused on the three countries that dominate global soybean supply: the United States, Brazil, and Argentina, which together produce more than 80% of the world’s soybeans. They zoomed in to a fine scale—counties in the U.S. and equivalent small regions in Brazil and Argentina—using recent data from 2018 to 2022. For each region, they assembled a rich picture of growing conditions: detailed weather records, soil properties, and multiple kinds of satellite data tracking plant growth, water status, and even a faint glow from photosynthesis known as solar-induced chlorophyll fluorescence (SIF). In total, 154 different numerical features were extracted to describe each growing season before being fed into the models.

From data pipelines to learning machines

To handle this flood of information, the team built a standardized processing pipeline. They aligned all datasets in space and time using crop calendars, smoothed noisy satellite signals, and summarized the growing season with statistics like averages, extremes, and variability. They then trained three types of models to predict yields: Random Forest (RF), a widely used machine learning workhorse; Multilayer Perceptron (MLP), a classic deep neural network; and Kolmogorov–Arnold Networks (KAN), a newer architecture designed from the ground up to be more interpretable. To avoid fooling themselves with overly optimistic scores, the authors carefully split the data into spatial blocks so that models were tested on regions they had not “seen” during training.
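Two of the steps described above, summarizing a growing season with simple statistics and holding out whole spatial blocks for testing, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the function names, the 2-degree grid used to form blocks, and the 20% test share are all hypothetical choices made for the example.

```python
import math
import random
import statistics
from collections import defaultdict


def summarize_season(series):
    """Collapse one variable's in-season time series into summary features."""
    return {
        "mean": statistics.fmean(series),
        "min": min(series),
        "max": max(series),
        "std": statistics.pstdev(series),  # variability over the season
    }


def block_id(lat, lon, block_deg=2.0):
    """Map a coordinate to the grid cell (block) that contains it.

    The square lat/lon grid is an assumption made for this sketch.
    """
    return (math.floor(lat / block_deg), math.floor(lon / block_deg))


def spatial_block_split(regions, test_fraction=0.2, block_deg=2.0, seed=0):
    """Split regions so that each block falls wholly in train or in test.

    regions: list of dicts with 'lat' and 'lon' keys (hypothetical schema).
    """
    blocks = defaultdict(list)
    for region in regions:
        key = block_id(region["lat"], region["lon"], block_deg)
        blocks[key].append(region)

    # Shuffle block keys, not individual regions, so neighbours stay together.
    keys = sorted(blocks)
    random.Random(seed).shuffle(keys)
    n_test = max(1, round(len(keys) * test_fraction))
    test_keys = set(keys[:n_test])

    train = [r for k in keys if k not in test_keys for r in blocks[k]]
    test = [r for k in keys if k in test_keys for r in blocks[k]]
    return train, test
```

Because neighbouring regions tend to share weather and soils, a purely random split would let a model be scored on areas that closely resemble its training data. Assigning whole blocks to one side of the split removes that shortcut, which is why block-wise scores are usually lower but more honest.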

Opening the black box of AI

What sets this work apart is not only the accuracy of the forecasts, but how the models explain themselves. RF and MLP were probed with standard tools that show how much each input feature matters to their predictions. KAN goes a step further: it represents the links between inputs and outputs as smooth, one-dimensional curves that can be plotted and inspected. This lets researchers literally see how, for example, a change in SIF or soil moisture nudges yield up or down. Across countries and methods, one pattern was clear—SIF, the satellite signal tied directly to photosynthesis, consistently ranked among the most important predictors of soybean yield. Other key drivers varied by region: in the United States, water-related vegetation signals stood out, while in Brazil and Argentina, temperature and soil moisture played stronger roles.
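To make the idea of inspectable one-dimensional curves concrete, here is a minimal Python sketch of a single KAN-style node: each input passes through its own curve, and the outputs are summed. It is a simplification under several assumptions. The curves are piecewise linear rather than the learned splines typically used in KANs, there is only one node instead of stacked layers, and every knot, value, and feature name below is invented for illustration rather than taken from the study.

```python
from bisect import bisect_right


class PiecewiseLinearCurve:
    """A one-dimensional function defined by its values at sorted knots."""

    def __init__(self, knots, values):
        if len(knots) != len(values) or len(knots) < 2:
            raise ValueError("need at least two knots, one value per knot")
        self.knots = list(knots)
        self.values = list(values)

    def __call__(self, x):
        # Hold the end values outside the knot range.
        if x <= self.knots[0]:
            return self.values[0]
        if x >= self.knots[-1]:
            return self.values[-1]
        # Find the segment containing x and interpolate linearly within it.
        i = bisect_right(self.knots, x) - 1
        x0, x1 = self.knots[i], self.knots[i + 1]
        y0, y1 = self.values[i], self.values[i + 1]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def kan_style_predict(curves, features, bias=0.0):
    """Sum each feature's curve output; one node of a KAN-like model."""
    return bias + sum(curve(features[name]) for name, curve in curves.items())


# Hypothetical curves: made-up numbers, not results from the paper.
curves = {
    "sif": PiecewiseLinearCurve([0.0, 0.5, 1.0], [-0.4, 0.0, 0.3]),
    "soil_moisture": PiecewiseLinearCurve([0.1, 0.3, 0.5], [-0.3, 0.2, 0.1]),
}

# "Inspecting" a curve is just evaluating it across the input range.
sif_profile = [(x / 4, curves["sif"](x / 4)) for x in range(5)]
prediction = kan_style_predict(
    curves, {"sif": 0.75, "soil_moisture": 0.3}, bias=3.0
)
```

Each curve can be plotted on its own, so the effect of a single input on the prediction is read directly from its shape instead of being estimated after the fact.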

How well did the models perform?

When the researchers compared model accuracy, no single method won in every situation. In the United States, where yields were relatively stable year to year, Random Forest performed slightly better overall, but KAN and MLP were close behind. In Brazil, with more volatile yields and a larger dataset, all three models achieved high accuracy, though they struggled somewhat with predicting very high yields. In Argentina, where data were more limited, KAN generally outperformed the deep learning baseline (MLP) and came close to Random Forest. These results suggest that KAN can perform on par with, or close to, traditional models on small, difficult agricultural datasets while offering far greater transparency about how it reaches its conclusions.

What this means for farmers and food security

For real-world decision-makers, being able to trust a model can be as important as raw accuracy. This study shows that explainable AI approaches like KAN can deliver competitive soybean yield forecasts while clearly revealing which environmental and crop signals matter most. That visibility helps scientists diagnose errors, incorporate expert agronomic knowledge, and adapt models to new regions or changing climates. In the long run, such transparent tools could be woven into national crop monitoring systems, giving farmers, planners, and markets earlier and more reliable warnings of poor harvests or bumper crops—and supporting more resilient and sustainable food systems.

Citation: Wang, X., He, Y., Chen, H. et al. From data to decisions: the use of explainable AI to forecast soybean yield in major producing countries. Sci Rep 16, 5103 (2026). https://doi.org/10.1038/s41598-026-35716-x

Keywords: soybean yield prediction, explainable AI, remote sensing, agricultural modeling, food security