Clear Sky Science

Harvesting insights: interpretable machine learning to understand environmental drivers of U.S. maize and soybean yield

2026-02-13

Why this matters for our dinner plates

Maize (corn) and soybeans are the workhorses of U.S. agriculture, feeding people and livestock at home and abroad. As the climate becomes less predictable, farmers and scientists are racing to understand how heat waves, shifting rainfall, and soil conditions will affect harvests. This study shows how modern machine learning tools, made more transparent and interpretable, can sift through mountains of farm and environmental data to reveal which weather and landscape factors most strongly shape corn and soybean yields across major U.S. growing regions.

Looking closely at real farm fields

Instead of relying on county averages, the researchers drew on detailed “yield monitor” data collected by combines as they harvested 134 maize and soybean fields across nine U.S. states from 2007 to 2021. Each field was divided into a fine grid, with each cell about the size of a small house lot, capturing how yields varied from one patch to the next. They linked every grid cell to public maps of daily weather, soil properties, and terrain features such as slope and elevation. After cleaning errors, removing outliers, and aligning everything at a common 30‑meter resolution, they assembled a large dataset describing how each tiny piece of land performed under its unique combination of conditions.
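To make the gridding step concrete, here is a minimal Python sketch of how raw yield monitor points might be filtered and averaged into 30‑meter cells. The article does not say which outlier rule the authors used; the median-absolute-deviation filter, the function name `grid_yield_points`, and the assumption that coordinates are already in projected meters are all illustrative choices, not the study's method.

```python
import math
from collections import defaultdict
from statistics import mean, median

CELL_SIZE_M = 30.0  # grid resolution mentioned in the article


def grid_yield_points(points, cell_size=CELL_SIZE_M, max_dev=3.0):
    """Average yield monitor readings into square grid cells.

    points: iterable of (x_m, y_m, yield_value), with x and y in projected
    meters (an assumption; real data would need reprojecting first).
    Returns {(col, row): mean yield} for each cell with surviving readings.
    """
    points = list(points)
    values = [p[2] for p in points]

    # Hypothetical outlier rule: drop readings far from the field median,
    # measured in scaled median absolute deviations (MAD).
    med = median(values)
    mad = median(abs(v - med) for v in values)
    scale = 1.4826 * mad  # makes MAD comparable to a standard deviation
    kept = [
        p for p in points
        if scale == 0 or abs(p[2] - med) <= max_dev * scale
    ]

    # Assign each surviving reading to the cell that contains it.
    cells = defaultdict(list)
    for x, y, value in kept:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key].append(value)

    return {key: mean(vals) for key, vals in cells.items()}
```

Each resulting cell could then be joined to the weather, soil, and terrain values for the same location, giving one row of data per patch of land.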

Teaching machines to predict harvests

With this rich dataset, the team tested several machine learning approaches, including modern tree‑based methods and neural networks, to see which could best predict yield from environmental inputs alone. Using automated tools to select the best models and the most informative variables, they achieved high accuracy: for corn, the final model explained about 87% of yield variation; for soybeans, about 90%. These models performed well not only overall, but also when tested separately by year and by state, suggesting that the learned relationships generalize across different seasons and regions rather than simply memorizing the training data. Spatial tests of the remaining errors showed that most broad patterns were captured, with only some fine‑scale variation left unexplained within fields.
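The idea of testing a model separately by year or by state can be sketched in a few lines of Python. The example below holds out one group at a time, fits on the rest, and reports the share of yield variation explained (R²) on the held-out group. A one-variable straight-line fit stands in for the study's tree‑based models and neural networks purely to keep the sketch self-contained; the function names and the `(group, x, y)` row layout are assumptions made for illustration.

```python
from statistics import mean


def r_squared(y_true, y_pred):
    """Share of variation in y_true explained by y_pred."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Ordinary least-squares line; a simple stand-in for a real model."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def leave_one_group_out(rows):
    """rows: list of (group, x, y), where group is e.g. a year or a state.

    Returns {group: R^2 on that group when it was excluded from training}.
    """
    scores = {}
    for held_out in sorted({g for g, _, _ in rows}):
        train = [(x, y) for g, x, y in rows if g != held_out]
        test = [(x, y) for g, x, y in rows if g == held_out]
        slope, intercept = fit_line([x for x, _ in train],
                                    [y for _, y in train])
        preds = [slope * x + intercept for x, _ in test]
        scores[held_out] = r_squared([y for _, y in test], preds)
    return scores
```

If scores stay high for every held-out year or state, the model is less likely to be memorizing quirks of particular seasons or places.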

What really drives corn and soybean yields

To open up the “black box” of machine learning, the authors used two interpretation techniques: SHAP values, which attribute each individual prediction to the inputs that produced it, and permutation importance, which measures how much accuracy drops when one input's values are shuffled. Together these reveal which inputs matter most and how they push predictions up or down. For corn, weather clearly dominated: maximum daily temperatures during the growing season, sunlight, and how much rainfall varied from day to day were among the top predictors. The model indicated a sharp tipping point: when maximum daily temperatures rose above roughly 36–38 °C (about 97–100 °F), predicted corn yields began to fall steeply, echoing experimental evidence of heat stress during sensitive growth stages. In contrast, the soybean model leaned more heavily on terrain and soil features such as slope, elevation, and measures related to how well soil can store water, with early‑summer rainfall playing a supporting role. Taken as a whole, these signals suggest that corn yield is especially vulnerable to heat extremes and weather swings, while soybean yield is more tightly tied to how water moves and is stored in the landscape.
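Permutation importance is simple enough to sketch directly. The Python below shuffles one input column at a time and records how much the prediction error grows; inputs the model relies on produce large increases, while ignored inputs produce none. The toy model, with yield falling once temperature passes 36 °C, is a made-up illustration loosely inspired by the threshold described above, not the study's fitted model, and all names and numbers are hypothetical.

```python
import random
from statistics import mean


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Average increase in MSE when each feature column is shuffled.

    predict: function mapping one row (list of features) to a prediction.
    X: list of rows, y: list of observed values.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:]
                      for row, v in zip(X, column)]
            permuted = mse(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(mean(increases))
    return importances


def toy_yield_model(row):
    """Hypothetical model: yield drops above 36 C and ignores slope."""
    max_temp_c, slope_pct = row
    return 10.0 - 0.5 * max(0.0, max_temp_c - 36.0)


# Columns: [maximum temperature in C, slope in percent]; values are invented.
X = [[30.0, 1.0], [34.0, 2.0], [37.0, 3.0], [39.0, 4.0], [41.0, 5.0]]
y = [toy_yield_model(row) for row in X]
temp_importance, slope_importance = permutation_importance(toy_yield_model, X, y)
```

In this toy setup the temperature column receives a positive importance and the slope column receives zero, because the toy model never reads slope. SHAP values answer a related but finer-grained question, attributing each individual prediction to its inputs, and are usually computed with a dedicated library rather than by hand.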

From patterns to breeding and farm decisions

By pinpointing which environmental stresses hit yields hardest, this work offers practical guidance for both plant breeders and farm managers. For corn, the identified heat threshold underscores the need for varieties that can maintain grain set under brief but intense hot spells, and for management strategies such as irrigation or adjusted planting dates in regions prone to extreme temperatures. For soybeans, the strong influence of terrain and soil points toward breeding for better tolerance to drought and waterlogging, and toward field‑level decisions that work with natural water flow, such as targeted drainage or conservation practices that improve soil structure. Although the models remain correlational and cannot replace controlled experiments, they demonstrate how interpretable machine learning, combined with widely available environmental maps and on‑farm data, can reveal hidden stress points in our food system and help make U.S. crop production more resilient in a warming, less predictable climate.

Citation: Smith, H.W., Heffernan, C.J., Ashworth, A.J. et al. Harvesting insights: interpretable machine learning to understand environmental drivers of U.S. maize and soybean yield. Sci Rep 16, 8994 (2026). https://doi.org/10.1038/s41598-026-38724-z

Keywords: crop yield prediction, maize, soybean, machine learning, climate impacts