Clear Sky Science · en
Forecasting tomato production in major Asian producers: a comparative study of ARIMA, exponential smoothing, score-driven models, and XGBoost
Why future tomato harvests matter
Tomatoes are a staple in kitchens across Asia, filling everything from street food to sauces on supermarket shelves. Behind each tomato, however, lies a chain of farmers, markets, and policy decisions that can be shaken by bad weather or price swings. This study asks a simple but vital question: how can we better forecast future tomato harvests in key Asian countries so that farmers, governments, and consumers are not caught off guard?
Tomatoes, farmers, and food on the table
Tomatoes are among the world’s most produced and traded vegetable crops, second only to potatoes. They are rich in vitamins and antioxidants and support a large processing industry that turns them into juice, soup, paste, and sauces. In Asia, China and the South Asian countries together account for a major share of the global tomato supply, and millions of small farmers depend on the crop for their income. When harvests swing up and down, farm earnings and consumer prices swing with them, which can threaten both rural livelihoods and urban food security.
Rising harvests and uneven growth
Using six decades of data, the researchers tracked annual tomato production from 1961 to 2021 in Bangladesh, China, India, Pakistan, and Sri Lanka. China showed a dramatic climb, especially after 2000, growing from a modest producer into a giant that now produces tens of millions of tonnes each year. India also expanded strongly, though at a steadier pace. In contrast, Bangladesh, Pakistan, and Sri Lanka showed gentler growth with occasional bumps and dips. These differences matter: countries with fast structural growth need models that can follow long-term upward trends, while others need tools that can capture stability or only slow change.

Putting forecasting tools to the test
To see which methods best predict future production, the team compared four types of forecasting tools. Two, ARIMA and exponential smoothing, come from classical statistics and look for smooth patterns and trends over time. A third, called a score-driven approach, lets key parts of the model shift gradually as new data arrive, so it can adapt to changing conditions. The fourth, XGBoost, is a popular machine learning method that combines many decision trees to capture complex and irregular behavior in the data. The researchers trained all four methods on data from 1961 to 2014, then tested how well they predicted the years 2015 to 2021.
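The train-then-test procedure can be illustrated with a short sketch. This is a minimal Python example under stated assumptions, not the authors' code: the series is synthetic (made-up numbers, not the study's data), a naive drift forecast stands in for the four models, and RMSE and MAPE are assumed as error measures because this summary does not name the metrics the study used.

```python
import math

def split_by_year(series, last_train_year):
    """Split a {year: value} mapping into training and test parts."""
    train = {y: v for y, v in series.items() if y <= last_train_year}
    test = {y: v for y, v in series.items() if y > last_train_year}
    return train, test

def drift_forecast(train, horizon):
    """Stand-in model: extend the average yearly change seen in training."""
    years = sorted(train)
    first, last = train[years[0]], train[years[-1]]
    slope = (last - first) / (years[-1] - years[0])
    return {years[-1] + h: last + slope * h for h in range(1, horizon + 1)}

def rmse(actual, predicted):
    """Root mean squared error over the years present in `actual`."""
    errs = [(actual[y] - predicted[y]) ** 2 for y in actual]
    return math.sqrt(sum(errs) / len(errs))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    errs = [abs(actual[y] - predicted[y]) / abs(actual[y]) for y in actual]
    return 100.0 * sum(errs) / len(errs)

# Synthetic, hypothetical production series for 1961-2021 (not real data).
series = {year: 100.0 + 5.0 * (year - 1961) for year in range(1961, 2022)}

# Train on 1961-2014, hold out 2015-2021, as described in the text.
train, test = split_by_year(series, 2014)
forecast = drift_forecast(train, horizon=len(test))

print("training years:", len(train), "test years:", len(test))
print("RMSE:", rmse(test, forecast), "MAPE (%):", mape(test, forecast))
```

Any candidate model can be swapped in for `drift_forecast` as long as it sees only the training years; comparing the resulting errors on the held-out years is what allows the methods to be ranked fairly.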
Short-term skill versus long-term realism
In these tests, XGBoost often produced the lowest errors, especially in years when production was volatile. That means it was very good at matching recent ups and downs. However, the authors highlight an important drawback: tree-based machine learning models struggle to project values beyond what they have already seen. For crops like tomatoes in India and Bangladesh, where production has been climbing for decades, this weakness shows up as unrealistically flat future lines. Classical methods, by contrast, are built to extend underlying trends forward in time, even if they are less flashy in the short term.
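The extrapolation limit of tree-based models can be seen in a small sketch. The Python example below is an illustration under assumptions, not the study's setup: it fits a single, hand-written regression tree (a simplified stand-in for XGBoost, which sums many such trees) to a synthetic upward-trending series, using the year as the only input. The data and the tree depth are hypothetical.

```python
def fit_tree(xs, ys, depth):
    """Fit a tiny 1-D regression tree; xs must be sorted ascending.
    Each leaf stores the mean of the training targets that fall in it."""
    mean = sum(ys) / len(ys)
    if depth == 0 or len(xs) < 2:
        return {"leaf": mean}
    best = None
    for i in range(1, len(xs)):
        left, right = ys[:i], ys[i:]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, i)
    i = best[1]
    return {
        "threshold": (xs[i - 1] + xs[i]) / 2,
        "left": fit_tree(xs[:i], ys[:i], depth - 1),
        "right": fit_tree(xs[i:], ys[i:], depth - 1),
    }

def predict_tree(tree, x):
    """Walk down the tree until a leaf is reached."""
    while "leaf" not in tree:
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree["leaf"]

def fit_line(xs, ys):
    """Ordinary least-squares straight line, returned as (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx

# Synthetic, hypothetical series with a steady upward trend (not real data).
years = list(range(1961, 2015))
production = [10.0 + 2.0 * (y - 1961) for y in years]

tree = fit_tree(years, production, depth=4)
slope, intercept = fit_line(years, production)

for future_year in (2015, 2021, 2028):
    tree_pred = predict_tree(tree, future_year)
    line_pred = slope * future_year + intercept
    print(future_year, "tree:", round(tree_pred, 2), "line:", round(line_pred, 2))
```

Every year beyond the training range falls into the same rightmost leaf, so the tree returns one constant value that cannot exceed the largest production it saw, while the straight line keeps rising. Summing many trees, as boosting does, yields a sum of constants outside the training range, so the same ceiling applies.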

Choosing the right tool for each country
Taking both accuracy and realism into account, the authors chose one “practical” model for each nation. For Bangladesh, they selected an exponential smoothing method, which gently extends the long-run upward trend while still reacting to recent shocks. For China, India, Pakistan, and Sri Lanka, they chose score-driven models, which can track shifting growth rates without hitting the ceiling that constrains tree-based systems. Using these models, they forecast tomato production out to 2028. The results suggest that Bangladesh, China, India, and Pakistan are likely to see continued growth in tomato harvests, while Sri Lanka’s output is expected to level off with only slight increases.
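How exponential smoothing can extend a trend while still reacting to a recent shock is sketched below. The example assumes Holt's linear-trend variant, since this summary does not say which exponential smoothing specification the authors used. The series is synthetic, and the smoothing weights `alpha` and `beta` are hypothetical values chosen for illustration rather than estimated from data.

```python
def holt_forecast(values, alpha, beta, horizon):
    """Holt's linear-trend exponential smoothing.

    alpha weights the newest observation when updating the level;
    beta weights the newest level change when updating the trend.
    Returns forecasts for 1..horizon steps past the last observation.
    """
    level = values[0]
    trend = values[1] - values[0]
    for y in values[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]

# Synthetic, hypothetical series for 1961-2021: steady growth of 3 units
# per year, with a one-off shortfall of 5 units in the final year.
values = [50.0 + 3.0 * t for t in range(61)]
values[-1] -= 5.0

# Forecast the seven years 2022-2028.
forecast = holt_forecast(values, alpha=0.5, beta=0.2, horizon=7)
for year, value in zip(range(2022, 2029), forecast):
    print(year, round(value, 2))
```

The final-year shortfall pulls both the level and the trend down somewhat, yet the forecasts keep climbing, which is the behavior the text describes: the long-run trend is extended, tempered by the most recent observations.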
What this means for food planning
For non-specialists, the main message is that better tomato forecasts can support smarter choices across the food chain. By showing that some advanced machine learning tools excel at short-term volatility but falter when asked to extend long-term trends, the study warns against relying on a single type of model. Its country-by-country forecasts provide a clearer picture of how tomato supply may evolve through 2028, helping planners prepare for rising demand, farmers time their investments, and markets cushion price spikes.
Citation: Al khatib, A.M.G., Alshaib, B.M., Mishra, P. et al. Forecasting tomato production in major Asian producers: a comparative study of ARIMA, exponential smoothing, score-driven models, and XGBoost. Sci Rep 16, 15722 (2026). https://doi.org/10.1038/s41598-026-46110-y
Keywords: tomato production, crop forecasting, time series models, machine learning agriculture, Asian food security