Clear Sky Science

Hybrid models of sparse and robust regression to solve heterogeneity problem in black pepper big data

Why your pepper and food quality depend on smart drying

Anyone who cooks knows that a good spice can make or break a meal. But few people realize how much careful drying is needed to keep those flavors, aromas and healthful compounds intact—especially for black pepper, the "King of Spices." This article looks at how researchers used advanced data techniques to fine‑tune solar drying of black pepper so that farmers can save energy, cut waste and consistently deliver high‑quality spice in a world that increasingly relies on smart, sensor‑driven agriculture.

From sun‑dried spice to smart solar dryers

Traditionally, peppercorns are spread in the sun until they darken and dry, a slow process that exposes them to dust, insects and uneven heating. Modern mechanical dryers speed things up but often burn fossil fuels and demand significant labor. Solar dryers offer a cleaner middle ground: they harness the sun while enclosing the crop in a more controlled chamber. In this study, a modified hybrid solar dryer in Malaysia, equipped with sensors and Internet‑of‑Things style monitoring, was used to dry black pepper. The goal was to understand which conditions—such as temperature, humidity and solar radiation—most strongly control how quickly and evenly moisture leaves the peppercorns, because that moisture level determines shelf life, safety and flavor.

When big farm data gets messy

With nearly two thousand drying observations and hundreds of measured and combined (interaction) variables, the research team faced a common challenge in smart farming: messy, "heterogeneous" data. Different sensors, units and conditions produced measurements that varied widely, sometimes in conflicting ways. On top of that, many variables overlapped in meaning (for example, several temperatures that rise and fall together), a problem known as multicollinearity. Occasional bad readings or unusual weather created outliers—points that sit far from the rest of the data and can easily mislead standard analysis. If not handled carefully, all of this complexity can lead to biased models that predict the wrong drying times and misguide farmers.
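The overlap between predictors described above can be checked with a correlation coefficient and a variance inflation factor (VIF). The following is a minimal sketch, assuming two hypothetical temperature sensors with made-up readings; the names `chamber_temp` and `outlet_temp` and all values are illustrative and do not come from the study.

```python
import math

def pearson_r(a, b):
    """Pearson correlation between two equally long lists of readings."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(ss_a * ss_b)

# Hypothetical readings (degrees Celsius) from two sensors that rise and fall together.
chamber_temp = [30.0, 32.0, 35.0, 38.0, 40.0, 41.0]
outlet_temp = [29.5, 31.8, 34.6, 38.3, 39.7, 41.2]

r = pearson_r(chamber_temp, outlet_temp)

# With exactly two predictors, the VIF of either one is 1 / (1 - r^2).
# Values above roughly 10 are a common rule-of-thumb warning sign.
vif = 1.0 / (1.0 - r ** 2)

print(f"correlation = {r:.4f}, VIF = {vif:.1f}")
```

A very high VIF means the two sensors carry almost the same information, which is the situation that penalized (sparse) regression is designed to cope with.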

Figure 1.

Blending two kinds of models to tame the noise

To make sense of this tangled information, the authors combined two families of statistical tools. First, they used "sparse" methods (Ridge, LASSO and Elastic Net regression) that are designed for situations with many overlapping predictors. These methods shrink the influence of less important variables, and LASSO and Elastic Net can drop them entirely, effectively asking the data which factors really matter for moisture removal. They did this for sets of the top 25, 35, 45, 55 and 100 most influential variables. Second, they paired these sparse models with "robust" regression techniques that down‑weight outliers so that a handful of odd readings cannot dominate the results. This hybrid approach allowed them to both select key parameters and shield their predictions from bad data points.
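To make the pairing of a sparse and a robust method concrete, here is a minimal sketch of one possible combination: a LASSO fit by coordinate descent, re-fitted several times with Tukey bisquare weights computed from the residuals. This is an illustration under stated assumptions, not the authors' procedure. The data are synthetic, the function names are hypothetical, and the penalty `lam=0.5` is an arbitrary choice; the tuning constant 4.685 is the conventional default for bisquare weights.

```python
import statistics

def soft_threshold(z, lam):
    """LASSO shrink-or-drop step: pull z toward zero by lam, clamping at zero."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def bisquare_weight(u, c=4.685):
    """Tukey bisquare weight for a scaled residual u: near 1 at zero, 0 beyond c."""
    if abs(u) >= c:
        return 0.0
    return (1.0 - (u / c) ** 2) ** 2

def predict(row, beta):
    return sum(x * b for x, b in zip(row, beta))

def weighted_lasso(X, y, w, lam, n_iter=50):
    """Coordinate descent for a weighted LASSO without intercept (centred data)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0
            denom = 0.0
            for i in range(n):
                # Residual with feature j's own contribution added back in.
                partial = y[i] - predict(X[i], beta) + X[i][j] * beta[j]
                rho += w[i] * X[i][j] * partial
                denom += w[i] * X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (denom / n) if denom > 0 else 0.0
    return beta

def hybrid_fit(X, y, lam=0.5, n_rounds=10):
    """Alternate between a LASSO fit and bisquare re-weighting of the observations."""
    n = len(y)
    w = [1.0] * n
    beta = weighted_lasso(X, y, w, lam)
    for _ in range(n_rounds):
        resid = [y[i] - predict(X[i], beta) for i in range(n)]
        med = statistics.median(resid)
        mad = statistics.median([abs(r - med) for r in resid])
        scale = mad / 0.6745 or 1e-12  # robust scale estimate; guard against zero
        w = [bisquare_weight(r / scale) for r in resid]
        beta = weighted_lasso(X, y, w, lam)
    return beta, w

# Synthetic, centred data: the response depends only on the first predictor,
# and observation 3 carries a gross error of +20 (a simulated bad reading).
x1 = [-3.5, -2.5, -1.5, -0.5, 0.5, 1.5, 2.5, 3.5]
x2 = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]
X = [[a, b] for a, b in zip(x1, x2)]
y = [2.0 * a for a in x1]
y[3] += 20.0

beta, weights = hybrid_fit(X, y)
print("coefficients:", beta)
print("observation weights:", [round(v, 2) for v in weights])
```

In this toy setup a plain LASSO fit is pulled toward the second predictor by the single bad reading; once the re-weighting sets that observation's weight to zero, the irrelevant predictor is dropped and the coefficient of the relevant one ends up close to its true value of 2, slightly shrunk by the penalty.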

What the models revealed about pepper drying

Using measures of model quality, such as how much variation in moisture loss could be explained and how large the typical prediction errors were, the researchers compared many combinations of methods. Before cleaning away the most troublesome heterogeneous parameters, the Elastic Net model came out on top among the sparse approaches, capturing over 80% of the variation in moisture removal and keeping forecast errors in a range considered good for practical use. When they looked at the paired hybrid models that also included robust estimators, a Ridge‑based model combined with a particular robust method (called M Bi‑Square) was best at spotting and neutralizing outliers, even eliminating them completely under a stricter "3‑sigma" rule. Interestingly, when parameters linked to strong heterogeneity were removed, a different model (LASSO paired with an S‑type robust estimator) became the most accurate and stable, achieving similar predictive power with fewer variables.
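The kinds of checks mentioned above can be sketched in a few lines: the share of variation explained (R squared), a typical error size (here root mean squared error, chosen as an assumption since the exact error measure is not named in this summary), and a 3‑sigma rule that flags residuals more than three standard deviations from their mean. All numbers below are invented for illustration.

```python
import math

def r_squared(y_true, y_pred):
    """Share of the variation in y_true that the predictions account for."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean squared prediction error, in the units of y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

def three_sigma_outliers(residuals):
    """Indices of residuals lying more than 3 sample standard deviations from the mean."""
    n = len(residuals)
    mean_r = sum(residuals) / n
    sd = math.sqrt(sum((r - mean_r) ** 2 for r in residuals) / (n - 1))
    return [i for i, r in enumerate(residuals) if abs(r - mean_r) > 3 * sd]

# Hypothetical moisture-loss values and model predictions (20 observations).
y_true = [10.0 + 0.5 * i for i in range(20)]
y_pred = [v + (0.2 if i % 2 == 0 else -0.2) for i, v in enumerate(y_true)]
y_pred[7] = y_true[7] + 4.0  # one badly mispredicted observation

residuals = [a - b for a, b in zip(y_true, y_pred)]
r2 = r_squared(y_true, y_pred)
err = rmse(y_true, y_pred)
flagged = three_sigma_outliers(residuals)

print(f"R^2 = {r2:.3f}, RMSE = {err:.3f}, flagged indices = {flagged}")
```

Note that the 3‑sigma rule needs a reasonably large sample: with fewer than about eleven observations, no single point can ever lie more than three sample standard deviations from the mean.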

Figure 2.

What this means for farmers and food lovers

For non‑specialists, the key message is that better math can lead directly to better food. By carefully filtering and stabilizing big streams of sensor data, the hybrid models in this study help pinpoint the most important knobs to turn in a solar dryer—such as particular temperature and humidity combinations—to achieve faster, more even moisture removal without sacrificing quality. The work also shows that removing too much natural variation can sometimes hurt prediction, so it is crucial to balance simplification with realism. In practical terms, these tools can guide smarter designs and control strategies for IoT‑based solar drying systems, helping pepper growers reduce losses, save energy and deliver more consistent, high‑quality spice to markets and kitchens around the world.

Citation: Kumar, P.R., Ibidoja, O.J., Ali, M.K.M. et al. Hybrid models of sparse and robust regression to solve heterogeneity problem in black pepper big data. Sci Rep 16, 11292 (2026). https://doi.org/10.1038/s41598-026-39290-0

Keywords: smart farming, solar drying, black pepper, robust regression, agricultural big data