Clear Sky Science
Optimization of the extraction process of Sanhuang Qingre Formula by integrating response surface methodology, grey correlation analysis, and machine learning
Better Medicine from Ancient Herbs
Many people rely on traditional herbal remedies, but one persistent question remains: how can we make these age-old formulas as stable, effective, and consistent as modern drugs? This study tackles that question for Sanhuang Qingre Formula, a traditional Chinese prescription used to treat chronic and allergic sinusitis, by using advanced data tools and machine learning to fine-tune how its medicinal ingredients are extracted.

An Herbal Remedy with Modern Problems
Sanhuang Qingre Formula combines several herbs, including coptis, skullcap, astragalus, poria, and others, to reduce inflammation, fight microbes, and support tissue repair in people with long-lasting sinus problems. For years it has been used as a hospital-made nasal drop, but this liquid form does not stay long in the nose and is not very stable, limiting its wider use. To improve the medicine and possibly develop new dosage forms, the researchers first focused on a crucial but often overlooked step: the extraction process that pulls active substances out of the raw herbs. A more efficient, well-controlled extraction means each batch of medicine can deliver a reliable dose of its helpful components.
Measuring Many Ingredients at Once
Unlike simple drugs that contain a single active molecule, this formula works through a whole group of compounds acting together. The team selected 11 key substances known to have antibacterial, antiviral, antioxidant, or anti-inflammatory effects, along with the overall extraction yield. Instead of judging success by just one compound, they created a single “comprehensive score” that blends all 12 indicators. To do this fairly, they combined expert knowledge (which ingredients matter most clinically) with objective statistics (which measurements vary the most and carry more information). This hybrid weighting approach allowed them to evaluate each extraction test in a balanced and scientifically transparent way.
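One way to picture the hybrid weighting is the sketch below. The summary does not name the exact techniques, so this assumes the objective weights come from the entropy weight method and that expert and objective weights are merged by a normalized product. The function names and the sample numbers are hypothetical and are not the study's data.

```python
import math

def entropy_weights(rows):
    """Objective weights (assumed entropy method): indicators whose values
    differ more across extraction runs receive a larger weight."""
    n, m = len(rows), len(rows[0])
    raw = []
    for j in range(m):
        col = [r[j] for r in rows]
        total = sum(col)
        shares = [v / total for v in col]
        entropy = -sum(p * math.log(p) for p in shares if p > 0) / math.log(n)
        raw.append(max(0.0, 1.0 - entropy))  # guard against tiny negative rounding
    s = sum(raw)
    return [d / s for d in raw]

def combine_weights(expert, objective):
    """Merge expert and objective weights by a normalized product (an assumption)."""
    prod = [e * o for e, o in zip(expert, objective)]
    s = sum(prod)
    return [p / s for p in prod]

def comprehensive_scores(rows, weights):
    """Scale each indicator by its column maximum, then take the weighted sum."""
    maxima = [max(r[j] for r in rows) for j in range(len(weights))]
    return [sum(w * v / mx for w, v, mx in zip(weights, r, maxima)) for r in rows]

# Hypothetical data: 3 extraction runs x 3 indicators (not measured values).
runs = [[1.0, 5.0, 2.0], [2.0, 5.0, 2.5], [4.0, 5.0, 3.0]]
expert = [0.5, 0.3, 0.2]  # hypothetical expert-assigned importance
weights = combine_weights(expert, entropy_weights(runs))
print(comprehensive_scores(runs, weights))
```

In this sketch an indicator that never changes between runs receives an objective weight of zero, which reflects the idea that measurements carrying no information should not influence the ranking.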
Testing Conditions with Smart Experimental Design
The researchers then explored how three main factors (ethanol strength, heating time under reflux, and the ratio of liquid to herb) affected the comprehensive score. Rather than varying one factor at a time, they used a structured experiment called a Box–Behnken design, which systematically varies all three and captures interactions between them. Statistical modeling (response surface methodology) revealed that ethanol concentration and extraction time had the largest influence, with the liquid–solid ratio playing a subtler role. From this analysis, the best conditions were predicted to be extraction with 55% ethanol for 2 hours per cycle at a liquid–solid ratio of 12 mL per gram of herb.
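To make the layout of a three-factor Box–Behnken design concrete, the following sketch generates the coded runs: each pair of factors is set to its low and high levels while the third stays at its midpoint, and a few center runs are added. The five center runs and the low/mid/high levels shown are placeholder assumptions, not the levels used in the study.

```python
from itertools import combinations, product

def box_behnken(n_factors, n_center=5):
    """Coded Box-Behnken runs (-1, 0, +1) for a small number of factors.
    n_center=5 is an assumed, commonly used number of center points."""
    runs = []
    for i, j in combinations(range(n_factors), 2):
        for a, b in product((-1, 1), repeat=2):
            run = [0] * n_factors
            run[i], run[j] = a, b  # two factors at extremes, the rest at center
            runs.append(tuple(run))
    runs.extend([(0,) * n_factors] * n_center)
    return runs

def decode(run, levels):
    """Map a coded run to real settings; levels[k] is (low, mid, high)."""
    return tuple(levels[k][c + 1] for k, c in enumerate(run))

# Hypothetical levels: ethanol (%), reflux time (h), liquid-solid ratio (mL/g).
levels = [(40, 55, 70), (1, 2, 3), (8, 10, 12)]
for run in box_behnken(3):
    print(run, decode(run, levels))
```

With three factors this yields 12 edge-midpoint runs plus the center replicates, so no run ever pushes all three factors to an extreme at once.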
Letting Algorithms Hunt for the Sweet Spot
To go beyond traditional statistics, the team also applied two machine learning models—a neural network refined by a genetic algorithm and a support vector machine—alongside a method called grey correlation analysis, which compares how closely each test run approaches an ideal pattern. Grey correlation suggested one good parameter combination, but it could only choose among the conditions already tested. The support vector machine, in contrast, learned the underlying relationships well enough to predict new combinations with high accuracy, outperforming the neural network. Strikingly, its recommended optimal conditions matched the response surface model almost exactly: 55% ethanol, 2 hours of reflux, and a 12 mL/g liquid–solid ratio.
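The limitation of grey correlation analysis noted above can be seen in a minimal sketch of the standard grey relational grade calculation. It assumes every indicator is "larger is better", uses the conventional distinguishing coefficient of 0.5, and defaults to equal weights; the study's actual settings are not given in this summary, and the function name and numbers are hypothetical.

```python
def grey_relational_grades(rows, rho=0.5, weights=None):
    """Grade each tested run by how closely it tracks the ideal run
    (the best observed value of every indicator)."""
    m = len(rows[0])
    weights = weights or [1.0 / m] * m
    lo = [min(r[j] for r in rows) for j in range(m)]
    hi = [max(r[j] for r in rows) for j in range(m)]
    # Larger-is-better normalization to [0, 1]; the ideal sequence is all ones.
    norm = [[(r[j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 1.0
             for j in range(m)] for r in rows]
    deltas = [[1.0 - v for v in row] for row in norm]
    flat = [d for row in deltas for d in row]
    dmin, dmax = min(flat), max(flat)
    if dmax == 0:
        return [1.0] * len(rows)  # all runs identical to the ideal
    grades = []
    for row in deltas:
        coeffs = [(dmin + rho * dmax) / (d + rho * dmax) for d in row]
        grades.append(sum(w * c for w, c in zip(weights, coeffs)))
    return grades

# Hypothetical data: 3 tested runs x 2 indicators.
tested = [[1.0, 2.0], [3.0, 4.0], [2.0, 3.0]]
grades = grey_relational_grades(tested)
best = max(range(len(tested)), key=grades.__getitem__)
print(best, grades)
```

Because the function only ranks the rows it is given, the winner is always one of the tested runs, whereas a fitted model can propose settings between or beyond them.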

More Medicine from the Same Herbs
When the scientists actually ran the extraction under these optimized conditions and measured the chemistry, the results were clear. The amounts of all 11 target ingredients increased compared with the original water-based process, and their combined total more than doubled. Statistical tools that compare overall chemical profiles (cluster analysis and principal component analysis) showed that the optimized batches formed a distinct, tightly grouped cluster, separate from the original process and from the grey-correlation-based scheme. In simple terms, the new method pulls out more of what matters, and does so consistently from batch to batch.
What This Means for Future Herbal Treatments
For non-specialists, the takeaway is straightforward: by pairing smart experimental design with modern machine learning, the researchers turned a traditional sinus remedy into a more potent and reliable extract without changing the herbs themselves. Their optimized process uses 55% ethanol, two extraction cycles of two hours each, and a liquid-to-solid ratio of 12 mL/g to capture much higher levels of proven active components. Beyond this one formula, the study offers a blueprint for upgrading other complex herbal medicines so that they can be manufactured with the same attention to quality and reproducibility expected of conventional pharmaceuticals.
Citation: Chen, Q., Meng, P., Hu, X. et al. Optimization of the extraction process of Sanhuang Qingre Formula by integrating response surface methodology, grey correlation analysis, and machine learning. Sci Rep 16, 6767 (2026). https://doi.org/10.1038/s41598-026-37751-0
Keywords: traditional Chinese medicine, herbal extraction, machine learning, sinusitis treatment, process optimization