Clear Sky Science
A hybrid actor–critic and BERT framework for intelligent course recommendation in IoT-aware e-learning systems
Finding the Right Online Course
As online education platforms explode with thousands of classes, many learners face a simple but frustrating problem: which course should I take next? This paper tackles that overload by designing an intelligent recommendation system that watches how people actually study across phones, tablets, and computers, and then suggests courses that better fit their goals, abilities, and habits over time.

Why Online Learning Needs Smarter Guides
Massive Open Online Courses, or MOOCs, let anyone, anywhere, access high-quality lessons. But the success of this model has created a new challenge: with so many options, it is easy to feel lost. Traditional recommendation methods, which rely mostly on star ratings or simple similarity between users, struggle in this fast-changing environment. They assume that your tastes stay fixed and often ignore rich signals such as how long you stay in a session, which device you use, or when you tend to drop out. In today’s connected learning platforms, these patterns are constantly recorded and can reveal much more about what will keep a learner engaged.
Bringing Together What Courses Say and What Learners Do
The authors propose a hybrid system that combines two kinds of information: the meaning of course content and detailed traces of learner behavior. First, they use a powerful language model called BERT to read course titles, descriptions, and tags, turning them into dense numerical fingerprints that capture subtle differences in topic and style. At the same time, the system gathers interaction signals from web and mobile logs—how often a learner clicks, how long they watch videos, how quickly they move through materials, and how challenging they find different classes. These traces stand in for an Internet-of-Things learning setting, where many connected devices contribute to a picture of each person’s study habits.
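To make the idea of "compact summaries of learner behavior" concrete, the sketch below collapses a handful of raw interaction events into a few numbers. It is only an illustration: the event fields (device, watch_seconds, clicks), the chosen summary features, and the function name summarize_behavior are all assumptions, not details taken from the paper.

```python
def summarize_behavior(events):
    """Collapse raw interaction events into a small, fixed set of features.

    Each event is a dict with hypothetical keys:
      "device"        - e.g. "phone", "tablet", "laptop"
      "watch_seconds" - video time watched in that session
      "clicks"        - number of clicks in that session
    """
    if not events:
        return {"sessions": 0, "avg_watch_minutes": 0.0,
                "clicks_per_session": 0.0, "mobile_share": 0.0}

    n = len(events)
    total_watch = sum(e["watch_seconds"] for e in events)
    total_clicks = sum(e["clicks"] for e in events)
    # Treat phones and tablets as "mobile" (an illustrative choice).
    mobile = sum(1 for e in events if e["device"] in ("phone", "tablet"))

    return {
        "sessions": n,
        "avg_watch_minutes": total_watch / n / 60.0,
        "clicks_per_session": total_clicks / n,
        "mobile_share": mobile / n,
    }


example_events = [
    {"device": "phone", "watch_seconds": 600, "clicks": 4},
    {"device": "laptop", "watch_seconds": 1200, "clicks": 8},
]
summary = summarize_behavior(example_events)
```

In a full system, a vector like this would sit alongside the BERT-based course fingerprint as part of the input to the recommender; how the authors actually encode and combine these signals is not specified here.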
How the Learning Assistant Teaches Itself
At the heart of the framework is a reinforcement learning setup, where the recommender behaves like an agent that learns by trial and error. An “actor–critic” pair of networks chooses which courses to suggest and judges how good those choices were, gradually improving its strategy. The state fed to this agent blends the BERT-based course fingerprints, compact summaries of learner behavior, and extra features produced by a Mahalanobis distance module, which measures similarity while accounting for correlations among many features. Instead of chasing quick clicks, the reward signal encourages deeper outcomes: finishing more of a course, doing better on quizzes, and spending meaningful time engaged with the material. A training method called Proximal Policy Optimization keeps learning stable even as the system explores new recommendations.
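Three of the ingredients described above can be sketched in a few lines of plain Python. The Mahalanobis distance and the clipped objective used by Proximal Policy Optimization follow their standard textbook definitions. The reward function, by contrast, is purely hypothetical: the weights, the 60-minute cap, and every function name are assumptions made for illustration, not the authors' formulation.

```python
import math


def engagement_reward(completion, quiz_score, minutes_engaged,
                      weights=(0.5, 0.3, 0.2), minutes_cap=60.0):
    """Hypothetical reward in [0, 1]; weights and cap are illustrative only.

    completion and quiz_score are assumed to be fractions in [0, 1].
    """
    # Cap session time so that very long sessions cannot dominate the reward.
    time_term = min(max(minutes_engaged, 0.0), minutes_cap) / minutes_cap
    w_completion, w_quiz, w_time = weights
    return w_completion * completion + w_quiz * quiz_score + w_time * time_term


def ppo_clipped_objective(new_logp, old_logp, advantage, epsilon=0.2):
    """Standard PPO clipped surrogate for one (state, action) sample."""
    ratio = math.exp(new_logp - old_logp)  # new policy prob / old policy prob
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    # Taking the minimum removes any incentive to move the policy too far.
    return min(ratio * advantage, clipped_ratio * advantage)


def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def mahalanobis(u, v, cov):
    """Mahalanobis distance sqrt((u - v)^T cov^{-1} (u - v))."""
    diff = [a - b for a, b in zip(u, v)]
    z = solve_linear(cov, diff)  # z = cov^{-1} @ diff, without forming the inverse
    return math.sqrt(sum(d * zi for d, zi in zip(diff, z)))
```

With two features that are positively correlated, for instance a covariance of [[1.0, 0.5], [0.5, 1.0]], a difference of [1, -1] that runs against the correlation gives a Mahalanobis distance of 2.0, while a difference of [1, 1] that runs with it gives about 1.15. Plain Euclidean distance would call both differences equally large, which is the sense in which the Mahalanobis measure "accounts for correlations."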

Testing on Real-World Course Platforms
To see whether this design works in practice, the authors trained and evaluated their model on three large course collections: MOOCCube, edX, and NTHU MOOCs. These datasets differ in size, subject mix, and how sparse or dense user interactions are, making them a good stress test. They compared their system with several strong competitors, including methods based on graph neural networks, clustering, and deep hybrid architectures. Across all datasets and standard measures of ranking quality, the new model consistently performed better, typically improving key scores by two to four percentage points. Careful ablation studies showed that each element—semantic text encoding, the actor–critic structure, the PPO training rule, and the correlation-aware distance measure—contributed to the final gains.
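This summary does not name the ranking measures used. As an assumption, the sketch below shows two that are common in recommender evaluation: hit rate at k and normalized discounted cumulative gain (NDCG) at k, both with binary relevance. The course identifiers are made up for the example.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k positions."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_items, relevant_items, k):
    """NDCG@k with binary relevance: 1 if the item is relevant, else 0."""
    gains = [1.0 if item in relevant_items else 0.0 for item in ranked_items]
    # The ideal ranking places every relevant item at the top.
    ideal = [1.0] * min(len(relevant_items), k)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def hit_at_k(ranked_items, relevant_items, k):
    """1.0 if any relevant item appears in the top k, else 0.0."""
    return 1.0 if any(i in relevant_items for i in ranked_items[:k]) else 0.0


# Hypothetical recommendation list and the course the learner went on to take.
recommended = ["intro_python", "linear_algebra", "data_viz"]
taken = {"linear_algebra"}
score = ndcg_at_k(recommended, taken, k=3)
hit = hit_at_k(recommended, taken, k=3)
```

Both measures run from 0 to 1, so a reported gain of "two to four percentage points" would correspond to, say, a score rising from 0.60 to somewhere between 0.62 and 0.64 on a metric of this kind.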
What This Means for Future Online Study
In plain terms, the study shows that a recommendation engine that truly listens to both what courses promise and how learners behave can guide people through crowded online catalogs more effectively. By tracking not just clicks but also completion, quiz success, and sustained attention, the system learns to suggest courses that are more likely to fit each learner’s level and keep them on track. Because it is designed with privacy safeguards and can be extended with techniques such as federated learning and explainable interfaces, the framework offers a practical path toward more supportive, adaptive online classrooms that feel less like wandering a maze and more like having a knowledgeable tutor point out the next best step.
Citation: Chunqin, X., Peixi, W. A hybrid actor–critic and BERT framework for intelligent course recommendation in IoT-aware e-learning systems. Sci Rep 16, 10259 (2026). https://doi.org/10.1038/s41598-026-40952-2
Keywords: online course recommendation, personalized e-learning, reinforcement learning, educational data, learning analytics