Clear Sky Science
A novel hybrid deep learning framework for customer churn prediction using RFM and embedding clustering
Why keeping online shoppers matters
Every online store struggles with the same question: which customers are quietly slipping away, and which are likely to come back? Finding out early lets companies spend less on chasing new shoppers and more on keeping the ones they already have. This study introduces a data-driven way to spot at-risk customers by turning messy click and purchase histories into clear signals that a person is about to stop buying.

Looking at how often and how much people buy
The authors start from a simple idea that many marketers already know: customers differ in how recently they bought, how often they buy, and how much they spend. These three numbers, called recency, frequency, and monetary value (RFM), sketch a basic picture of each shopper. People who buy often, spend more, and have purchased recently are usually loyal. Those who have not bought anything for a long time, buy rarely, and spend little are more likely to leave. The study uses these three signals, plus a few extra timing-based statistics, as the foundation for all later analysis.
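
To make the three numbers concrete, here is a minimal sketch of how recency, frequency, and monetary value could be computed from a purchase log. The records, the customer names, the `rfm_table` function, and the snapshot date are all hypothetical; the study's own feature pipeline, including its extra timing-based statistics, is not reproduced here.

```python
from collections import defaultdict
from datetime import date

# Hypothetical purchase records: (customer_id, purchase_date, amount).
purchases = [
    ("alice", date(2024, 1, 5), 40.0),
    ("alice", date(2024, 2, 20), 25.0),
    ("alice", date(2024, 3, 28), 60.0),
    ("bob", date(2023, 11, 2), 15.0),
]

def rfm_table(records, snapshot):
    """Return {customer: (recency_days, frequency, monetary)}."""
    last_seen = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in records:
        # Recency is measured from the most recent purchase only.
        if customer not in last_seen or day > last_seen[customer]:
            last_seen[customer] = day
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        c: ((snapshot - last_seen[c]).days, frequency[c], monetary[c])
        for c in last_seen
    }

# The snapshot date is an assumption: the day on which the analysis is run.
rfm = rfm_table(purchases, snapshot=date(2024, 4, 1))
for customer, (recency, freq, spend) in sorted(rfm.items()):
    print(customer, recency, freq, spend)
```

In this toy log, the first shopper has a short recency, several purchases, and higher spend, while the second has one small purchase made months ago, which is the pattern the text associates with a higher risk of leaving.
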
Grouping shoppers with similar habits
Instead of treating all customers as one crowd, the researchers first group them into behavior-based segments. They use a deep learning technique that compresses each customer’s purchase profile into a compact code, then automatically forms clusters in this code space. This finds patterns that are hard to see with simple rules, such as subtle differences between steady regular buyers, seasonal shoppers, and those who are drifting away. The resulting clusters map onto meaningful business categories: some groups are almost all loyal customers, while others contain a high share of people who are about to stop buying.
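
The grouping step can be illustrated with a much simpler stand-in. The sketch below is not the authors' deep model: a standardized RFM vector takes the place of the learned compact code, and plain k-means takes the place of the deep clustering. The customer rows, the function names, and the choice of two clusters are assumptions made for illustration only.

```python
import math

# Hypothetical RFM rows: (recency_days, frequency, monetary).
customers = {
    "a": (3, 12, 540.0),
    "b": (5, 10, 480.0),
    "c": (7, 11, 510.0),
    "d": (160, 1, 20.0),
    "e": (190, 2, 35.0),
    "f": (175, 1, 15.0),
}

def standardize(rows):
    """Scale each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [tuple((x - m) / s for x, m, s in zip(row, means, stds))
            for row in rows]

def kmeans(points, centers, iterations=20):
    """Plain Lloyd's algorithm; returns one cluster index per point."""
    labels = []
    for _ in range(iterations):
        # Assign each point to its nearest center.
        labels = [min(range(len(centers)),
                      key=lambda k: math.dist(p, centers[k]))
                  for p in points]
        # Move each center to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster empties out
                centers[k] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels

names = list(customers)
codes = standardize([customers[n] for n in names])
# Seed the two centers with the first and last customer so the run is deterministic.
labels = kmeans(codes, centers=[codes[0], codes[-1]])
segments = dict(zip(names, labels))
print(segments)
```

On this toy data the recent, frequent, high-spending customers fall into one segment and the long-inactive, low-spending ones into the other. A learned code space, as described in the text, is meant to separate groups that such raw features would blur together.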

Teaching the system to follow customer journeys over time
After the customers are grouped, the study feeds these segments, along with the spending patterns, into neural networks that are good at handling sequences. These networks were originally designed for reading sentences or sound, but here they read streams of shopping events instead. They learn how a person’s activity changes over weeks or months and how those changes tend to end, either in continued buying or in silence. The researchers train and test their models on two very different real-world datasets, one made of traditional purchase records and another built from detailed click and event logs.
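
Before a sequence model can read a customer's history, the raw events have to be arranged in time order. The sketch below shows one plausible way to do this, by counting events per week over a fixed window. The event log, the `weekly_sequences` function, the weekly granularity, and the four-week window are assumptions; the summary does not say how the study builds its sequences.

```python
from datetime import date

# Hypothetical event log: (customer_id, event_date).
events = [
    ("alice", date(2024, 1, 1)),
    ("alice", date(2024, 1, 3)),
    ("alice", date(2024, 1, 16)),
    ("bob", date(2024, 1, 2)),
]

def weekly_sequences(log, start, n_weeks):
    """Return {customer: [event count per week]} over a fixed window."""
    sequences = {}
    for customer, day in log:
        week = (day - start).days // 7
        # Ignore events that fall before or after the observation window.
        if 0 <= week < n_weeks:
            seq = sequences.setdefault(customer, [0] * n_weeks)
            seq[week] += 1
    return sequences

seqs = weekly_sequences(events, start=date(2024, 1, 1), n_weeks=4)
for customer, seq in sorted(seqs.items()):
    print(customer, seq)
```

Each customer becomes a fixed-length list of weekly activity counts. A sequence network would consume lists like these, possibly alongside the customer's segment, and a run of trailing zeros is the kind of "silence" the text describes.
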
Comparing new methods with old tools
The team then compares their hybrid approach with standard tools such as logistic regression and support vector machines. Simple models perform well when the data are already tidy but struggle when behavior is complex or noisy. By contrast, the new framework first reshapes the data through deep clustering and then captures timing patterns with sequential networks. Across both datasets, this setup reaches near-perfect accuracy while staying balanced between catching churners and not raising too many false alarms. An ablation study, in which parts of the system are removed one at a time, shows that adding the clustering step clearly boosts performance compared to using the sequence models alone.
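
The balance between catching churners and avoiding false alarms is usually measured with recall and precision, alongside plain accuracy. The sketch below computes these from a small set of made-up labels. The `churn_metrics` function and both label lists are hypothetical and are not results from the study.

```python
def churn_metrics(actual, predicted):
    """Return accuracy, precision, recall, and F1 for binary churn labels (1 = churned)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # churners correctly flagged
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false alarms
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # churners missed
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # loyal customers left alone
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Made-up labels for eight customers, for illustration only.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = churn_metrics(actual, predicted)
print(metrics)
```

Recall is the share of real churners the model catches, and precision is the share of its warnings that turn out to be right. A model can score high accuracy while doing poorly on either one when churners are rare, which is why a balanced result matters.
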
What this means for online businesses
For a non-specialist, the main message is that richer use of data about when and how people shop can turn routine logs into early warnings about who is likely to leave. By combining simple spending summaries, smart customer grouping, and models that follow behavior over time, the framework offers a more reliable way to flag at-risk customers. Companies can then focus retention offers, support, or content on the people who most need attention, improving loyalty without relying on guesswork or gut feeling.
Citation: Ibrahim, S., Tawfik, B.S., Makhlouf, M.A. et al. A novel hybrid deep learning framework for customer churn prediction using RFM and embedding clustering. Sci Rep 16, 16563 (2026). https://doi.org/10.1038/s41598-026-53220-0
Keywords: customer churn, e-commerce, customer segmentation, deep learning, RFM analysis