Clear Sky Science

Neural network approach enhancing churn prediction with categorical encoding and standard scaling

2026-01-27

Why keeping customers matters

When you cancel a phone plan, close a bank account, or stop using a subscription service, you become what businesses call a “churned” customer. Replacing you with someone new is far more expensive than keeping you, so companies are eager to spot early warning signs that a customer is about to leave. This study explores how a carefully designed neural network—a kind of artificial intelligence—can more accurately predict which bank customers are likely to walk away, helping firms spend their retention budgets more wisely.

Turning raw bank records into warning signals

The researchers worked with a public dataset of about 10,000 bank customers, each described by a dozen pieces of information such as age, country, account balance, tenure with the bank, and whether they have a credit card or are an active user. A central challenge is that this information comes in different forms: some values are numbers (like salary), while others are categories (like country). A second challenge is that only a relatively small proportion of customers actually leave. The team focused on two often overlooked but crucial steps before feeding everything into a neural network: converting categorical information into numbers (categorical encoding) and putting numerical fields on a comparable scale (standard scaling).

Cleaning and balancing the data

To make fair predictions, the data first had to be cleaned. Missing values and odd outliers were handled, and country and other categorical details were transformed using a technique called one-hot encoding, which represents each category as a set of simple yes/no flags instead of arbitrary numeric labels. At the same time, numerical measures such as credit score and account balance were standardized so that no single large-valued field would dominate the learning process. Because customers who leave are less common than those who stay, the team also adjusted the training procedure so that mistaken predictions on churners counted more heavily than errors on stayers, nudging the network to pay attention to the minority group.
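The three preparation steps described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the function names and the toy records are made up, and the class-weight formula is the common "balanced" heuristic (the one used by scikit-learn's `class_weight="balanced"` option), which the study may or may not have used.

```python
import math

def one_hot(values):
    """Map each category to a list of 0/1 flags, one flag per distinct category."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]

def standard_scale(values):
    """Shift to zero mean and divide by the population standard deviation."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var) or 1.0  # fall back to 1.0 for a constant column
    return [(v - mean) / std for v in values]

def balanced_class_weights(labels):
    """Assumed heuristic: weight = n_samples / (n_classes * class_count)."""
    n = len(labels)
    classes = sorted(set(labels))
    return {c: n / (len(classes) * labels.count(c)) for c in classes}

# Toy records, invented for illustration (not taken from the study's dataset).
countries = ["France", "Spain", "Germany", "France", "France"]
balances = [0.0, 50000.0, 100000.0, 150000.0, 200000.0]
churned = [0, 0, 0, 0, 1]

categories, country_flags = one_hot(countries)
scaled_balances = standard_scale(balances)
weights = balanced_class_weights(churned)

print(categories)        # column order of the yes/no flags
print(country_flags[1])  # flags for the "Spain" record
print(weights)           # the rarer churn class gets the larger weight
```

In a real pipeline the mean and standard deviation would normally be computed on the training split only and then reused on held-out data, so that no information leaks from the test set.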

Teaching the network to spot at-risk customers

On top of this prepared data, the authors built a multilayer neural network that processes around 30 input features through several hidden layers. Each layer applies weighted combinations of inputs followed by a simple nonlinear rule, allowing the model to capture subtle interactions such as how balance, tenure, and activity status jointly influence the likelihood of leaving. Training was done within a rigorous cross-validation framework: the dataset was repeatedly split into training and test segments so that the model’s performance would reflect how well it generalizes to new customers, not just how well it memorizes those it has seen before. The system’s output is a probability of churn for each customer—essentially a risk score that a bank can act on.
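To make the layer-by-layer computation and the repeated train/test splitting concrete, here is a small sketch in plain Python. Everything in it is an assumption for illustration: the ReLU and sigmoid activations, the tiny 3-input network with hand-picked weights, and the plain (unstratified) k-fold scheme are stand-ins, since the summary above does not specify the authors' exact architecture or splitting procedure.

```python
import math
import random

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases, activation):
    """One layer: weighted sum of the inputs plus a bias, then a nonlinearity."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def churn_probability(features, layers):
    """Pass features through the hidden ReLU layers and one sigmoid output unit."""
    *hidden, (out_w, out_b) = layers
    h = features
    for w, b in hidden:
        h = dense(h, w, b, relu)
    return dense(h, out_w, out_b, sigmoid)[0]

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, test

# Hand-picked weights for a toy 3-input, 2-hidden-unit network (illustrative only).
layers = [
    ([[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]], [0.0, 0.1]),  # hidden layer
    ([[1.0, -1.0]], [0.0]),                              # output layer
]
p = churn_probability([1.0, 0.0, -1.0], layers)
splits = list(k_fold_indices(10, k=5))

print(round(p, 3))     # churn risk score between 0 and 1
print(len(splits))     # number of train/test splits
```

A trained model would learn the weights from data instead of using fixed numbers, and each fold's test indices would be used only to score a model fitted on that fold's training indices.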

How well the model performs in practice

The neural network achieved high overall accuracy and, crucially, very high precision: more than four out of five customers it flagged as likely churners actually were at risk. That means banks can focus expensive retention offers on a relatively small group with confidence, instead of wasting money on many customers who would have stayed anyway. Although the model misses some churners (its recall is modest), it rarely mislabels loyal customers as flight risks, which is essential when incentives and outreach campaigns are costly. When compared with a suite of other popular methods—such as Random Forests, Gradient Boosting, and logistic regression—the proposed neural network matched or exceeded them on key measures of ranking and discrimination, and particularly stood out in minimizing false alarms.
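The trade-off between precision and recall described above follows directly from four counts. The sketch below uses invented labels and predictions, chosen only so that precision comes out high and recall modest; the numbers are not the study's results.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, false negatives, true negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def precision(tp, fp):
    """Share of flagged customers who really churned."""
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    """Share of actual churners the model managed to flag."""
    return tp / (tp + fn) if tp + fn else 0.0

# Toy example: 8 churners (1) and 12 stayers (0); the model flags 5 customers.
y_true = [1] * 8 + [0] * 12
y_pred = [1] * 4 + [0] * 4 + [1] * 1 + [0] * 11

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(precision(tp, fp))  # 0.8: four of the five flagged customers churned
print(recall(tp, fn))     # 0.5: half of the churners were missed
```

A model with this profile raises few false alarms but leaves some churners undetected, which is the pattern the article describes.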

What drives leaving and how banks can respond

Beyond raw scores, the authors probed which factors the model relied on most. Account balance and whether someone is an “active member” turned out to be leading signals, with credit card ownership, country, and age also playing important roles. In other words, signs of financial engagement and day-to-day activity are strong clues about loyalty. The team also examined how well the model behaved across different countries and genders, and how well its risk scores aligned with actual churn rates. They showed that for low- to medium-risk customers, the probabilities are well calibrated, and that the model can be used to design targeted campaigns that maximize profit: focusing on the top 10–30% highest-risk customers yields the greatest financial return; beyond that, extra outreach starts to cost more than it saves.
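Both ideas in this section, checking calibration and choosing how many customers to target, can be illustrated with a short sketch. The risk scores, outcomes, offer cost, retained value, and success rate below are all hypothetical values picked for the example; the study's actual profit model is not given in this summary.

```python
def calibration_table(scores, churned, n_bins):
    """Per non-empty score bin: (mean predicted risk, observed churn rate)."""
    bins = [[] for _ in range(n_bins)]
    for s, y in zip(scores, churned):
        bins[min(int(s * n_bins), n_bins - 1)].append((s, y))
    return [(sum(s for s, _ in b) / len(b), sum(y for _, y in b) / len(b))
            for b in bins if b]

def profit_by_fraction(scores, churned, fractions,
                       offer_cost, retained_value, success_rate):
    """Expected profit from contacting the top fraction of customers by risk."""
    ranked = sorted(zip(scores, churned), key=lambda pair: pair[0], reverse=True)
    results = {}
    for frac in fractions:
        n_target = round(frac * len(ranked))
        churners_reached = sum(label for _, label in ranked[:n_target])
        gain = churners_reached * success_rate * retained_value
        results[frac] = gain - n_target * offer_cost
    return results

# Hypothetical risk scores and outcomes for ten customers.
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
churned = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]

table = calibration_table(scores, churned, n_bins=2)
results = profit_by_fraction(scores, churned, [0.1, 0.3, 0.5, 1.0],
                             offer_cost=10.0, retained_value=100.0,
                             success_rate=0.5)
best_fraction = max(results, key=results.get)

print(table)          # well calibrated if the two numbers in each pair are close
print(results)        # profit for each targeting fraction
print(best_fraction)  # 0.3 for this toy data
```

With these made-up numbers, profit peaks when the top 30% of customers by risk are contacted and falls once outreach extends to lower-risk customers, mirroring the shape of the result reported above.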

What this means for everyday services

In simple terms, the study shows that paying close attention to how data are prepared—especially turning categories into numbers and putting all features on a common scale—can make neural networks much more reliable tools for predicting who is likely to leave a service. The resulting model does not just score well on paper; it offers banks and similar businesses a practical way to identify truly at-risk customers while avoiding wasteful campaigns. By highlighting the most influential signals of churn and showing how predictions link directly to profit, this work moves churn prediction from a purely technical exercise toward a decision tool that can help everyday companies keep their customers longer.

Citation: Bhattacharjee, B., Madhu, U., Guha, S.K. et al. Neural network approach enhancing churn prediction with categorical encoding and standard scaling. Sci Rep 16, 6274 (2026). https://doi.org/10.1038/s41598-026-37407-z

Keywords: customer churn, neural networks, banking analytics, machine learning, customer retention