Predictive customer churn modelling in the Telecom industry with high accuracy
By : Flytxt Data Science R&D Team
The International Conference on Machine Learning and Data Mining (MLDM) brings together machine learning and data mining researchers from all over the world. The Data Science R&D team from Flytxt will attend this event, to be held in New York, US, and present a research paper on churn prediction models for the Telecom industry developed using statistical and data mining techniques. Here is a summary of the paper.
Customer retention is the need of the hour
Customer retention is one of the most common and critical problems in the telecom industry with regard to Customer Relationship Management (CRM). Due to saturated markets and intense competition, most companies have realised that existing customers are their most valuable asset. They understand that, from both effort and cost perspectives, it is more beneficial to keep their existing customers satisfied than to keep focusing on acquiring new ones. Hence, identifying the customers most prone to switching, and doing so with greater accuracy, is a high priority for Telcos worldwide.
Seeing the unforeseen
As part of the research, we developed churn prediction models using statistical and data mining techniques. We used one linear technique, the general linear model (GLM), i.e. logistic regression, and several non-linear techniques: Random Forest (RF) and deep learning architectures including Deep Neural Networks (DNN), Deep Belief Networks (DBN) and Recurrent Neural Networks (RNN). To the best of our knowledge, this is the first time a comparative study of conventional machine learning methods with deep learning techniques has been carried out for churn prediction.
We observed that the non-linear models performed better than the linear one. Such predictive models can potentially lead to better decisions when determining strategies and plans for mitigating churn and retaining customers.
Getting the data
For the study, we used raw data provided by a major Asian telecom service provider. The raw data was in the form of Call Data Records (CDRs), which contain transaction data such as the duration of a call, its start and end time, whether it was mobile originated (outgoing) or mobile terminated (incoming), and session download (the total volume of data downloaded in a data session). The experiments were carried out on anonymised data.
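To make the step from raw transactions to per-subscriber features concrete, here is a minimal sketch of how CDR rows could be aggregated. The record schema, the field names and the four aggregate features are hypothetical and chosen only for illustration; the summary does not list the 36 features actually used in the study.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CallRecord:
    # Hypothetical CDR schema; the real field names are not given in the summary.
    subscriber_id: str
    kind: str            # "MO" = outgoing call, "MT" = incoming call, "DATA" = data session
    duration_s: int      # call duration in seconds (0 for data sessions)
    download_mb: float   # volume downloaded in a data session (0.0 for calls)


def aggregate_features(records):
    """Collapse per-transaction records into one feature row per subscriber."""
    features = defaultdict(lambda: {
        "mo_calls": 0,
        "mt_calls": 0,
        "total_duration_s": 0,
        "total_download_mb": 0.0,
    })
    for rec in records:
        row = features[rec.subscriber_id]
        if rec.kind == "MO":
            row["mo_calls"] += 1
        elif rec.kind == "MT":
            row["mt_calls"] += 1
        row["total_duration_s"] += rec.duration_s
        row["total_download_mb"] += rec.download_mb
    return dict(features)


# Toy records for two made-up subscribers.
records = [
    CallRecord("A", "MO", 120, 0.0),
    CallRecord("A", "MT", 60, 0.0),
    CallRecord("A", "DATA", 0, 15.5),
    CallRecord("B", "MT", 30, 0.0),
]
features = aggregate_features(records)
print(features["A"])
print(features["B"])
```

Each subscriber ends up with a single fixed-length row, which is the shape a classifier expects as input.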
The extracted data set covered 337817 subscribers, each described by 36 features; 64599 of them (about 19%) were churners and the rest were non-churners. The parameters of the classifiers were selected by cross-validation on about 10% of the data. After parameter selection, the predictive models were trained and evaluated on the complete data using 5-fold cross-validation.
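The two-stage protocol described above can be sketched as follows. This is only an illustration of the splitting logic under our own assumptions: the tuning subset is drawn at random, the folds are plain (not stratified) shuffled folds, and the function name `k_fold_indices` and the seeds are hypothetical. No classifier is trained here.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs that partition range(n_samples) into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


n_subscribers = 337817   # from the study
n_churners = 64599       # from the study
churn_rate = n_churners / n_subscribers
print(f"churn rate: {churn_rate:.1%}")

# Stage 1 (assumed to be a random draw): pick about 10% of subscribers for parameter selection.
tuning_subset = random.Random(42).sample(range(n_subscribers), n_subscribers // 10)
print(f"tuning subset size: {len(tuning_subset)}")

# Stage 2: 5-fold cross-validation over all subscribers.
fold_sizes = [len(test_idx) for _, test_idx in k_fold_indices(n_subscribers, k=5)]
print(f"test fold sizes: {fold_sizes}")
```

In each of the five rounds a model would be fitted on `train_idx` and scored on `test_idx`, so every subscriber is used for evaluation exactly once. With roughly one churner in five subscribers, a stratified split that keeps the churn rate equal across folds would be a reasonable refinement.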
The chart below shows the performance measures obtained for the different classifiers. The non-linear techniques performed better than the linear one on all the performance metrics used, and their performance was almost comparable to one another. In terms of accuracy, AUC and specificity, Random Forest gave the best performance; in terms of sensitivity, Recurrent Neural Networks gave the best performance.
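For readers less familiar with these four measures, the sketch below computes them on a small made-up example. The labels, scores and the 0.5 decision threshold are illustrative assumptions, not values from the study; churners are treated as the positive class (label 1).

```python
def auc(y_true, y_score):
    """AUC as the probability that a random churner is scored above a random
    non-churner, with ties counted as half. Assumes both classes are present."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, sensitivity, specificity and AUC for binary labels (1 = churner)."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),   # share of churners correctly flagged
        "specificity": tn / (tn + fp),   # share of non-churners correctly cleared
        "auc": auc(y_true, y_score),
    }


# Made-up labels and predicted churn scores for eight subscribers.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_score = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
metrics = classification_metrics(y_true, y_score)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarises ranking quality across all thresholds. This is one reason a model can lead on sensitivity while another leads on AUC.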
Among the deep learning techniques, RNNs performed better than DBN and DNN. This may be because RNNs can use their internal memory to process arbitrary sequences of inputs. Long Short-Term Memory (LSTM), the RNN architecture used in the study, is augmented by recurrent gates called forget gates, which can also prevent back-propagated errors from vanishing or exploding.
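The role of the forget gate can be illustrated with a single-unit LSTM cell. This is a minimal sketch with hand-picked, hypothetical weights, not the network used in the study. At each step the direct path from the previous cell state to the new one is scaled by the forget gate, so the product of the gate values over a sequence indicates how much of an error signal survives along that path.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One step of a single-unit LSTM cell; `w` holds scalar weights."""
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate value
    c = f * c_prev + i * g    # new cell state: keep part of the old, add part of the new
    h = o * math.tanh(c)      # new hidden state
    return h, c, f


# Hypothetical weights; the large forget-gate bias keeps the gate close to 1.
weights = {
    "wf": 0.0, "uf": 0.0, "bf": 4.0,
    "wi": 1.0, "ui": 0.0, "bi": 0.0,
    "wo": 1.0, "uo": 0.0, "bo": 0.0,
    "wg": 1.0, "ug": 0.0, "bg": 0.0,
}

h, c = 0.0, 0.0
gate_product = 1.0
for x in [0.5] * 20:              # a made-up input sequence of 20 steps
    h, c, f = lstm_step(x, h, c, weights)
    gate_product *= f             # scaling along the direct cell-state path

# For comparison: a fixed recurrent factor of 0.5 applied over the same 20 steps.
plain_product = 0.5 ** 20

print(f"product of forget gates over 20 steps: {gate_product:.4f}")
print(f"fixed factor 0.5 over 20 steps:        {plain_product:.8f}")
```

Because the forget gate always lies between 0 and 1, the signal along this path cannot grow without bound, and when the gate stays near 1 it does not shrink quickly either. A fixed factor well below 1, by contrast, shrinks the signal towards zero within a few dozen steps.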
Performance measures for the various classifiers used. GLM, RF, DNN, DBN and RNN stand for general linear model (logistic regression), Random Forest, Deep Neural Networks, Deep Belief Networks and Recurrent Neural Networks, respectively.
Go for accuracy!
As the old saying goes, ‘Make new friends, but keep the old. One is silver, the other gold.’ According to research by Frederick Reichheld of Bain & Company, increasing customer retention rates by 5% increases profits by 25% to 95%. Thus, acquiring new customers is important, but retaining existing ones accelerates profitable growth.
Hence it is imperative that Telecom service providers adopt models that give greater coverage and accuracy when predicting churn. The problem is not limited to Telecom; it is relevant across industries. This underlines the value of the research, its applicability across domains and its relevance in ensuring profitable growth.