An unsupervised approach to mining anomalous subscribers finds applicability across use cases like churn prediction and fraud detection
By: Flytxt R&D Team
Anomaly detection refers to the problem of finding events or patterns in data that do not conform to normal or expected behaviour. These non-conforming patterns are often referred to as anomalies. In the telecom domain, subscribers whose behaviour differs significantly from that of others, and who change their own behaviour over time, are more likely to be anomalous than subscribers whose behaviour is consistent. Identifying anomalous users can provide significant (and often critical) actionable information in a wide variety of application areas. One such area is subscription fraud detection in the telecom domain, where a fraudster abuses the service by making heavy use of telecom services (for example, calls, messaging and internet) with no intention of paying, causing enormous losses to telcos.
However, one of the major challenges in anomaly detection is defining a normal region that encompasses every possible normal behaviour. In addition, the boundary between normal and anomalous behaviour is often imprecise, which makes the task even more difficult: an observation that lies close to the boundary on the anomalous side may actually be normal, and vice versa.
To address these challenges, Flytxt has developed a novel unsupervised model based on cluster migration. The model clusters the subscriber base into k clusters, then analyses each cluster's behaviour and the movement of subscribers across clusters over time. It then assigns each subscriber an anomaly score using an appropriate evaluation function. The score quantifies how inconsistent a subscriber's current usage is with both their own historical usage pattern and the patterns of other subscribers over a period of time. The higher the score, the more likely the subscriber is to be anomalous.
We evaluated the model on the assumption that the most anomalous subscribers are likely to be churners. We used a randomly sampled dataset from a Tier-1 telco; the sample consisted only of high-value subscribers and had a very low churn rate. Using only a minimal set of KPIs, the model captured 53.52% of true churners within the top two deciles of anomaly scores. Churn coverage is expected to increase further if more meaningful KPIs are supplied for specific objectives.
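The post does not specify the clustering algorithm or the evaluation function, so the following is only a minimal sketch of the general cluster-migration idea under our own assumptions, not Flytxt's model. We assume k-means (with k-means++ seeding) run once over every (subscriber, period) KPI vector, so that cluster IDs mean the same thing in every period, and we define the anomaly score as the summed surprise, i.e. -log of the population-wide transition probability, of each subscriber's own cluster-to-cluster moves. A move that few other subscribers make therefore costs more than a common one. The functions `kmeans`, `migration_anomaly_scores` and `top_decile_coverage`, the smoothing parameter `alpha` and the synthetic data are all hypothetical. `top_decile_coverage` mirrors the evaluation described above: the share of all churners who fall in the top two deciles by score.

```python
import numpy as np


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's k-means with k-means++ seeding; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)

    def nearest(c):
        # Index of the closest centroid for every point (squared Euclidean distance).
        return ((points[:, None, :] - c[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

    # k-means++ seeding: later centroids are drawn in proportion to squared distance.
    centroids = [points[rng.integers(len(points))]]
    for _ in range(1, k):
        d2 = ((points[:, None, :] - np.array(centroids)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        p = d2 / d2.sum() if d2.sum() > 0 else None  # fall back to uniform if all points coincide
        centroids.append(points[rng.choice(len(points), p=p)])
    centroids = np.array(centroids, dtype=float)

    for _ in range(n_iter):
        labels = nearest(centroids)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        new = np.array([points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, nearest(centroids)


def migration_anomaly_scores(usage, k=4, alpha=1.0, seed=0):
    """Hypothetical score: summed surprise of each subscriber's cluster migrations.

    usage: array of shape (n_subscribers, n_periods, n_kpis).
    Returns (scores, paths): one score per subscriber and its cluster path over time.
    """
    n, t, d = usage.shape
    flat = usage.reshape(n * t, d).astype(float)
    # Standardise KPIs so that no single KPI dominates the Euclidean distance.
    flat = (flat - flat.mean(axis=0)) / (flat.std(axis=0) + 1e-9)
    # One shared clustering over all periods keeps cluster IDs comparable across time.
    _, labels = kmeans(flat, k, seed=seed)
    paths = labels.reshape(n, t)
    # Population-wide transition counts between consecutive periods, Laplace-smoothed.
    counts = np.full((k, k), alpha)
    np.add.at(counts, (paths[:, :-1].ravel(), paths[:, 1:].ravel()), 1.0)
    probs = counts / counts.sum(axis=1, keepdims=True)
    # -log probability of each of the subscriber's own moves: common moves are cheap,
    # moves that few other subscribers make are expensive.
    surprise = -np.log(probs[paths[:, :-1], paths[:, 1:]])
    return surprise.sum(axis=1), paths


def top_decile_coverage(scores, churned, deciles=2):
    """Share of all churners that fall in the top `deciles` deciles by anomaly score."""
    top = np.argsort(-scores)[: int(np.ceil(len(scores) * deciles / 10))]
    return churned[top].sum() / churned.sum()


# Synthetic illustration only (made-up data, not the telco sample described above):
# 500 subscribers, 6 periods, 3 KPIs (e.g. voice minutes, SMS count, data GB).
rng = np.random.default_rng(42)
usage = rng.normal(loc=[300.0, 50.0, 4.0], scale=[30.0, 5.0, 0.4], size=(500, 6, 3))
churned = np.zeros(500, dtype=bool)
churned[:40] = True
usage[churned, 4:, :] *= 0.1  # hypothetical churners collapse usage in the last two periods
scores, paths = migration_anomaly_scores(usage, k=4)
print(f"top-2-decile churn coverage: {top_decile_coverage(scores, churned):.2%}")
```

Clustering all periods together, rather than once per period, is what makes a "migration" well defined here: with per-period clustering, clusters would first have to be matched across periods before any movement could be measured. A production model would likely also weight moves by the direction of the usage change and by the KPIs relevant to the objective (churn versus fraud), which this sketch does not attempt.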