Telco Churn Analysis — Maria Namitha Nelson

7,043 Customer Records

26% Churn Rate

31 Features After Encoding

2 Models Built & Compared

79.49% Best Accuracy (CNN)

The question we were trying to answer

Every telecom company loses customers — but can you predict who is about to leave before they actually do? That's the churn prediction problem. If you can spot a customer who's about to cancel, you have a window to intervene — offer a discount, improve their plan, or just check in. Miss that window and they're gone.

We took 7,043 real customer records from IBM's Watson Analytics dataset and built two neural network models to answer exactly that question.

The short answer

The customers most likely to leave are the ones who've been around the least and are paying the most. New + expensive = high risk. Get a customer past the 30-month mark and they are far more likely to stay.

Step 1 — Looking at the raw data

The dataset has 21 columns — customer ID, demographics, services (phone, internet, streaming, security), contract type, payment method, charges, and whether they churned. Here's what the first few rows look like straight out of the CSV:

The raw dataset — 7,043 customers, 21 columns. TotalCharges is stored as text (object type) — that needed fixing before anything else.

One early catch: TotalCharges was stored as text, including blank values for brand new customers who'd never been billed. We converted it to numeric and filled blanks with the median — simpler and cleaner than dropping those customers.
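A minimal sketch of that clean-up step, assuming pandas and a toy four-row frame in place of the real CSV (the values are made up for illustration; only the TotalCharges and tenure column names come from the notebook):

```python
import pandas as pd

# Toy stand-in for the raw CSV: TotalCharges arrives as text,
# and a brand-new customer (tenure 0) has a blank string.
raw = pd.DataFrame({
    "tenure": [1, 34, 0, 45],
    "TotalCharges": ["29.85", "1889.5", " ", "1840.75"],
})

# errors="coerce" turns anything that cannot be parsed (the blank) into NaN.
raw["TotalCharges"] = pd.to_numeric(raw["TotalCharges"], errors="coerce")

# Fill the gaps with the median of the values that did parse.
median_charge = raw["TotalCharges"].median()
raw["TotalCharges"] = raw["TotalCharges"].fillna(median_charge)

print(raw)
```

Filling with the median keeps every customer in the dataset; the trade-off is that a never-billed customer ends up with a typical bill rather than zero.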

Step 2 — First look at the distributions

Before touching any model, we visualised how the data is shaped. These are the initial charts straight from the notebook:

Top: Distribution of Tenure — how long customers have been with the company. Bottom: Distribution of Monthly Charges.

Tenure chart (top): Giant spike at month 0–2 (new customers), relatively flat through the middle, then another spike at month 70–72 (long-term loyalists). The customer base is polarised: most people either just joined or have been here for years, with comparatively few in between.

Monthly Charges chart (bottom): Massive spike at $20 (basic phone-only plans), dips in the middle, then climbs again at $70–$100 (premium packages with internet, streaming, security). Two very different customer profiles.

Step 3 — Does churn relate to tenure and charges?

These boxplots compare customers who stayed vs customers who left — first pass:

Top: Churn by Tenure. Bottom: Churn by Monthly Charges. The size difference between the boxes tells the whole story.

Tenure boxplot (top): The "No" (stayed) box spans 15–60 months, median around 38. The "Yes" (churned) box is tiny and sits right at the bottom — median around 10 months. Most churners are gone before month 30. This is the clearest signal in the dataset.

Monthly Charges boxplot (bottom): The "Yes" box sits higher. Churners' median is around $79/month; non-churners' median is around $65. That is a gap of roughly $14/month between the two medians.
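The medians read off the boxplots can also be computed directly. A small sketch, assuming pandas and the column names tenure, MonthlyCharges and Churn; the six rows are invented so that the group medians land near the values quoted above:

```python
import pandas as pd

# Hypothetical rows, not taken from the real dataset.
df = pd.DataFrame({
    "Churn": ["No", "No", "No", "Yes", "Yes", "Yes"],
    "tenure": [20, 38, 60, 2, 10, 25],
    "MonthlyCharges": [20.0, 65.0, 90.0, 70.0, 79.0, 95.0],
})

# One row per churn group, one column per numeric feature.
medians = df.groupby("Churn")[["tenure", "MonthlyCharges"]].median()
charge_gap = medians.loc["Yes", "MonthlyCharges"] - medians.loc["No", "MonthlyCharges"]

print(medians)
print("median charge gap:", charge_gap)
```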

Step 4 — The same charts with colour, plus the key scatter plot

Top: Tenure distribution (blue). Middle: Monthly Charges distribution (green). Bottom: Scatter plot — every single customer as a dot, blue = stayed, orange = churned.

The scatter plot (bottom) is the most revealing chart in the project. Each dot is one customer. Look at the top-left corner — high charges, short tenure. That's where orange dots cluster. Bottom-right — low charges, long tenure — almost entirely blue. If you wanted a simple rule: "flag anyone paying over $70 who's been here under 12 months" — you'd catch a huge chunk of the real risk.
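That rule of thumb is easy to write down as code. The sketch below is illustrative only: the seven customers are hypothetical, and the helper name is_high_risk is ours, not something from the notebook.

```python
# Hypothetical customers: (tenure in months, monthly charge in $, churned?)
customers = [
    (2, 85.0, True),
    (5, 99.0, True),
    (8, 74.0, False),
    (3, 20.0, False),
    (40, 90.0, False),
    (70, 25.0, False),
    (10, 45.0, True),
]

def is_high_risk(tenure, monthly_charge, max_tenure=12, min_charge=70.0):
    """Flag short-tenure, high-charge customers (the top-left of the scatter)."""
    return tenure < max_tenure and monthly_charge > min_charge

flagged = [c for c in customers if is_high_risk(c[0], c[1])]
caught = sum(1 for c in flagged if c[2])
total_churners = sum(1 for c in customers if c[2])

print(f"flagged {len(flagged)} customers, {caught} of them churned; "
      f"{caught} of {total_churners} churners caught")
```

A rule like this is a useful yardstick: a model is only worth deploying if it beats the simple threshold on the same test customers.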

Churn by Tenure (coloured) — green box = stayed, orange box = churned. The churners' box is dramatically smaller and lower.

Churn by Monthly Charges (coloured) — churners (yellow, right) pay noticeably more per month than non-churners (teal, left).

Step 5 — The notebook's annotated final charts

These are the charts as they appeared in the final submitted notebook, with findings written alongside each one:

Notebook — Chart 1: Distribution of Tenure. Customers cluster at two extremes — brand new and long-term loyal.

Notebook — Chart 2: Distribution of Monthly Charges. Low-charge spike at $20, premium cluster at $70–100.

Notebook — Chart 3: Scatter Plot. Churners (orange) concentrate in the top-left danger zone: new + expensive.

Notebook — Chart 4: Churn by Tenure. Short-tenure customers are significantly more likely to leave.

Notebook — Chart 5: Churn by Monthly Charges. Higher payers churn more — the company's most valuable customers are also its most fragile.

Making the data model-ready

New feature — AvgMonthlyCharges: TotalCharges ÷ tenure. Gives context — paying $80/month for 2 months is very different from paying $80/month for 5 years.
One-hot encoding: Text columns like Contract type split into separate yes/no columns. Models can only process numbers.
StandardScaler: Standardised all numerical columns (zero mean, unit variance) so no single feature dominates just because it has larger numbers.
Result: 31 clean numeric features, split 80/20 into 5,634 training and 1,409 test samples.
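For the one-hot step, a minimal sketch assuming pandas.get_dummies on a toy frame. Whether the notebook dropped one level per category (drop_first=True) is not stated, so this keeps all levels:

```python
import pandas as pd

# Toy frame: one numeric column and one text column with three categories.
df = pd.DataFrame({
    "tenure": [1, 34, 72],
    "Contract": ["Month-to-month", "One year", "Two year"],
})

# Each category becomes its own 0/1 column; numeric columns pass through.
encoded = pd.get_dummies(df, columns=["Contract"], dtype=int)

print(encoded.columns.tolist())
print(encoded)
```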

feature engineering

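import numpy as np
from sklearn.preprocessing import StandardScaler

data_encoded['AvgMonthlyCharges'] = data_encoded['TotalCharges'] / data_encoded['tenure']
# tenure 0 produces inf/NaN; assign back rather than using inplace=True on a column
data_encoded['AvgMonthlyCharges'] = data_encoded['AvgMonthlyCharges'].replace([np.inf, np.nan], 0)

# Standardise numeric columns to zero mean and unit variance
scaler = StandardScaler()
data_encoded[numerical_features] = scaler.fit_transform(data_encoded[numerical_features])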
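One caveat: as written, the snippet above appears to fit the scaler on every row before the 80/20 split, which lets test-set statistics leak into training. A hedged alternative is to learn the mean and standard deviation from the training rows only and reuse them on the test rows. The NumPy sketch below does by hand what StandardScaler.fit and transform do; the data is random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical numeric columns (tenure, monthly charges) for 10 customers.
X = np.column_stack([
    rng.integers(0, 73, 10),
    rng.uniform(20, 110, 10),
]).astype(float)

X_train, X_test = X[:8], X[8:]   # simple 80/20 split for illustration

# "fit": statistics come from the training rows only.
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)      # population std (ddof=0), as StandardScaler uses

# "transform": the same statistics are applied to both splits.
X_train_scaled = (X_train - mu) / sigma
X_test_scaled = (X_test - mu) / sigma

print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
```

With scikit-learn the equivalent is scaler.fit_transform on the training split followed by scaler.transform on the test split.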

Model 1 — Neural Network

A feedforward neural network: layers that transform the data, learning which combinations of features best predict churn over 50 training passes. The Dropout layers (30%) randomly silence neurons during training — forcing the model to learn robust patterns instead of just memorising the training data.
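To make the Dropout idea concrete, here is a small NumPy sketch of "inverted" dropout, the variant Keras applies during training: a random 30% of activations are zeroed and the survivors are scaled up so the expected total stays the same. The function name and the all-ones input are ours, chosen only for illustration.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero a random `rate` fraction, rescale the rest."""
    if not training:
        return activations                      # nothing is dropped at inference
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(42)
hidden = np.ones((1000, 64))                    # stand-in for a layer's outputs
dropped = dropout(hidden, rate=0.3, rng=rng)

print("fraction silenced:", (dropped == 0).mean())
print("mean activation:", dropped.mean())
```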

31 Input Features

64 → 32 Hidden Neurons

30% Dropout

50 Epochs

79.42% Test Accuracy

neural network

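from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

nn_model = Sequential([
    Dense(64, activation='relu', input_shape=(X_train.shape[1],)),  # 31 inputs -> 64 units
    Dropout(0.3),  # silence 30% of units on each training step
    Dense(32, activation='relu'),
    Dropout(0.3),
    Dense(1, activation='sigmoid')  # outputs a 0–1 churn probability
])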
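For a sense of scale, the number of trainable parameters in this network can be worked out by hand. The sketch assumes 31 input features, as stated above; Dropout layers add no parameters.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units

layer_sizes = [31, 64, 32, 1]   # input features, two hidden layers, output
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer, total)
```

That is about 4,000 parameters fitted to 5,634 training rows, which is one reason the Dropout layers are worth having.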

Model 2 — CNN on tabular data

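CNNs are usually for images, but we reshaped each customer's 31 features into a column (31×1) and let a convolutional filter scan across adjacent features looking for local patterns that signal churn. It sounds unusual, but it trains fine and matches the feedforward network's accuracy. One caveat: with kernel_size=3 each filter only sees three neighbouring columns, and the column order of a one-hot encoded table is arbitrary. A combination like "fibre optic + month-to-month contract + high charges" is picked up by a single filter only if those columns happen to sit next to each other; otherwise it is the Dense layer after Flatten that combines them.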

31×1 Reshaped Input

32 filters Conv1D

64 Dense Neurons

50 Epochs

79.49% Test Accuracy ✓

cnn

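import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, Dense, Dropout, Flatten

# Add a trailing channel axis: (n_samples, 31) -> (n_samples, 31, 1)
X_train_cnn = np.expand_dims(X_train, axis=-1).astype('float32')

cnn_model = Sequential([
    # 32 filters, each sliding over 3 neighbouring features at a time
    Conv1D(32, kernel_size=3, activation='relu',
           input_shape=(X_train_cnn.shape[1], 1)),
    Dropout(0.3),
    Flatten(),
    Dense(64, activation='relu'),
    Dropout(0.3),
    Dense(1, activation='sigmoid')  # churn probability between 0 and 1
])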
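What a single Conv1D filter does to one customer's row can be reproduced in a few lines of NumPy. This is a sketch under stated assumptions: the feature values and the kernel weights are made up, and only the sizes (31 features, kernel size 3, 32 filters, 64 dense units) come from the model above.

```python
import numpy as np

n_features, kernel_size, n_filters = 31, 3, 32

# One customer's feature vector (hypothetical values) and one filter.
x = np.arange(n_features, dtype=float)
kernel = np.array([1.0, 0.0, -1.0])

# "valid" sliding window with stride 1, the Conv1D default:
# each output position only sees kernel_size neighbouring features.
windows = np.lib.stride_tricks.sliding_window_view(x, kernel_size)
feature_map = windows @ kernel

positions = n_features - kernel_size + 1          # output positions per filter
flattened = positions * n_filters                 # values fed into Dense(64)
conv_params = (kernel_size * 1 + 1) * n_filters   # weights + bias per filter
dense_params = flattened * 64 + 64                # first Dense layer after Flatten

print(feature_map.shape, flattened, conv_params, dense_params)
```

Most of the CNN's parameters sit in the Dense layer after Flatten, not in the convolution itself.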

Results — full breakdown

What these numbers mean in plain English

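Accuracy 79.49%: about 1,120 of 1,409 test customers predicted correctly. The feedforward network's 79.42% works out to about 1,119, so the two models are separated by roughly one test customer.

Precision 63.75% (churn): when the CNN says "this person will churn," it's right 64% of the time. The other 36% are false alarms.

Recall 52.67% (churn): the model caught 53% of actual churners. The other 47% slipped through, so there is room to improve.

AUC 70.9%: measures how well the model separates churners from non-churners, where random guessing scores 50%. We're at 71%, a reasonable baseline. One caution: 70.9% also matches the average of the churn recall (52.67%) and the non-churn recall (about 89%), which is what AUC collapses to when it is computed from hard 0/1 predictions rather than predicted probabilities. If that is how it was computed here, the figure may understate how well the model ranks customers.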
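AUC is meant to be computed from the predicted probabilities. If thresholded 0/1 labels are passed in instead, it reduces to the mean of the two recalls. The sketch below shows the difference with a hand-written AUC and ten made-up scores; neither the scores nor the roc_auc helper come from the notebook.

```python
def roc_auc(y_true, scores):
    """AUC as P(random positive outscores random negative); ties count half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical test set: 4 churners, 6 non-churners.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
probs = [0.9, 0.6, 0.45, 0.3, 0.5, 0.4, 0.35, 0.2, 0.1, 0.05]
labels = [1 if p >= 0.5 else 0 for p in probs]

auc_from_probs = roc_auc(y_true, probs)
auc_from_labels = roc_auc(y_true, labels)

# Recall on each class at the 0.5 threshold.
churn_recall = sum(l for y, l in zip(y_true, labels) if y == 1) / 4
stay_recall = sum(1 - l for y, l in zip(y_true, labels) if y == 0) / 6

print(auc_from_probs, auc_from_labels, (churn_recall + stay_recall) / 2)
```

In scikit-learn terms, this is the difference between passing the sigmoid outputs and passing thresholded predictions as the second argument of roc_auc_score.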

84% Non-Churn Precision

89% Non-Churn Recall

86% Non-Churn F1

64% Churn Precision

53% Churn Recall

58% Churn F1

The model is much better at predicting who stays (89% recall) than who leaves (53% recall) — because only 26% of the dataset churned. The classic imbalanced dataset problem.
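All of the per-class figures follow from four confusion-matrix counts. The notebook's confusion matrix is not reproduced here, so the counts below are an assumption: they were back-calculated to be consistent with the reported percentages on 1,409 test customers and should not be read as the notebook's actual output.

```python
# Assumed counts, chosen to match the reported percentages.
tp, fp, fn, tn = 197, 112, 177, 923   # churn is the positive class

accuracy = (tp + tn) / (tp + fp + fn + tn)

churn_precision = tp / (tp + fp)      # of those flagged, how many churned
churn_recall = tp / (tp + fn)         # of those who churned, how many were flagged
churn_f1 = 2 * churn_precision * churn_recall / (churn_precision + churn_recall)

stay_precision = tn / (tn + fn)
stay_recall = tn / (tn + fp)
stay_f1 = 2 * stay_precision * stay_recall / (stay_precision + stay_recall)

for name, value in [
    ("accuracy", accuracy),
    ("churn precision", churn_precision),
    ("churn recall", churn_recall),
    ("churn F1", churn_f1),
    ("non-churn precision", stay_precision),
    ("non-churn recall", stay_recall),
    ("non-churn F1", stay_f1),
]:
    print(f"{name}: {value:.2%}")
```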

What I'd do differently

SMOTE oversampling — generate synthetic churn examples so the model stops favouring the majority class
Lower decision threshold — use 0.35 instead of 0.5 to flag more churners. A false alarm is cheap; a missed churner is expensive.
Try XGBoost — gradient boosted trees generally outperform neural networks on structured tabular data
Richer features — support call history, plan change frequency, and payment delays would all be powerful additional signals
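The threshold idea is the cheapest of these to try, since it needs no retraining. A sketch with ten made-up churn probabilities (not model output) shows the trade-off between catching more churners and raising more false alarms:

```python
# Hypothetical test set: 4 churners, 6 non-churners, with made-up probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
probs = [0.9, 0.6, 0.45, 0.3, 0.5, 0.4, 0.35, 0.2, 0.1, 0.05]

def recall_and_false_alarms(y_true, probs, threshold):
    """Return (churn recall, number of non-churners flagged) at a threshold."""
    flagged = [p >= threshold for p in probs]
    caught = sum(f for y, f in zip(y_true, flagged) if y == 1)
    false_alarms = sum(f for y, f in zip(y_true, flagged) if y == 0)
    return caught / sum(y_true), false_alarms

at_default = recall_and_false_alarms(y_true, probs, 0.5)
at_lowered = recall_and_false_alarms(y_true, probs, 0.35)

print("threshold 0.50:", at_default)
print("threshold 0.35:", at_lowered)
```

Whether 0.35 is the right cut-off depends on what a retention offer costs compared with a lost customer; the value is best tuned on a validation split, not the test set.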
