
LEVEL 3: FULL ML + BUSINESS CASE STUDIES (Hard)

🧠 These case studies are inspired by DecisionTree's actual projects. The goal is to show an end-to-end approach: from business problem to model to recommendation.


Case Study 6: Customer Churn Prediction for B2B Distribution

Based on DecisionTree's actual project with a National Packaging Distribution Group

The Problem

"A national packaging distribution company is losing customers — they've noticed a 25% annual churn rate. They want to predict which customers are likely to churn so they can intervene proactively."

Step 1: Clarify

Before doing anything else, ask these questions:

| Question to Ask | Why It Matters |
|---|---|
| How do you define "churn"? | Is it no orders in 60/90/180 days? Contract cancellation? |
| What time period are we looking at? | Last 1 year? 2 years? |
| How many customers total? | Scale informs technique |
| What actions can the sales team take? | Shapes our recommendations |
| What data is available? | Determines what features we can create |
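The answer to the first question becomes the target variable. Below is a minimal sketch that assumes churn means "no orders in the `window_days` after a cutoff date". The order table (`customer_id`, `order_date`), the helper `label_churn`, and the sample data are all hypothetical.

```python
import pandas as pd

def label_churn(orders: pd.DataFrame, cutoff: pd.Timestamp, window_days: int = 90) -> pd.Series:
    """Hypothetical helper: for every customer who ordered before `cutoff`,
    return 1 (churned) if they place no order in the next `window_days`, else 0."""
    history = orders[orders["order_date"] < cutoff]
    future = orders[(orders["order_date"] >= cutoff)
                    & (orders["order_date"] < cutoff + pd.Timedelta(days=window_days))]
    active_customers = history["customer_id"].unique()
    returned = set(future["customer_id"])
    return pd.Series(
        [0 if c in returned else 1 for c in active_customers],
        index=pd.Index(active_customers, name="customer_id"),
        name="churned",
    )

# Made-up sample orders
orders = pd.DataFrame({
    "customer_id": ["A", "A", "B", "C", "C"],
    "order_date": pd.to_datetime(["2024-01-05", "2024-04-10", "2024-02-01",
                                  "2024-01-20", "2024-06-15"]),
})
cutoff = pd.Timestamp("2024-03-31")
print(label_churn(orders, cutoff, window_days=90))  # A: 0, B: 1, C: 0
```

The window length matters. With `window_days=60`, customer C's June order falls outside the window, so C is also labelled as churned. This is why the definition should be agreed with the business before any modelling starts.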

🧠 Ask questions first, then solve. The interviewer is testing whether you understand the problem or jump straight into writing code. Asking clarifying questions signals maturity.

Step 2: Feature Engineering

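import pandas as pd
import numpy as np

# Assumes df already holds the order-level and customer-level columns used below.
# reference_date is the snapshot date for recency; using the latest order date is an assumption.
reference_date = df['last_order_date'].max()

# ═══════ Recency Features ═══════
df['days_since_last_order'] = (reference_date - df['last_order_date']).dt.days

# ═══════ Frequency Features ═══════
# Order frequency trend (declining frequency = churn risk)
# .replace(0, 1) avoids dividing by zero when the previous 3 months had no orders
df['frequency_trend'] = df['orders_last_3m'] / df['orders_prev_3m'].replace(0, 1)
# trend < 1 means orders are decreasing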

# ═══════ Monetary Features ═══════
df['avg_order_value'] = df.groupby('customer_id')['amount'].transform('mean')
df['revenue_trend'] = df['revenue_last_3m'] / df['revenue_prev_3m'].replace(0, 1)

# ═══════ Behavioral Features ═══════
df['product_diversity'] = df.groupby('customer_id')['product_category'].transform('nunique')
df['support_tickets_per_order'] = df['total_tickets'] / df['total_orders'].replace(0, 1)
df['is_monthly_contract'] = (df['contract_type'] == 'Monthly').astype(int)

🧠 Feature engineering means extracting useful signals from raw data. It is the single most important step in ML. Show the interviewer that you don't just fit a model; you build meaningful features from the data.
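The snippet above mixes per-order operations (the `groupby` on `amount`) with per-customer columns such as `orders_last_3m`. In practice those customer columns have to be aggregated from the order history first. Below is a minimal sketch that builds a one-row-per-customer table. It assumes a hypothetical order table with `customer_id`, `order_date`, `amount`, and `product_category`, and it approximates "3 months" as 90 days.

```python
import pandas as pd

def build_customer_features(orders: pd.DataFrame, reference_date: pd.Timestamp) -> pd.DataFrame:
    """Aggregate an order-level table into one row per customer.
    Column names and the 90-day 'quarter' windows are assumptions."""
    orders = orders[orders["order_date"] < reference_date]
    days_ago = (reference_date - orders["order_date"]).dt.days
    last_3m = orders[days_ago <= 90]
    prev_3m = orders[(days_ago > 90) & (days_ago <= 180)]

    g = orders.groupby("customer_id")
    feats = pd.DataFrame({
        "days_since_last_order": (reference_date - g["order_date"].max()).dt.days,
        "avg_order_value": g["amount"].mean(),
        "product_diversity": g["product_category"].nunique(),
    })
    # Customers with no orders in a window get NaN after alignment; treat as 0
    feats["orders_last_3m"] = last_3m.groupby("customer_id").size()
    feats["orders_prev_3m"] = prev_3m.groupby("customer_id").size()
    cols = ["orders_last_3m", "orders_prev_3m"]
    feats[cols] = feats[cols].fillna(0).astype(int)
    # Same zero guard as above: divide by 1 when there were no prior orders
    feats["frequency_trend"] = feats["orders_last_3m"] / feats["orders_prev_3m"].replace(0, 1)
    return feats

ref = pd.Timestamp("2024-07-01")
orders = pd.DataFrame({
    "customer_id": ["A", "A", "A", "B", "B"],
    "order_date": ref - pd.to_timedelta([10, 40, 100, 120, 150], unit="D"),
    "amount": [100.0, 200.0, 300.0, 50.0, 150.0],
    "product_category": ["box", "tape", "box", "box", "box"],
})
feats = build_customer_features(orders, ref)
print(feats)
```

Customer B has no orders in the last 90 days, so its `frequency_trend` is 0. That is exactly the "declining frequency" signal the model should pick up.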

Step 3: Model Building

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Define features and target
feature_cols = ['days_since_last_order', 'orders_last_3m', 'avg_order_value',
                'frequency_trend', 'support_tickets_per_order',
                'product_diversity', 'is_monthly_contract']

X = df[feature_cols]
y = df['churned']

# stratify=y keeps the churn rate (~25%) the same in the train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Start with Decision Tree (explainable, good for client presentation)
dt_model = DecisionTreeClassifier(max_depth=5, min_samples_leaf=20, random_state=42)
dt_model.fit(X_train, y_train)

# Upgrade to Random Forest (usually more accurate)
rf_model = RandomForestClassifier(n_estimators=100, max_depth=8, random_state=42)
rf_model.fit(X_train, y_train)

# Always evaluate with classification_report, not just accuracy:
# with ~25% churn, predicting "no churn" for everyone is already ~75% accurate
print(classification_report(y_test, dt_model.predict(X_test)))
print(classification_report(y_test, rf_model.predict(X_test)))

🧠 Why two models? A Decision Tree is easy to explain to a client ("if days_since_last_order > 90 AND frequency_trend < 0.5, then churn"). A Random Forest usually gives higher accuracy. Real projects often show both: one for presentation, one for production.
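One practical way to get the client-facing rules is scikit-learn's `export_text`, which prints a fitted tree as nested if/else conditions. The sketch below trains on synthetic data from `make_classification` because the real dataset isn't available. The printed thresholds are therefore meaningless; only the feature names are borrowed from the case study.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for the churn features (roughly 25% positives via weights)
feature_cols = ["days_since_last_order", "frequency_trend", "support_tickets_per_order"]
X, y = make_classification(n_samples=500, n_features=3, n_informative=3,
                           n_redundant=0, weights=[0.75], random_state=42)

# A shallow tree keeps the rule list short enough for a slide
dt_model = DecisionTreeClassifier(max_depth=3, min_samples_leaf=20, random_state=42)
dt_model.fit(X, y)

rules = export_text(dt_model, feature_names=feature_cols)
print(rules)
```

With the real pandas training data, `feature_names=list(X_train.columns)` gives the same readable output.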

Step 4: Business Recommendations

"Based on our analysis, here's what I'd recommend:

  1. Create a Churn Risk Scorecard — each customer gets a score from 0-100 based on the model's probability. Update weekly.

  2. Tier-based intervention:

    • High Risk (score > 70): Assign senior account manager + offer 15% loyalty discount
    • Medium Risk (score 40-70): Proactive check-in call
    • Low Risk (score < 40): Standard engagement; monitor monthly
  3. Reduce monthly contracts — data shows monthly customers churn at 3× the rate of annual customers. Incentivize annual contracts.

  4. Expected Impact: If we retain even 20% of predicted churners (~250 customers), and each averages ₹5 lakh annual revenue, potential revenue saved ≈ ₹12.5 crore."
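The scorecard, the tiers, and the impact estimate reduce to a few lines of arithmetic. Below is a minimal sketch. It uses the thresholds from the recommendation above and made-up probabilities; in a real pipeline the probabilities would come from something like `rf_model.predict_proba(X)[:, 1]`. The figure of 1,250 predicted churners is back-calculated from "20% ≈ 250 customers".

```python
import numpy as np

def churn_score(probabilities):
    """Map churn probabilities (0 to 1) to an integer 0 to 100 risk score."""
    return np.round(np.asarray(probabilities, dtype=float) * 100).astype(int)

def risk_tier(score: int) -> str:
    """Thresholds from the recommendation: >70 High, 40 to 70 Medium, <40 Low."""
    if score > 70:
        return "High"
    if score >= 40:
        return "Medium"
    return "Low"

def revenue_saved(predicted_churners: int, retention_rate: float, avg_annual_revenue: float) -> float:
    """Customers retained times average annual revenue per customer."""
    return predicted_churners * retention_rate * avg_annual_revenue

scores = churn_score([0.92, 0.55, 0.12])  # made-up model probabilities
tiers = [risk_tier(s) for s in scores]
print(list(zip(scores.tolist(), tiers)))  # [(92, 'High'), (55, 'Medium'), (12, 'Low')]

# 1,250 predicted churners x 20% retained = 250 customers x ₹5 lakh each
LAKH, CRORE = 1e5, 1e7
saved = revenue_saved(1250, 0.20, 5 * LAKH)
print(saved / CRORE)  # 12.5
```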

🧠 Always quantify the impact. Hearing "Revenue saved = ₹12.5 crore" tells the interviewer that you think in business terms, not just in code.


Case Study 7: Demand Forecasting for a CPG Brand

Based on DecisionTree's actual project: "Demand Forecasting at Scale"

The Problem

"A leading CPG brand sells 500+ SKUs across 50+ retailers. They need accurate demand forecasts at the SKU-location-week level for the next 12 weeks to optimize production planning."

Step-by-Step Approach

Step 1: Understand Business Context

| Question | Answer |
|---|---|
| What decisions depend on this forecast? | Production planning, raw material procurement, inventory allocation |
| What granularity? | SKU × Retailer × Week |
| What is the current forecast error? | ~65% MAPE (poor) → Target: <20% MAPE |
| What data is available? | 3 years of POS sales, promotions, weather, holidays |
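MAPE (mean absolute percentage error) is the average of |actual − forecast| / |actual|, expressed as a percentage. A 65% MAPE means forecasts miss by about two-thirds of actual demand on average. Below is a minimal sketch. Skipping weeks with zero actuals, where the percentage is undefined, is one common convention and is an assumption here.

```python
import numpy as np

def mape(actual, forecast) -> float:
    """Mean absolute percentage error in %, ignoring rows where actual == 0
    (the percentage error is undefined there; this convention is an assumption)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    mask = actual != 0
    return float(np.mean(np.abs((actual[mask] - forecast[mask]) / actual[mask])) * 100)

# Errors of 10%, 25% and 0% on the non-zero weeks; the zero-demand week is skipped
print(round(mape([100, 200, 50, 0], [110, 150, 50, 10]), 2))  # 11.67
```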

Step 2: Feature Engineering for Time Series

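# Sort so shift/rolling run in time order within each SKU-location series (assumes a 'week' column)
df = df.sort_values(['sku_id', 'location', 'week'])

# Lag features: past demand predicts future demand
for lag in [1, 2, 4, 8, 12]:
    df[f'demand_lag_{lag}'] = df.groupby(['sku_id', 'location'])['units_sold'].shift(lag)

# Rolling statistics: smooth out noise
# shift(1) first so the average uses only past weeks (no leakage of the target week)
df['demand_rolling_4w_avg'] = df.groupby(['sku_id', 'location'])['units_sold'].transform(
    lambda x: x.shift(1).rolling(4, min_periods=1).mean()
)

# Calendar features
df['is_festival_season'] = df['month'].isin([10, 11, 12]).astype(int)  # Diwali/Christmas

# Handle stockouts: 0 sales with 0 inventory ≠ 0 demand
# (a full boolean column, so non-stockout rows are 0 instead of NaN)
df['is_stockout'] = ((df['units_sold'] == 0) & (df['inventory'] == 0)).astype(int)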

🧠 Stockout handling is very important. If a product was out of stock and 0 sales were recorded, demand was not actually 0; the product simply wasn't available! You have to estimate the real demand. Mention this in the interview.
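One simple way to estimate real demand is to treat stockout weeks as missing. Each missing week is then filled with the mean of the in-stock weeks among the previous few weeks of the same SKU-location. Below is a minimal sketch of that heuristic. It is one option among several (others include model-based imputation and censored-demand methods). The column names and the 4-week window are assumptions.

```python
import pandas as pd

def stockout_adjusted_demand(df: pd.DataFrame, window: int = 4) -> pd.Series:
    """Replace sales in stockout weeks (0 sold AND 0 inventory) with the mean of the
    in-stock weeks among the previous `window` weeks of the same SKU-location.
    Real zero-demand weeks (0 sold, stock available) are kept as 0."""
    df = df.sort_values(["sku_id", "location", "week"])
    is_stockout = (df["units_sold"] == 0) & (df["inventory"] == 0)
    observed = df["units_sold"].where(~is_stockout)  # NaN during stockouts
    # Rolling mean skips NaNs, so only in-stock weeks contribute
    trailing_mean = observed.groupby([df["sku_id"], df["location"]]).transform(
        lambda x: x.shift(1).rolling(window, min_periods=1).mean()
    )
    return observed.fillna(trailing_mean)

sales = pd.DataFrame({
    "sku_id": ["S1"] * 6,
    "location": ["L1"] * 6,
    "week": [1, 2, 3, 4, 5, 6],
    "units_sold": [10, 12, 0, 0, 14, 0],
    "inventory": [5, 3, 0, 0, 6, 8],  # weeks 3-4 are stockouts; week 6 is real zero demand
})
sales["demand_adj"] = stockout_adjusted_demand(sales)
print(sales[["week", "units_sold", "inventory", "demand_adj"]])
```

A stockout in a series' very first week has no history to borrow from, so it stays NaN and needs a separate fallback.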

Step 3: Model Selection

| Model | Best For | Typical Accuracy |
|---|---|---|
| Moving Average | Quick baseline | Low |
| ARIMA/SARIMA | Single product with clear seasonality | Medium |
| Prophet | Automated forecasting with holidays | Medium |
| XGBoost/LightGBM | Multi-SKU with many features | High |

Step 4: Deliverables

  1. Weekly forecast file: SKU, Location, Week, Predicted Demand, Prediction Interval
  2. Dashboard showing forecast vs actuals with drill-down
  3. Alert system for demand deviation >30% (stockout or overstock risk)
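The alert rule in deliverable 3 can be expressed as a relative deviation check. Below is a minimal sketch. It assumes deviation is measured relative to the forecast and that the input has hypothetical `actual` and `forecast` columns; rows with a zero forecast would need special handling.

```python
import pandas as pd

def flag_deviations(df: pd.DataFrame, threshold: float = 0.30) -> pd.DataFrame:
    """Flag rows where actual demand deviates from the forecast by more than
    `threshold`, measured relative to the forecast (an assumption)."""
    out = df.copy()
    out["deviation"] = (out["actual"] - out["forecast"]) / out["forecast"]
    out["alert"] = "OK"
    out.loc[out["deviation"] > threshold, "alert"] = "Demand above forecast: stockout risk"
    out.loc[out["deviation"] < -threshold, "alert"] = "Demand below forecast: overstock risk"
    return out

weekly = pd.DataFrame({"sku_id": ["S1", "S2", "S3"],
                       "forecast": [100, 100, 100],
                       "actual": [140, 95, 60]})
result = flag_deviations(weekly)
print(result[["sku_id", "deviation", "alert"]])
```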

Case Study 8: Marketing Mix Optimization

Based on DecisionTree's project for a Premium Lighting Brand

The Problem

"A premium lighting brand spends ₹10 crore annually across TV (40%), Digital (30%), Print (15%), and In-store (15%). They want to know which channel drives the most ROI and how to reallocate."

Key Concepts

Adstock Effect — Advertising has a carryover effect. A TV ad today still impacts sales next week, but the effect decays.

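import numpy as np

def adstock_transform(spend, decay_rate=0.7):
    """Transform raw spend into adstock (accounting for carryover)."""
    spend = np.asarray(spend, dtype=float)
    adstock = np.zeros(len(spend))
    adstock[0] = spend[0]
    for i in range(1, len(spend)):
        # Today's adstock = today's spend + decayed adstock carried over from yesterday
        adstock[i] = spend[i] + decay_rate * adstock[i-1]
    return adstock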

🧠 Think of adstock like this: if a TV ad airs today, some people buy today, some remember it and buy tomorrow, a few more the day after, but the effect keeps shrinking (decay). This carryover is what we need to model.
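The recursion in `adstock_transform` is geometric adstock. Unrolling it shows how `decay_rate` controls the carryover. Below, x_t is the spend in period t and λ is `decay_rate`:

```latex
% Recursion implemented by adstock_transform
A_0 = x_0, \qquad A_t = x_t + \lambda A_{t-1} \quad (t \ge 1)

% Unrolled: each past period's spend contributes with weight \lambda^k
A_t = \sum_{k=0}^{t} \lambda^{k} x_{t-k}

% One burst of spend x at t = 0, nothing afterwards (0 \le \lambda < 1)
A_t = \lambda^{t} x, \qquad \sum_{t=0}^{\infty} A_t = \frac{x}{1-\lambda}

% With \lambda = 0.7
\frac{1}{1-0.7} \approx 3.33, \qquad t_{1/2} = \frac{\ln 0.5}{\ln 0.7} \approx 1.94 \text{ periods}
```

So with the default decay of 0.7, one period's spend builds up about 3.33 times its size in cumulative adstock, and its effect halves roughly every two periods.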

Diminishing Returns — The first ₹1 crore on TV gets high returns. The 5th crore on TV gives much less incremental value.
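Diminishing returns are usually modelled with a saturating response curve. Below is a minimal sketch using an exponential saturation curve, `max_effect * (1 - exp(-spend / scale))`; Hill curves are another common choice. The parameters are hypothetical, not fitted values. The marginal return, `(max_effect / scale) * exp(-spend / scale)`, shrinks as spend grows, which is exactly the "5th crore is worth less than the 1st" effect.

```python
import numpy as np

def tv_sales_response(spend_cr, max_effect=10.0, scale=2.0):
    """Hypothetical saturating response: incremental sales (₹ crore) from TV spend (₹ crore)."""
    return max_effect * (1 - np.exp(-spend_cr / scale))

first_crore = tv_sales_response(1) - tv_sales_response(0)
fifth_crore = tv_sales_response(5) - tv_sales_response(4)
print(f"1st crore adds {first_crore:.2f}, 5th crore adds {fifth_crore:.2f}")
# 1st crore adds 3.93, 5th crore adds 0.53
```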

The Recommendation

| Channel | Current Allocation | Proposed | Expected Impact |
|---|---|---|---|
| TV | 40% (₹4 Cr) | 35% (₹3.5 Cr) | -2% TV-driven sales |
| Digital | 30% (₹3 Cr) | 45% (₹4.5 Cr) | +18% digital-driven sales |
| Print | 15% (₹1.5 Cr) | 5% (₹0.5 Cr) | Minimal loss (print already underperforming) |
| In-store | 15% (₹1.5 Cr) | 15% (₹1.5 Cr) | Maintain (solid ROI) |

Net impact: +8-12% revenue uplift with the same total budget.
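To see why the same budget can produce more revenue, note that under concave response curves the best split gives each increment of budget to whichever channel has the highest marginal return at that point. Below is a minimal greedy sketch that allocates ₹10 crore in ₹0.5 crore steps. The per-channel curve parameters are hypothetical placeholders. A real marketing mix model would estimate them from adstocked spend and sales, so the resulting split will not match the table above.

```python
import numpy as np

# Hypothetical response curves: sales (₹ crore) = max_effect * (1 - exp(-spend / scale))
CHANNELS = {
    "TV":       {"max_effect": 12.0, "scale": 3.0},
    "Digital":  {"max_effect": 15.0, "scale": 4.0},
    "Print":    {"max_effect": 2.0,  "scale": 1.0},
    "In-store": {"max_effect": 5.0,  "scale": 1.5},
}

def response(channel: str, spend: float) -> float:
    p = CHANNELS[channel]
    return p["max_effect"] * (1 - np.exp(-spend / p["scale"]))

def allocate(total_budget: float = 10.0, step: float = 0.5) -> dict:
    """Greedy allocation: give each step to the channel with the highest marginal return.
    For concave curves this is optimal on the step grid."""
    alloc = {c: 0.0 for c in CHANNELS}
    for _ in range(int(round(total_budget / step))):
        best = max(CHANNELS, key=lambda c: response(c, alloc[c] + step) - response(c, alloc[c]))
        alloc[best] += step
    return alloc

def total_sales(alloc: dict) -> float:
    return sum(response(c, s) for c, s in alloc.items())

current = {"TV": 4.0, "Digital": 3.0, "Print": 1.5, "In-store": 1.5}
optimized = allocate(10.0, 0.5)
print(optimized)
print(f"current: {total_sales(current):.2f}, optimized: {total_sales(optimized):.2f}")
```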

🧠 Key line for the interview: "We're not asking for more budget; we're asking to spend the SAME budget more wisely. That's the power of data-driven optimization."