LEVEL 3: FULL ML + BUSINESS CASE STUDIES (Hard)
🧠 These case studies are inspired by DecisionTree's actual projects. You need to demonstrate an end-to-end approach: from business problem to model to recommendation.
Case Study 6: Customer Churn Prediction for B2B Distribution
Based on DecisionTree's actual project with a National Packaging Distribution Group
The Problem
"A national packaging distribution company is losing customers — they've noticed a 25% annual churn rate. They want to predict which customers are likely to churn so they can intervene proactively."
Step 1: Clarify
Before anything, ask these questions:
| Question to Ask | Why It Matters |
|---|---|
| How do you define "churn"? | Is it no orders in 60/90/180 days? Contract cancellation? |
| What time period are we looking at? | Last 1 year? 2 years? |
| How many customers total? | Scale informs technique |
| What actions can the sales team take? | Shapes our recommendations |
| What data is available? | Determines what features we can create |
🧠 Ask questions first, then solve. The interviewer is testing whether you understand the problem or jump straight into writing code. Clarifying questions = maturity.
Step 2: Feature Engineering
import pandas as pd
import numpy as np
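# ═══════ Recency Features ═══════
# reference_date is the snapshot date of the analysis; the latest order date is assumed here
reference_date = df['last_order_date'].max()
df['days_since_last_order'] = (reference_date - df['last_order_date']).dt.days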
# ═══════ Frequency Features ═══════
# Order frequency trend (declining frequency = churn risk)
df['frequency_trend'] = df['orders_last_3m'] / df['orders_prev_3m'].replace(0, 1)
# trend < 1 means orders are decreasing
# ═══════ Monetary Features ═══════
df['avg_order_value'] = df.groupby('customer_id')['amount'].transform('mean')
df['revenue_trend'] = df['revenue_last_3m'] / df['revenue_prev_3m'].replace(0, 1)
# ═══════ Behavioral Features ═══════
df['product_diversity'] = df.groupby('customer_id')['product_category'].transform('nunique')
df['support_tickets_per_order'] = df['total_tickets'] / df['total_orders'].replace(0, 1)
df['is_monthly_contract'] = (df['contract_type'] == 'Monthly').astype(int)
🧠 Feature engineering = extracting useful signals from raw data. It is the most important step in ML. Show the interviewer that you don't just fit a model; you build meaningful features from the data.
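The snippet above assumes that columns such as `orders_last_3m` and `orders_prev_3m` already exist. Here is a minimal sketch of one way to derive them from a raw order log. The `orders` table, its `customer_id` and `order_date` columns, and the 90-day window are all assumptions for illustration, not details from the project.

```python
import pandas as pd

def window_order_counts(orders: pd.DataFrame, reference_date: pd.Timestamp,
                        window_days: int = 90) -> pd.DataFrame:
    """Count each customer's orders in the latest window and in the window before it."""
    age_days = (reference_date - orders["order_date"]).dt.days
    recent = orders[(age_days >= 0) & (age_days < window_days)]
    previous = orders[(age_days >= window_days) & (age_days < 2 * window_days)]
    counts = pd.DataFrame({
        "orders_last_3m": recent.groupby("customer_id").size(),
        "orders_prev_3m": previous.groupby("customer_id").size(),
    })
    # A customer missing from one window placed zero orders in it.
    return counts.fillna(0).astype(int)

# Hypothetical order log: customer A is active, customer B has gone quiet.
orders = pd.DataFrame({
    "customer_id": ["A", "A", "A", "B", "B"],
    "order_date": pd.to_datetime(
        ["2024-06-20", "2024-05-01", "2024-02-15", "2024-01-20", "2024-03-01"]),
})
counts = window_order_counts(orders, pd.Timestamp("2024-06-30"))
print(counts)
```

Customers with no orders in either window do not appear in the result at all, so a left join back onto the full customer list (filling with 0) would be needed before computing `frequency_trend`.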
Step 3: Model Building
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
# Define features and target
feature_cols = ['days_since_last_order', 'orders_last_3m', 'avg_order_value',
                'frequency_trend', 'support_tickets_per_order',
                'product_diversity', 'is_monthly_contract']
X = df[feature_cols]
y = df['churned']
# stratify=y keeps the churn rate the same in the train and test splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
# Start with Decision Tree (explainable — good for client presentation)
dt_model = DecisionTreeClassifier(max_depth=5, min_samples_leaf=20, random_state=42)
dt_model.fit(X_train, y_train)
# Upgrade to Random Forest (more accurate)
rf_model = RandomForestClassifier(n_estimators=100, max_depth=8, random_state=42)
rf_model.fit(X_train, y_train)
# Always evaluate with classification_report, not just accuracy
print(classification_report(y_test, rf_model.predict(X_test)))
🧠 Why two models? Decision Tree = easy to explain to the client ("if days_since_order > 90 AND frequency_trend < 0.5, then churn"). Random Forest = higher accuracy. In real projects you show both: one for presentation, one for production.
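To show how a tree turns into client-friendly "if ... then churn" rules, here is a minimal sketch using scikit-learn's `export_text`. The data is synthetic and the labelling rule is invented purely for illustration; it is not the project's data or its actual findings.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic data purely for illustration (not project data).
rng = np.random.default_rng(42)
n = 500
days_since_last_order = rng.integers(0, 200, size=n)
frequency_trend = rng.uniform(0.0, 2.0, size=n)
X_demo = np.column_stack([days_since_last_order, frequency_trend])
# Toy labelling rule: long-inactive customers with falling order frequency churn.
y_demo = ((days_since_last_order > 90) & (frequency_trend < 0.5)).astype(int)

demo_tree = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X_demo, y_demo)
# export_text prints the learned splits as nested if/else rules.
rules = export_text(demo_tree, feature_names=["days_since_last_order", "frequency_trend"])
print(rules)
```

The printed thresholds can be read aloud in a client meeting, which is the main reason to keep a shallow tree alongside the more accurate ensemble.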
Step 4: Business Recommendations
"Based on our analysis, here's what I'd recommend:
Create a Churn Risk Scorecard — each customer gets a score from 0-100 based on the model's probability. Update weekly.
Tier-based intervention:
- High Risk (score > 70): Assign senior account manager + offer 15% loyalty discount
- Medium Risk (score 40-70): Proactive check-in call
- Low Risk (score < 40): Standard engagement; monitor monthly
Reduce monthly contracts — data shows monthly customers churn at 3× the rate of annual customers. Incentivize annual contracts.
Expected Impact: If we retain even 20% of predicted churners (~250 customers), and each averages ₹5 lakh annual revenue, potential revenue saved ≈ ₹12.5 crore."
🧠 Always quantify the impact. Hearing "Revenue saved = ₹12.5 crore" tells the interviewer that this candidate thinks in business terms, not just in code.
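The scorecard tiers and the impact arithmetic can be sketched as follows. The function name and the handling of the exact boundaries (70 counts as Medium, 40 counts as Medium) are assumptions; the thresholds, the ~250 retained customers, and the ₹5 lakh average come from the recommendation above.

```python
def risk_tier(churn_probability: float) -> str:
    """Map a model probability (0 to 1) to the intervention tiers in the recommendation."""
    score = churn_probability * 100
    if score > 70:
        return "High"
    if score >= 40:
        return "Medium"
    return "Low"

LAKH = 100_000
CRORE = 10_000_000

retained_customers = 250             # ~20% of predicted churners, per the text
avg_annual_revenue = 5 * LAKH        # Rs 5 lakh per customer
revenue_saved = retained_customers * avg_annual_revenue

print(risk_tier(0.85), risk_tier(0.55), risk_tier(0.10))
print(revenue_saved / CRORE, "crore")   # 250 x 5 lakh = 1,250 lakh = 12.5 crore
```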
Case Study 7: Demand Forecasting for a CPG Brand
Based on DecisionTree's actual project: "Demand Forecasting at Scale"
The Problem
"A leading CPG brand sells 500+ SKUs across 50+ retailers. They need accurate demand forecasts at the SKU-location-week level for the next 12 weeks to optimize production planning."
Step-by-Step Approach
Step 1: Understand Business Context
| Question | Answer |
|---|---|
| What decisions depend on this forecast? | Production planning, raw material procurement, inventory allocation |
| What granularity? | SKU × Retailer × Week |
| Current forecast accuracy? | ~65% MAPE (poor; MAPE measures error, so lower is better) → Target: <20% MAPE |
| What data is available? | 3 years of POS sales, promotions, weather, holidays |
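Since the target is stated in MAPE, it helps to be able to compute it by hand. Here is a minimal sketch; skipping periods where the actual is zero is one common convention and is an assumption here, not something the source specifies.

```python
def mape(actuals, forecasts):
    """Mean absolute percentage error, in percent. Periods with a zero actual are skipped."""
    pairs = [(a, f) for a, f in zip(actuals, forecasts) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual is zero")
    return 100 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)

# Illustrative numbers: per-week errors are 10%, 10%, 20%, and 0%.
actuals = [100, 200, 50, 80]
forecasts = [110, 180, 60, 80]
print(round(mape(actuals, forecasts), 2))  # 10.0
```

Because zero-sales weeks are common at the SKU-location-week level, it is worth asking the client how they treat those weeks before comparing against their reported 65%.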
Step 2: Feature Engineering for Time Series
# Assumes df has one row per SKU-location-week, sorted by week within each group
# Lag features: past demand predicts future demand
for lag in [1, 2, 4, 8, 12]:
    df[f'demand_lag_{lag}'] = df.groupby(['sku_id', 'location'])['units_sold'].shift(lag)
# Rolling statistics: smooth out noise
# shift(1) keeps the current week's sales out of its own feature (avoids target leakage)
df['demand_rolling_4w_avg'] = df.groupby(['sku_id', 'location'])['units_sold'].transform(
    lambda x: x.shift(1).rolling(4, min_periods=1).mean()
)
# Calendar features
df['is_festival_season'] = df['month'].isin([10, 11, 12]).astype(int)  # Diwali/Christmas
# Handle stockouts: 0 sales with 0 inventory ≠ 0 demand
# Assigning the full boolean column avoids NaN on the non-stockout rows
df['is_stockout'] = (df['units_sold'] == 0) & (df['inventory'] == 0)
🧠 Stockout handling is very important. If a product was out of stock and 0 sales were recorded, demand was not actually 0; the product simply was not available. You have to estimate the real demand. Mention this in the interview.
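One simple way to estimate demand during a stockout is to replace those weeks with the recent average of in-stock weeks for the same SKU and location. The sketch below assumes that approach, a 4-week window, and rows already sorted by week; the source does not say which imputation method the project used.

```python
import pandas as pd

def estimate_true_demand(df: pd.DataFrame, window: int = 4) -> pd.Series:
    """Replace stockout weeks with the group's trailing average of in-stock weeks."""
    is_stockout = (df["units_sold"] == 0) & (df["inventory"] == 0)
    # Blank out stockout weeks so they do not drag the average down.
    observed = df["units_sold"].astype(float).mask(is_stockout)
    # rolling().mean() ignores NaN, so only in-stock weeks contribute.
    recent_avg = observed.groupby([df["sku_id"], df["location"]]).transform(
        lambda s: s.rolling(window, min_periods=1).mean()
    )
    return observed.fillna(recent_avg)

# Hypothetical data: week 3 is a stockout (0 sold, 0 inventory).
demo = pd.DataFrame({
    "sku_id": ["S1"] * 4,
    "location": ["R1"] * 4,
    "units_sold": [10, 12, 0, 14],
    "inventory": [5, 3, 0, 6],
})
demo["estimated_demand"] = estimate_true_demand(demo)
print(demo)
```

If a group starts with a stockout week there is no history to average, so that value stays missing and needs a separate fallback.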
Step 3: Model Selection
| Model | Best For | Accuracy |
|---|---|---|
| Moving Average | Quick baseline | Low |
| ARIMA/SARIMA | Single product with clear seasonality | Medium |
| Prophet | Automated forecasting with holidays | Medium |
| XGBoost/LightGBM | Multi-SKU with many features | High |
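The moving-average baseline in the first row is simple enough to sketch in a few lines. This version rolls its own forecasts forward to cover a multi-week horizon; the function name and the default 4-week window are assumptions for illustration.

```python
def moving_average_forecast(history, window=4, horizon=12):
    """Forecast each future week as the mean of the latest `window` values."""
    if window <= 0 or len(history) < window:
        raise ValueError("need at least `window` observations")
    values = list(history)
    forecasts = []
    for _ in range(horizon):
        next_value = sum(values[-window:]) / window
        forecasts.append(next_value)
        values.append(next_value)  # later weeks reuse earlier forecasts
    return forecasts

weekly_units = [10, 12, 14, 16]
baseline = moving_average_forecast(weekly_units, window=4, horizon=2)
print(baseline)  # [13.0, 13.75]
```

Any more complex model in the table should beat this baseline on a held-out period before it is worth presenting.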
Step 4: Deliverables
- Weekly forecast file: SKU, Location, Week, Predicted Demand, Confidence Interval
- Dashboard showing forecast vs actuals with drill-down
- Alert system for demand deviation >30% (stockout or overstock risk)
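The alert rule in the last deliverable might look like the sketch below. Measuring the deviation relative to the forecast, and mapping "actual above forecast" to stockout risk, are assumptions about how the rule is defined.

```python
def deviation_alert(forecast, actual, threshold=0.30):
    """Return an alert label when actual demand deviates from forecast by more than threshold."""
    if forecast <= 0:
        raise ValueError("forecast must be positive to compute a relative deviation")
    deviation = (actual - forecast) / forecast
    if deviation > threshold:
        return "stockout risk"    # demand ran well above plan
    if deviation < -threshold:
        return "overstock risk"   # demand ran well below plan
    return None

print(deviation_alert(100, 140))  # stockout risk
print(deviation_alert(100, 60))   # overstock risk
print(deviation_alert(100, 110))  # None
```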
Case Study 8: Marketing Mix Optimization
Based on DecisionTree's project for a Premium Lighting Brand
The Problem
"A premium lighting brand spends ₹10 crore annually across TV (40%), Digital (30%), Print (15%), and In-store (15%). They want to know which channel drives the most ROI and how to reallocate."
Key Concepts
Adstock Effect — Advertising has a carryover effect. A TV ad today still impacts sales next week, but the effect decays.
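def adstock_transform(spend, decay_rate=0.7):
    """Transform raw spend into adstock (accounting for carryover)."""
    spend = np.asarray(spend, dtype=float)  # also accepts lists and pandas Series
    adstock = np.zeros(len(spend))
    if len(spend) == 0:
        return adstock
    adstock[0] = spend[0]
    for i in range(1, len(spend)):
        # This period's adstock = this period's spend + decayed carryover from the previous one
        adstock[i] = spend[i] + decay_rate * adstock[i-1]
    return adstock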
🧠 Think of adstock like this: an ad airs on TV today, so some people buy today, some remember it and buy tomorrow, and a few more buy the day after, but the effect keeps shrinking (decay). This "carry-over" is what you need to model.
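A small worked example makes the decay concrete. The sketch below repeats the adstock recurrence so that it runs on its own, and uses a single burst of spend with `decay_rate=0.5` (chosen only because it gives round numbers; the spend values are made up).

```python
import numpy as np

def adstock_transform(spend, decay_rate=0.7):
    """Carry a decayed share of each period's adstock into the next period."""
    spend = np.asarray(spend, dtype=float)
    adstock = np.zeros(len(spend))
    for i in range(len(spend)):
        carryover = decay_rate * adstock[i - 1] if i > 0 else 0.0
        adstock[i] = spend[i] + carryover
    return adstock

# One burst of 100 in week 1, then nothing: the effect halves every week.
burst = adstock_transform([100, 0, 0, 0], decay_rate=0.5)
print(burst)  # [100.   50.   25.   12.5]
```

In a marketing mix model, the transformed series (not the raw spend) is what goes into the regression, and the decay rate is usually tuned per channel.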
Diminishing Returns — The first ₹1 crore on TV gets high returns. The 5th crore on TV gives much less incremental value.
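Diminishing returns are usually modelled with a saturating response curve. The sketch below assumes an exponential saturation form with made-up `max_effect` and `scale` parameters; real models often use a Hill or log curve instead, and the source does not say which one the project used.

```python
import math

def saturating_response(spend_crore, max_effect=100.0, scale=2.0):
    """Sales response that flattens as spend grows: max_effect * (1 - exp(-spend / scale))."""
    return max_effect * (1.0 - math.exp(-spend_crore / scale))

def incremental_return(spend_from, spend_to):
    """Extra response gained by raising spend from spend_from to spend_to."""
    return saturating_response(spend_to) - saturating_response(spend_from)

first_crore = incremental_return(0, 1)
fifth_crore = incremental_return(4, 5)
print(f"1st crore adds {first_crore:.1f}, 5th crore adds {fifth_crore:.1f}")
```

With these illustrative parameters the fifth crore adds only a fraction of what the first one did, which is the argument for moving marginal budget to a channel that is still on the steep part of its curve.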
The Recommendation
| Channel | Current Allocation | Proposed | Expected Impact |
|---|---|---|---|
| TV | 40% (₹4Cr) | 35% (₹3.5Cr) | -2% TV-driven sales |
| Digital | 30% (₹3Cr) | 45% (₹4.5Cr) | +18% digital-driven sales |
| Print | 15% (₹1.5Cr) | 5% (₹0.5Cr) | Minimal loss (print already underperforming) |
| In-store | 15% (₹1.5Cr) | 15% (₹1.5Cr) | Maintain — solid ROI |
Net impact: +8-12% revenue uplift with the same total budget.
🧠 Key line for the interview: "We're not asking for more budget; we're asking to spend the SAME budget more wisely. That's the power of data-driven optimization."