LEVEL 3: FULL ML + BUSINESS CASE STUDIES (Hard)

🧠 These case studies are inspired by DecisionTree's actual projects. The goal is to show an end-to-end approach, from business problem to model to recommendation.

Case Study 6: Customer Churn Prediction for B2B Distribution

Based on DecisionTree's actual project with a National Packaging Distribution Group

The Problem

"A national packaging distribution company is losing customers — they've noticed a 25% annual churn rate. They want to predict which customers are likely to churn so they can intervene proactively."

Step 1: Clarify

Before anything, ask these questions:

Question to Ask	Why It Matters
How do you define "churn"?	Is it no orders in 60/90/180 days? Contract cancellation?
What time period are we looking at?	Last 1 year? 2 years?
How many customers total?	Scale informs technique
What actions can the sales team take?	Shapes our recommendations
What data is available?	Determines what features we can create

🧠 Ask questions first, then solve. The interviewer is testing whether you understand the problem or jump straight into writing code. Clarifying questions signal maturity.
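
Once the client settles on a churn definition, the label can be built directly from order history. The sketch below assumes the "no orders in N days" definition with N = 90; the `orders` table, its column names, and the `label_churn` helper are illustrative assumptions, not part of the client's data model.

```python
import pandas as pd

def label_churn(orders, reference_date, window_days=90):
    """Flag customers with no order in the `window_days` before `reference_date`."""
    last_order = orders.groupby("customer_id")["order_date"].max()
    days_since = (reference_date - last_order).dt.days
    return pd.DataFrame({
        "days_since_last_order": days_since,
        "churned": (days_since > window_days).astype(int),
    }).reset_index()

# Hypothetical order history for two customers
orders = pd.DataFrame({
    "customer_id": ["A", "A", "B"],
    "order_date": pd.to_datetime(["2024-01-10", "2024-05-20", "2024-01-15"]),
})
labels = label_churn(orders, pd.Timestamp("2024-06-30"), window_days=90)
print(labels)
```

Changing `window_days` to 60 or 180 shows how sensitive the churn rate is to the definition, which is worth raising with the client before any modeling.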

Step 2: Feature Engineering

import pandas as pd
import numpy as np

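# ═══════ Recency Features ═══════
# reference_date is the snapshot date of the analysis (assumed here: the latest order date in the data)
reference_date = df['last_order_date'].max()
df['days_since_last_order'] = (reference_date - df['last_order_date']).dt.days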

# ═══════ Frequency Features ═══════
# Order frequency trend (declining frequency = churn risk)
df['frequency_trend'] = df['orders_last_3m'] / df['orders_prev_3m'].replace(0, 1)
# trend < 1 means orders are decreasing

# ═══════ Monetary Features ═══════
df['avg_order_value'] = df.groupby('customer_id')['amount'].transform('mean')
df['revenue_trend'] = df['revenue_last_3m'] / df['revenue_prev_3m'].replace(0, 1)

# ═══════ Behavioral Features ═══════
df['product_diversity'] = df.groupby('customer_id')['product_category'].transform('nunique')
df['support_tickets_per_order'] = df['total_tickets'] / df['total_orders'].replace(0, 1)
df['is_monthly_contract'] = (df['contract_type'] == 'Monthly').astype(int)

🧠 Feature engineering = extracting useful signals from raw data. This is the most important step in ML. Show the interviewer that you don't just fit a model; you build meaningful features from the data.
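
The trend features above rely on columns such as `orders_last_3m` and `orders_prev_3m` already existing. One possible way to derive them from a transaction-level table is sketched here; the `orders` table, the 90-day window, and the `order_window_counts` helper are assumptions for illustration.

```python
import pandas as pd

def order_window_counts(orders, reference_date, days=90):
    """Count each customer's orders in the latest window and in the one before it."""
    recent_start = reference_date - pd.Timedelta(days=days)
    prev_start = reference_date - pd.Timedelta(days=2 * days)
    dates = orders["order_date"]
    recent = orders[(dates > recent_start) & (dates <= reference_date)]
    prev = orders[(dates > prev_start) & (dates <= recent_start)]
    counts = pd.DataFrame({
        "orders_last_3m": recent.groupby("customer_id").size(),
        "orders_prev_3m": prev.groupby("customer_id").size(),
    })
    # Keep customers with no orders in either window, filled with 0
    counts = counts.reindex(orders["customer_id"].unique()).fillna(0).astype(int)
    counts.index.name = "customer_id"
    return counts

# Hypothetical order history
orders = pd.DataFrame({
    "customer_id": ["A", "A", "A", "B", "B", "C"],
    "order_date": pd.to_datetime([
        "2024-02-01", "2024-03-01", "2024-05-01",
        "2024-05-10", "2024-06-10", "2023-01-01",
    ]),
})
counts = order_window_counts(orders, pd.Timestamp("2024-06-30"))
# Same zero-guard convention as the feature code above
counts["frequency_trend"] = counts["orders_last_3m"] / counts["orders_prev_3m"].replace(0, 1)
print(counts)
```

Customer A's trend of 0.5 means orders halved between the two windows, which is the declining-frequency signal the model is meant to pick up.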

Step 3: Model Building

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Define features and target
feature_cols = ['days_since_last_order', 'orders_last_3m', 'avg_order_value',
                'frequency_trend', 'support_tickets_per_order', 
                'product_diversity', 'is_monthly_contract']

X = df[feature_cols]
y = df['churned']

# Stratify so the train and test sets keep the same churn rate
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Start with Decision Tree (explainable — good for client presentation)
dt_model = DecisionTreeClassifier(max_depth=5, min_samples_leaf=20, random_state=42)
dt_model.fit(X_train, y_train)

# Upgrade to Random Forest (more accurate)
rf_model = RandomForestClassifier(n_estimators=100, max_depth=8, random_state=42)
rf_model.fit(X_train, y_train)

# Always evaluate with classification_report, not just accuracy
print(classification_report(y_test, rf_model.predict(X_test)))

🧠 Why two models? Decision Tree = easy to explain to the client ("if days_since_order > 90 AND frequency_trend < 0.5, then churn"). Random Forest = usually higher accuracy. Real projects show both: one for the presentation, one for production.
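
To turn a fitted tree into rules a client can read, scikit-learn's `export_text` prints the splits as nested if/else conditions. The sketch below uses synthetic data generated from a made-up rule, so the thresholds it prints are illustrative only and say nothing about the real project.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic data: churn follows a made-up rule so the tree has something to find
rng = np.random.default_rng(0)
n = 400
days = rng.integers(0, 200, size=n)
trend = rng.uniform(0, 2, size=n)
y = ((days > 90) & (trend < 0.5)).astype(int)
X = np.column_stack([days, trend])

tree = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X, y)
rules = export_text(tree, feature_names=["days_since_last_order", "frequency_trend"])
print(rules)
```

The printed rules can go almost directly onto a slide, which is the main reason to keep a shallow tree alongside the more accurate ensemble.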

Step 4: Business Recommendations

"Based on our analysis, here's what I'd recommend:

Create a Churn Risk Scorecard — each customer gets a score from 0-100 based on the model's probability. Update weekly.

Tier-based intervention:

High Risk (score > 70): Assign senior account manager + offer 15% loyalty discount

Medium Risk (score 40-70): Proactive check-in call

Low Risk (score < 40): Standard engagement; monitor monthly

Reduce monthly contracts — data shows monthly customers churn at 3× the rate of annual customers. Incentivize annual contracts.

Expected Impact: If we retain even 20% of predicted churners (about 250 customers retained), and each averages ₹5 lakh annual revenue, potential revenue saved ≈ 250 × ₹5 lakh = ₹12.5 crore."

🧠 Always quantify the impact. Hearing "Revenue saved = ₹12.5 crore" tells the interviewer that this candidate thinks in business terms, not just in code.
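
The scorecard tiers and the impact arithmetic above can be sketched in a few lines. The function names are hypothetical; the tier cut-offs (70 and 40) and the figures (250 retained customers, ₹5 lakh each) come from the recommendation itself.

```python
def risk_tier(churn_probability):
    """Map a model probability (0-1) to the scorecard tier."""
    score = round(churn_probability * 100)  # 0-100 churn risk score
    if score > 70:
        return "High"
    if score >= 40:
        return "Medium"
    return "Low"

def revenue_saved_crore(retained_customers, avg_revenue_lakh):
    """Revenue retained, converted from lakh to crore (100 lakh = 1 crore)."""
    return retained_customers * avg_revenue_lakh / 100

tiers = [risk_tier(p) for p in (0.85, 0.55, 0.10)]
saved = revenue_saved_crore(retained_customers=250, avg_revenue_lakh=5)
print(tiers, saved)
```

In a real project the probability would come from something like `rf_model.predict_proba(X)[:, 1]`, and the cut-offs would be tuned to how many accounts the sales team can realistically handle.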

Case Study 7: Demand Forecasting for a CPG Brand

Based on DecisionTree's actual project: "Demand Forecasting at Scale"

The Problem

"A leading CPG brand sells 500+ SKUs across 50+ retailers. They need accurate demand forecasts at the SKU-location-week level for the next 12 weeks to optimize production planning."

Step-by-Step Approach

Step 1: Understand Business Context

Question	Answer
What decisions depend on this forecast?	Production planning, raw material procurement, inventory allocation
What granularity?	SKU × Retailer × Week
Current forecast error?	~65% MAPE (poor) → Target: <20% MAPE
What data is available?	3 years of POS sales, promotions, weather, holidays
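
Since the target is stated in MAPE (mean absolute percentage error), it helps to be able to compute it. This is a minimal sketch; skipping weeks with zero actual sales is an assumption made here to avoid division by zero, and the `mape` helper name is illustrative.

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error, in percent, over weeks with non-zero actuals."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    mask = actual != 0  # assumption: zero-sales weeks are excluded
    return float(np.mean(np.abs((actual[mask] - forecast[mask]) / actual[mask])) * 100)

error_pct = mape(actual=[100, 200, 0, 50], forecast=[110, 180, 5, 50])
print(round(error_pct, 2))
```

MAPE becomes unstable for low-volume SKUs because small actuals inflate the percentage, so it is worth asking whether the client would accept a volume-weighted variant.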

Step 2: Feature Engineering for Time Series

# Lag features — past demand predicts future demand
for lag in [1, 2, 4, 8, 12]:
    df[f'demand_lag_{lag}'] = df.groupby(['sku_id', 'location'])['units_sold'].shift(lag)

# Rolling statistics: smooth out noise (shift by 1 so the current week's sales do not leak into the feature)
df['demand_rolling_4w_avg'] = df.groupby(['sku_id', 'location'])['units_sold'].transform(
    lambda x: x.shift(1).rolling(4, min_periods=1).mean()
)

# Calendar features
df['is_festival_season'] = df['month'].isin([10, 11, 12]).astype(int)  # Diwali/Christmas

# Handle stockouts: 0 sales with 0 inventory ≠ 0 demand
# Assign the full boolean column so non-stockout rows are False rather than NaN
df['is_stockout'] = (df['units_sold'] == 0) & (df['inventory'] == 0)

🧠 Stockout handling is very important. If a product was out of stock and 0 sales were recorded, demand was not actually 0; the product simply was not available. You have to estimate the true demand. Mention this in the interview.
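
One simple way to estimate demand during stockout weeks is to blank out those weeks and fill them with the average of recent non-stockout weeks. The sketch below is one possible approach under that assumption, not the project's actual method; `estimate_demand` and the `demand_est` column are hypothetical names.

```python
import pandas as pd

def estimate_demand(df, window=4):
    """Replace stockout weeks with the mean of recent observed weeks per SKU-location."""
    df = df.sort_values(["sku_id", "location", "week"]).copy()
    stockout = (df["units_sold"] == 0) & (df["inventory"] == 0)
    observed = df["units_sold"].astype(float).mask(stockout)  # NaN where stocked out
    # The rolling mean ignores NaN, so a stockout week gets the average of the
    # observed weeks in its window; ffill covers windows with no observed weeks
    fallback = observed.groupby([df["sku_id"], df["location"]]).transform(
        lambda s: s.rolling(window, min_periods=1).mean().ffill()
    )
    df["is_stockout"] = stockout
    df["demand_est"] = observed.fillna(fallback)
    return df

# Hypothetical sales history: week 4 is a stockout
sales = pd.DataFrame({
    "sku_id": ["S1"] * 5,
    "location": ["L1"] * 5,
    "week": [1, 2, 3, 4, 5],
    "units_sold": [10, 12, 14, 0, 11],
    "inventory": [5, 5, 5, 0, 5],
})
result = estimate_demand(sales)
print(result[["week", "units_sold", "is_stockout", "demand_est"]])
```

A stockout in the very first week of a series has no history to borrow from and stays missing here; a real pipeline would need an explicit rule for that case.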

Step 3: Model Selection

Model	Best For	Typical Accuracy (relative)
Moving Average	Quick baseline	Low
ARIMA/SARIMA	Single product with clear seasonality	Medium
Prophet	Automated forecasting with holidays	Medium
XGBoost/LightGBM	Multi-SKU with many features	High
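
Whichever model is chosen, it should be compared against the baseline in the first row of the table. A minimal sketch of a flat moving-average forecast follows; the 4-week window and the `moving_average_forecast` name are assumptions for illustration.

```python
import numpy as np

def moving_average_forecast(history, window=4, horizon=12):
    """Forecast every future week as the mean of the last `window` observed weeks."""
    history = np.asarray(history, dtype=float)
    level = history[-window:].mean()
    return np.full(horizon, level)

baseline = moving_average_forecast([10, 20, 30, 40, 50, 60], window=4, horizon=3)
print(baseline)
```

If a gradient-boosted model cannot beat this baseline on held-out weeks, the extra complexity is not yet paying for itself.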

Step 4: Deliverables

Weekly forecast file: SKU, Location, Week, Predicted Demand, Confidence Interval
Dashboard showing forecast vs actuals with drill-down
Alert system for demand deviation >30% (stockout or overstock risk)
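
The alert rule in the last deliverable can be sketched as a simple comparison of actuals against forecast. The column names and the `flag_deviations` helper are hypothetical; the 30% threshold comes from the deliverable above.

```python
import pandas as pd

def flag_deviations(df, threshold=0.30):
    """Label rows where actual demand deviates from forecast by more than `threshold`."""
    out = df.copy()
    out["deviation"] = (out["actual"] - out["forecast"]) / out["forecast"]
    out["alert"] = "ok"
    # Demand well above forecast risks a stockout; well below risks overstock
    out.loc[out["deviation"] > threshold, "alert"] = "stockout_risk"
    out.loc[out["deviation"] < -threshold, "alert"] = "overstock_risk"
    return out

# Hypothetical weekly check for three SKUs
weekly = pd.DataFrame({
    "sku_id": ["S1", "S2", "S3"],
    "forecast": [100, 100, 100],
    "actual": [140, 60, 110],
})
alerts = flag_deviations(weekly)
print(alerts)
```

Rows with a forecast of zero would need separate handling, since the relative deviation is undefined there.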

Case Study 8: Marketing Mix Optimization

Based on DecisionTree's project for a Premium Lighting Brand

The Problem

"A premium lighting brand spends ₹10 crore annually across TV (40%), Digital (30%), Print (15%), and In-store (15%). They want to know which channel drives the most ROI and how to reallocate."

Key Concepts

Adstock Effect — Advertising has a carryover effect. A TV ad today still impacts sales next week, but the effect decays.

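def adstock_transform(spend, decay_rate=0.7):
    """Transform raw spend into adstock (accounting for carryover)."""
    # Convert to an array so indexing is positional, even for a pandas Series
    spend = np.asarray(spend, dtype=float)
    adstock = np.zeros(len(spend))
    if len(spend) == 0:
        return adstock
    adstock[0] = spend[0]
    for i in range(1, len(spend)):
        # This week's effect = this week's spend + decayed carryover from last week
        adstock[i] = spend[i] + decay_rate * adstock[i-1]
    return adstock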

🧠 Think of adstock like this: a TV ad airs today, so some people buy today, some remember it and buy tomorrow, a few more the day after, but the effect keeps shrinking (decay). The goal is to model this carry-over.
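
A small worked example makes the decay concrete. The block restates the adstock recursion so it runs on its own; the spend series (a single burst of 100 followed by silence) is made up for illustration.

```python
import numpy as np

def adstock_transform(spend, decay_rate=0.7):
    """Adstock recursion: each week carries over decay_rate of the previous week."""
    spend = np.asarray(spend, dtype=float)
    adstock = np.zeros(len(spend))
    for i in range(len(spend)):
        carryover = decay_rate * adstock[i - 1] if i > 0 else 0.0
        adstock[i] = spend[i] + carryover
    return adstock

# One burst of spend in week 1, nothing afterwards
adstock = adstock_transform([100, 0, 0, 0], decay_rate=0.7)
print(adstock)  # approximately 100, 70, 49, 34.3
```

With a decay rate of 0.7 the effect of a single burst shrinks by 30% each week, and the cumulative effect over a long horizon approaches spend / (1 - decay_rate).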

Diminishing Returns — The first ₹1 crore on TV gets high returns. The 5th crore on TV gives much less incremental value.
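
Diminishing returns are usually modeled with a saturating response curve. The sketch below uses an exponential saturation form as one common choice; the curve shape and the `max_effect` and `rate` values are assumptions for illustration, not fitted parameters.

```python
import numpy as np

def saturating_response(spend_cr, max_effect=100.0, rate=0.5):
    """Sales response that flattens out as spend (in crore) increases."""
    spend_cr = np.asarray(spend_cr, dtype=float)
    return max_effect * (1.0 - np.exp(-rate * spend_cr))

spend = np.arange(0, 6)              # 0 to 5 crore
response = saturating_response(spend)
incremental = np.diff(response)      # gain from each additional crore
print(incremental)
```

Each additional crore yields less than the one before it, which is exactly the pattern described for the 1st versus the 5th crore on TV.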

The Recommendation

Channel	Current Allocation	Proposed	Expected Impact
TV	40% (₹4Cr)	35% (₹3.5Cr)	-2% TV-driven sales
Digital	30% (₹3Cr)	45% (₹4.5Cr)	+18% digital-driven sales
Print	15% (₹1.5Cr)	5% (₹0.5Cr)	Minimal loss (print already underperforming)
In-store	15% (₹1.5Cr)	15% (₹1.5Cr)	Maintain — solid ROI

Net impact: +8-12% revenue uplift with the same total budget.

🧠 Key line for the interview: "We're not asking for more budget — we're asking to spend the SAME budget more wisely. That's the power of data-driven optimization."
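
Once each channel has a response curve, reallocation under a fixed budget can be sketched as a greedy search: hand out the budget in small steps, each time to the channel with the highest marginal return. The curve parameters below are hypothetical placeholders, not the fitted values behind the table above, so the resulting split is illustrative only.

```python
import math

def make_curve(max_effect, rate):
    """Saturating response curve (assumed shape) for one channel."""
    return lambda spend: max_effect * (1.0 - math.exp(-rate * spend))

def allocate_budget(total_cr, curves, step=0.5):
    """Greedy allocation: each step goes to the channel with the best marginal gain."""
    alloc = {channel: 0.0 for channel in curves}
    for _ in range(int(round(total_cr / step))):
        gains = {
            channel: curve(alloc[channel] + step) - curve(alloc[channel])
            for channel, curve in curves.items()
        }
        best = max(gains, key=gains.get)
        alloc[best] += step
    return alloc

# Hypothetical curve parameters per channel
curves = {
    "TV": make_curve(max_effect=40, rate=0.3),
    "Digital": make_curve(max_effect=60, rate=0.4),
    "Print": make_curve(max_effect=5, rate=0.5),
    "In-store": make_curve(max_effect=20, rate=0.6),
}
alloc = allocate_budget(total_cr=10.0, curves=curves)
print(alloc)
```

The total always equals the original ₹10 crore, which mirrors the "same budget, spent more wisely" framing; only the split between channels changes.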
