Skip to main content

LEVEL 2: SQL + PYTHON ANALYSIS CASE STUDIES (Medium)

🧠 Yahan se SQL + Python dono use karoge. Interview mein typically bolte hain: "Describe your approach and write key code snippets." Poora running code expected nahi hota, but approach crystal clear hona chahiye.


Case Study 4: Customer Segmentation (RFM Analysis)

Common in CPG, Retail, and E-Commerce — DecisionTree's top 3 industries

The Problem

"An e-commerce client wants to segment their 50,000 customers to run targeted marketing campaigns. How would you approach this?"

Step 1: Extract Data with SQL

-- Pull the raw RFM data from the database
SELECT
customer_id,
MAX(order_date) AS last_order_date,
COUNT(DISTINCT order_id) AS frequency,
SUM(total_amount) AS monetary
FROM orders
WHERE status = 'completed'
AND order_date >= CURRENT_DATE - INTERVAL '12 months'
GROUP BY customer_id;

Step 2: Calculate RFM Scores in Python

import pandas as pd
import numpy as np

# Load the SQL output
rfm = pd.read_sql(query, connection) # or pd.read_csv('rfm_data.csv')

# Calculate Recency (days since last order)
reference_date = rfm['last_order_date'].max() + pd.Timedelta(days=1)
rfm['recency'] = (reference_date - rfm['last_order_date']).dt.days

# Assign quintile scores (1-5)
# Recency: LOWER is better → label 5,4,3,2,1
rfm['R'] = pd.qcut(rfm['recency'], q=5, labels=[5, 4, 3, 2, 1])
# Frequency: HIGHER is better
rfm['F'] = pd.qcut(rfm['frequency'].rank(method='first'), q=5, labels=[1, 2, 3, 4, 5])
# Monetary: HIGHER is better
rfm['M'] = pd.qcut(rfm['monetary'].rank(method='first'), q=5, labels=[1, 2, 3, 4, 5])

# Create combined score
rfm['RFM_score'] = rfm[['R', 'F', 'M']].astype(int).sum(axis=1)

🧠 RFM framework ratt lo — R = Recency (kab aakhri baar aaya), F = Frequency (kitni baar aaya), M = Monetary (kitna paisa kharch kiya). Har customer ko 3 scores do, phir segments banao.

Step 3: Create Business Segments

def segment_customer(row):
r, f, m = int(row['R']), int(row['F']), int(row['M'])

if r >= 4 and f >= 4 and m >= 4: return 'Champions'
elif r >= 3 and f >= 3 and m >= 3: return 'Loyal Customers'
elif r >= 4 and f <= 2: return 'New Customers'
elif r <= 2 and f >= 3 and m >= 3: return 'At Risk'
elif r <= 2 and f >= 4 and m >= 4: return "Can't Lose Them"
elif r <= 2 and f <= 2 and m <= 2: return 'Lost'
else: return 'Potential Loyalists'

rfm['segment'] = rfm.apply(segment_customer, axis=1)
print(rfm['segment'].value_counts())

Step 4: Marketing Actions Per Segment

SegmentWho Are TheyAction
ChampionsBest customers — buy often, spend a lot, recentReward programs, early access, referral rewards
LoyalRegular buyersUpsell, loyalty tiers
New CustomersRecently joined, few ordersWelcome series, first-purchase incentive
At RiskWere active, now going silentWin-back campaign, "We miss you" email with discount
Can't LoseFormer VIPs who stoppedUrgent outreach, personal call from account manager
LostLong goneLow-cost reactivation or remove from active marketing

🧠 Interview mein yeh table dekhao aur bolo: "Each segment gets a DIFFERENT marketing strategy — we don't waste premium outreach on Lost customers, and we don't ignore Champions who drive most of our revenue." This is connecting data to business decisions — exactly what DecisionTree does.


Case Study 5: Sales Dashboard Design

Based on DecisionTree's "Visualization & Business Insights" capability

The Problem

"A multi-brand packaging distributor needs a unified dashboard showing sales, finance, and supply chain KPIs. Currently, each team uses separate Excel files."

Step 1: Identify KPIs Per Stakeholder

StakeholderKPIs They Need
CEOTotal Revenue, YoY Growth, Top/Bottom Products, Profit Margin
Sales HeadRevenue by Region, Salesperson Performance, Pipeline Value
FinanceGross Margin, EBITDA, Accounts Receivable Days, Cash Flow
Supply ChainInventory Turnover, Out-of-Stock Rate, Lead Time, Fill Rate

🧠 Different stakeholders ko different data chahiye. CEO ko poora picture chahiye, Sales Head ko region-wise breakdown, Finance ko margins. Dashboard mein sab ke liye sections rakhne chahiye.

Step 2: Dashboard Layout Design

┌──────────────────────────────────────────────────────┐
│ HEADER: Company Name | Date Range Filter | Region │
├───────────┬───────────┬───────────┬──────────────────┤
│ Revenue │ Growth │ Margin │ Orders Today │
│ ₹45.2Cr │ +12.3% │ 28.5% │ 347 │
│ SCORECARD │ SCORECARD │ SCORECARD │ SCORECARD │
├───────────┴───────────┴───────────┴──────────────────┤
│ │
│ [LINE CHART: Monthly Revenue Trend — 12 months] │
│ │
├──────────────────────────┬────────────────────────────┤
│ │ │
│ [BAR: Revenue by Region]│ [PIE: Revenue by Product] │
│ │ │
├──────────────────────────┴────────────────────────────┤
│ │
│ [TABLE: Top 10 Customers by Revenue — with trend] │
│ │
└───────────────────────────────────────────────────────┘

Step 3: Key SQL for Dashboard Backend

-- KPI 1: Total Revenue with YoY comparison
SELECT
SUM(CASE WHEN EXTRACT(YEAR FROM order_date) = 2024 THEN amount END) AS current_year,
SUM(CASE WHEN EXTRACT(YEAR FROM order_date) = 2023 THEN amount END) AS previous_year,
ROUND(
(SUM(CASE WHEN EXTRACT(YEAR FROM order_date) = 2024 THEN amount END) -
SUM(CASE WHEN EXTRACT(YEAR FROM order_date) = 2023 THEN amount END)) * 100.0 /
NULLIF(SUM(CASE WHEN EXTRACT(YEAR FROM order_date) = 2023 THEN amount END), 0),
2
) AS yoy_growth_pct
FROM orders WHERE status = 'completed';

-- KPI 2: Revenue by Region (for bar chart)
SELECT region, SUM(amount) AS revenue
FROM orders o JOIN customers c ON o.customer_id = c.customer_id
WHERE o.status = 'completed' AND o.order_date >= '2024-01-01'
GROUP BY region ORDER BY revenue DESC;

Step 4: Key Design Decisions

DecisionRecommendationWhy
Update frequencyDaily refresh (scheduled at 6 AM)Balance between freshness and DB load
Connection typeExtract (not live)Dashboard loads in < 3 seconds
InteractivityRegion, Date Range, Product filtersUsers can drill into their area
Mobile-friendlyYes — responsive layoutExecutives check dashboards on phones