🚀 Day-Before Cheatsheet — Rapid Revision
Everything in One Page — Read This the Night Before
SQL — Top 20 Patterns
| # | Pattern | Syntax |
|---|---|---|
| 1 | Select specific columns | SELECT col1, col2 FROM table |
| 2 | Filter rows | WHERE col = 'value' |
| 3 | Multiple conditions | WHERE a = 1 AND (b = 2 OR c = 3) |
| 4 | Pattern match | WHERE name LIKE 'A%' |
| 5 | Null check | WHERE col IS NULL / IS NOT NULL |
| 6 | Sort results | ORDER BY col DESC |
| 7 | Limit rows | LIMIT 10 / TOP 10 |
| 8 | Aggregate | SELECT dept, COUNT(*), AVG(sal) FROM t GROUP BY dept |
| 9 | Filter groups | GROUP BY dept HAVING COUNT(*) > 5 |
| 10 | Inner Join | FROM a JOIN b ON a.id = b.id |
| 11 | Left Join | FROM a LEFT JOIN b ON a.id = b.id |
| 12 | Self Join | FROM emp e JOIN emp m ON e.mgr_id = m.id |
| 13 | Subquery in WHERE | WHERE sal > (SELECT AVG(sal) FROM emp) |
| 14 | Subquery in FROM | FROM (SELECT ... ) AS sub |
| 15 | CASE statement | CASE WHEN x > 0 THEN 'Yes' ELSE 'No' END |
| 16 | Window — RANK | RANK() OVER (PARTITION BY dept ORDER BY sal DESC) |
| 17 | Window — Running Total | SUM(amt) OVER (ORDER BY date) |
| 18 | Window — LAG/LEAD | LAG(amt) OVER (ORDER BY date) |
| 19 | CTE | WITH cte AS (SELECT ...) SELECT * FROM cte |
| 20 | Delete duplicates | DELETE FROM t WHERE id NOT IN (SELECT MIN(id) FROM t GROUP BY col) |
DAX — Top 10 Formulas
| # | Formula | Use |
|---|---|---|
| 1 | SUM(Sales[Revenue]) | Basic total |
| 2 | COUNTROWS(Sales) | Count rows |
| 3 | DIVIDE(A, B, 0) | Safe division |
| 4 | DISTINCTCOUNT(Sales[CustID]) | Unique count |
| 5 | CALCULATE(SUM(...), filter) | Change filter context |
| 6 | TOTALYTD(SUM(...), Date[Date]) | Year-to-date |
| 7 | SAMEPERIODLASTYEAR(Date[Date]) | Previous year comparison |
| 8 | ALL(Table) | Remove all filters |
| 9 | SUMX(table, expr) | Row-by-row calculation |
| 10 | CALCULATE + ALL → % of Total | Percentage analysis |
Tableau — 5 Must-Know Concepts
- Dimensions vs Measures: Blue = categorical (creates headers). Green = numerical (gets aggregated).
- Context Filter: Creates a temp table BEFORE other filters. Use for "Top N within a group."
- LOD — FIXED:
{FIXED [Region] : SUM([Sales])}— calculates at specified level, ignores the view. - Filter Order: Extract → Data Source → Context → Dimension → Measure → Table Calc.
- Live vs Extract: Extract = faster (local copy, .hyper file). Live = real-time but slower. Use Extract 90% of the time.
Statistics — 5 Core Concepts
- Mean vs Median: Mean = sum/count (sensitive to outliers). Median = middle value (robust). Use Median for skewed data.
- 68-95-99.7 Rule: In a normal distribution, 68% within ±1σ, 95% within ±2σ, 99.7% within ±3σ.
- p-value: If p < 0.05 → result is statistically significant → reject H₀.
- Type I vs II: Type I = false positive (wrongly rejected H₀). Type II = false negative (missed real effect).
- Correlation ≠ Causation: r measures linear relationship. To prove causation → need controlled experiment (A/B test).
Python — 5 One-Liners
# 1. Read CSV and show shape
df = pd.read_csv('data.csv'); print(df.shape)
# 2. Missing values per column
df.isnull().sum()
# 3. Group by and aggregate
df.groupby('city')['revenue'].agg(['sum', 'mean', 'count'])
# 4. Top 10 rows by a column
df.nlargest(10, 'revenue')
# 5. Quick correlation matrix
df[['revenue', 'profit', 'quantity']].corr()
Excel — 5 Key Formulas
| Formula | What It Does |
|---|---|
=INDEX(C:C, MATCH("X", A:A, 0)) | Better VLOOKUP — looks any direction |
=SUMIFS(D:D, B:B, "Delhi", C:C, ">1000") | Conditional sum |
=COUNTIFS(A:A, "Delhi", B:B, ">50000") | Conditional count |
=IFERROR(VLOOKUP(...), "Not Found") | Handle lookup errors |
=TEXT(A1, "MMM-YY") | Format dates |
Interview Mantras — Say These Lines
| Topic | Power Line |
|---|---|
| Outliers | "I'd use Median, not Mean, because Mean is sensitive to outliers." |
| A/B Test | "I'd ensure sufficient sample size with a power analysis before starting." |
| Accuracy Paradox | "98% accuracy is meaningless with 2% fraud — I'd use F1 Score and AUC." |
| Context Filter | "It creates a temporary table — subsequent filters work within that subset." |
| CALCULATE | "It modifies the filter context of a DAX expression." |
| Correlation | "Correlation suggests a relationship but doesn't prove causation." |
| Decision Tree | "Easy to interpret but prone to overfitting. Random Forest fixes this via ensemble averaging." |
| Star Schema | "Fact table = events. Dimensions = context. Filters flow dim → fact." |