Skip to main content

📗 Excel for Data Analysts — Complete Guide

From Scratch for Fresher Data Analyst at DecisionTree Analytics

Why Excel matters: Excel is still the #1 tool in analytics consulting. At DecisionTree, you'll receive raw data in Excel, do quick ad-hoc analysis in Excel, and often hand over workbooks to clients. Interviewers will test your fluency.


1. VLOOKUP vs INDEX-MATCH — The Most Asked Excel Question

VLOOKUP

Syntax: =VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup])

Example: Find the price of "Laptop" from a product table

=VLOOKUP("Laptop", A2:C10, 3, FALSE)

A2:C10 → Table containing Product Name, Category, Price
3 → Return the 3rd column (Price)
FALSE → Exact match (ALWAYS use FALSE in interviews!)

Limitations of VLOOKUP:

  1. ❌ Can only look right — lookup column must be the leftmost
  2. ❌ Adding/deleting columns breaks the column index number
  3. ❌ Slower on large datasets
  4. ❌ Cannot return values from a column to the LEFT of the lookup column

INDEX-MATCH (The Superior Alternative)

Syntax: =INDEX(return_range, MATCH(lookup_value, lookup_range, 0))

Example: Same lookup — find price of "Laptop"

=INDEX(C2:C10, MATCH("Laptop", A2:A10, 0))

MATCH("Laptop", A2:A10, 0) → Finds the row position of "Laptop"
INDEX(C2:C10, position) → Returns the value from that row in column C

Why INDEX-MATCH is better:

  1. ✅ Can look in ANY direction (left, right, any column)
  2. ✅ Column insertions/deletions don't break it
  3. ✅ Faster on large datasets
  4. ✅ More flexible — can be used with multiple criteria

XLOOKUP (Excel 365 / 2021+)

=XLOOKUP("Laptop", A2:A10, C2:C10, "Not Found")

Simplest syntax, handles errors natively, looks in any direction.
Use when available, but know VLOOKUP/INDEX-MATCH for compatibility.

🧠 Interview mein bolo: "I prefer INDEX-MATCH over VLOOKUP because it's not limited to left-to-right lookups and doesn't break when columns are rearranged. But I'm comfortable with all three — VLOOKUP, INDEX-MATCH, and XLOOKUP."

Comparison Table

FeatureVLOOKUPINDEX-MATCHXLOOKUP
DirectionRight onlyAny directionAny direction
SpeedSlowerFasterFastest
Column changesBreaksResilientResilient
Error handlingNeeds IFERROR wrapperNeeds IFERROR wrapperBuilt-in
AvailabilityAll versionsAll versionsExcel 365+ only

2. Pivot Tables — The Analyst's Swiss Army Knife

A Pivot Table summarizes large datasets into meaningful insights with drag-and-drop simplicity.

How to Create

1. Select your data range (Ctrl+A if in a table)
2. Insert → Pivot Table → New Worksheet
3. Drag fields:
- ROWS: What you want to group by (e.g., Region, Product)
- COLUMNS: Secondary grouping (e.g., Month, Year)
- VALUES: What to measure (e.g., SUM of Revenue, COUNT of Orders)
- FILTERS: Overall filter (e.g., Year = 2024)

Common Pivot Table Operations

OperationHowExample
Group datesRight-click date → Group → Months/QuartersSee monthly trends
Calculated fieldAnalyze → Fields, Items → Calculated FieldProfit = Revenue - Cost
Show as % of totalValue Field Settings → Show Values As → % of Grand TotalCategory contribution
SortRight-click value → Sort → Largest to SmallestTop products by revenue
SlicersInsert → Slicer → Select fieldInteractive dashboard filter
Drill downDouble-click on any value cellSee underlying rows

Pivot Table Interview Trick

Scenario: "Given sales data, show me the Top 5 products by revenue for each region."

Steps:
1. Create Pivot: Rows = Region + Product, Values = SUM(Revenue)
2. Right-click any product → Filter → Top 10
3. Set: Top 5, by Sum of Revenue
4. Now each region shows only its top 5 products ✅

🧠 Pivot Table = 2 minute mein analysis. Interview mein agar Excel question aaye, pehle Pivot suggest karo. Interviewer ko lagega expert hai.


3. Essential Formulas — Must Know

Conditional Logic

=IF(A1>100, "High", "Low")
=IF(A1>100, "High", IF(A1>50, "Medium", "Low")) -- Nested IF
=IFS(A1>100, "High", A1>50, "Medium", TRUE, "Low") -- Cleaner (365+)

Counting & Summing with Conditions

FormulaWhat It DoesExample
COUNTIFCount cells matching ONE condition=COUNTIF(B:B, "Delhi")
COUNTIFSCount cells matching MULTIPLE conditions=COUNTIFS(B:B, "Delhi", C:C, ">50000")
SUMIFSum values matching ONE condition=SUMIF(B:B, "Delhi", D:D)
SUMIFSSum values matching MULTIPLE conditions=SUMIFS(D:D, B:B, "Delhi", C:C, "Technology")
AVERAGEIFAverage matching a condition=AVERAGEIF(B:B, "Delhi", D:D)

Text Functions

FormulaWhat It DoesExample
LEFT(A1, 3)First 3 characters"Delhi" → "Del"
RIGHT(A1, 2)Last 2 characters"Delhi" → "hi"
MID(A1, 2, 3)3 chars starting from position 2"Delhi" → "elh"
LEN(A1)Length of text"Delhi" → 5
TRIM(A1)Remove extra spaces" Delhi " → "Delhi"
CONCATENATE(A1, " ", B1)Join text"First" & "Last"
TEXT(A1, "MMM-YY")Format numbers/dates as text45000 → "Jan-23"
UPPER/LOWER/PROPERChange case"delhi" → "DELHI" / "Delhi"

Date Functions

=TODAY()                    -- Current date
=YEAR(A1) -- Extract year
=MONTH(A1) -- Extract month
=DATEDIF(A1, B1, "M") -- Months between two dates
=EOMONTH(A1, 0) -- End of current month
=NETWORKDAYS(A1, B1) -- Working days between dates (excludes weekends)

4. Data Cleaning in Excel

ProblemSolution
Extra spaces=TRIM(A1)
Inconsistent case=PROPER(A1) or =UPPER(A1)
Remove duplicatesData → Remove Duplicates
Find blanksCtrl+G → Special → Blanks
Convert text to numbersData → Text to Columns → Finish
Split full nameData → Text to Columns → Delimited → Space
Flash Fill (pattern-based)Type pattern in first cell → Ctrl+E

5. Conditional Formatting

Highlight cells > 1000:        Home → Conditional Formatting → Highlight Rules
Color scales (heatmap effect): Home → Conditional Formatting → Color Scales
Data bars (in-cell bar chart): Home → Conditional Formatting → Data Bars
Top/Bottom rules: Home → Conditional Formatting → Top 10%

🧠 Quick tip: Color scales on a Pivot Table turn it into a heatmap instantly — great for spotting patterns.


6. Keyboard Shortcuts (Interviewers Notice Speed)

ShortcutAction
Ctrl+Shift+LToggle filters on/off
Ctrl+TConvert range to Table
Ctrl+;Insert today's date
Ctrl+Shift+;Insert current time
Ctrl+DFill down
Ctrl+Shift+EndSelect to last used cell
Alt+=Auto SUM
F2Edit cell
F4Toggle absolute reference ($A$1)
Ctrl+``Show formulas instead of values
Ctrl+Shift+~Apply General format
Alt+EnterNew line within a cell

7. Named Ranges & Tables

Named Range: Select cells → Name Box (left of formula bar) → Type name
Use: =SUMIF(RegionRange, "North", RevenueRange)

Excel Tables (Ctrl+T): Auto-expand, structured references
Use: =SUM(SalesTable[Revenue]) -- Much clearer than =SUM(D:D)

Benefits of Tables:

  • Auto-expand when new rows are added
  • Built-in filters and alternating row colors
  • Structured references ([@Revenue] instead of D2)
  • Automatic totals row

8. Interview Questions (12 Questions)

Q1: "VLOOKUP vs INDEX-MATCH?"

Answer: "VLOOKUP searches the leftmost column and returns a value from a specified column to the right. INDEX-MATCH is more flexible — MATCH finds the row position, INDEX returns the value from any column. I prefer INDEX-MATCH because it works in any direction, is faster on large datasets, and doesn't break when columns are inserted or deleted."

Q2: "How would you handle 100,000 rows of messy data in Excel?"

Answer: "First, I'd convert the range to a Table (Ctrl+T) for structured references. Then: (1) Remove duplicates via Data tab. (2) Use TRIM and PROPER to clean text. (3) Text to Columns for splitting combined fields. (4) Conditional Formatting to spot outliers. (5) Pivot Table for summary analysis. If the dataset requires heavy transformation, I'd suggest moving to Power Query or Python."

Q3: "What are Pivot Tables and when would you use them?"

Answer: "Pivot Tables summarize large datasets by aggregating values across categories — like total revenue by region and product category. I use them for quick exploratory analysis: finding top customers, comparing monthly trends, spotting anomalies. The drag-and-drop interface makes it fast to iterate on different views without writing formulas."

Q4: "How do you find and remove duplicates?"

Answer: "Three methods: (1) Data → Remove Duplicates for quick removal. (2) COUNTIF to FLAG duplicates first: =COUNTIF($A$2:$A2, A2) — values > 1 are duplicates. (3) Conditional Formatting → Highlight Duplicates for visual identification before deciding what to remove."

Q5: "Explain absolute vs relative references."

Answer: "Relative references (A1) shift when you copy a formula — row and column adjust automatically. Absolute references ($A$1) stay fixed. Mixed references ($A1 or A$1) lock only the row or column. I use F4 to toggle between them. Common use: tax rate in a fixed cell — =B2*$C$1 so C1 doesn't shift when I drag down."

Q6: "What is a Pivot Chart?"

Answer: "A Pivot Chart is a visual representation of a Pivot Table — it updates automatically when the Pivot Table changes. I'd use it for quick dashboard-style visuals. However, for production dashboards, I'd move to Power BI or Tableau since Pivot Charts have limited formatting options."

Q7: "How do you handle errors like #N/A, #VALUE!, #DIV/0!?"

Answer: "Use IFERROR to wrap formulas: =IFERROR(VLOOKUP(...), 'Not Found'). For specific errors: #N/A usually means VLOOKUP didn't find a match — check for extra spaces with TRIM. #DIV/0! means division by zero — use =IF(B1=0, 0, A1/B1). #VALUE! means wrong data type — check if numbers are stored as text."

Q8: "What's the difference between COUNT, COUNTA, COUNTBLANK?"

Answer: "COUNT counts cells with numbers only. COUNTA counts all non-empty cells (numbers + text). COUNTBLANK counts empty cells. For counting specific values, I use COUNTIF. In interviews, I'd mention that COUNTA can be misleading if cells contain spaces — they look empty but aren't."

Q9: "How would you create a dropdown list in Excel?"

Answer: "Data → Data Validation → List → specify the source range or comma-separated values. I'd combine this with INDIRECT for dependent dropdowns — selecting 'North' in the first dropdown shows only North region cities in the second. This is useful for creating data entry forms."

Q10: "What are array formulas?"

Answer: "Array formulas perform calculations on multiple values simultaneously. In older Excel, they needed Ctrl+Shift+Enter. Modern Excel 365 has dynamic arrays that spill automatically. Example: =UNIQUE(A2:A100) returns all unique values. =SORT(FILTER(A2:C100, C2:C100>1000)) returns filtered and sorted results without helper columns."

Q11: "How do you protect a workbook vs a worksheet?"

Answer: "Worksheet protection (Review → Protect Sheet) prevents users from editing cells but allows viewing. I can selectively unlock cells for input. Workbook protection prevents adding/deleting/renaming sheets. For client deliverables, I'd protect the dashboard sheets but leave an 'Input' sheet editable for them to update parameters."

Q12: "When would you move from Excel to a BI tool like Power BI?"

Answer: "Three triggers: (1) Data exceeds 500K rows — Excel gets slow. (2) Multiple people need to view the same live report — Power BI Service handles sharing and auto-refresh. (3) Need for drill-through, row-level security, or complex DAX — Excel's analytical capabilities are limited. At DecisionTree, I'd start exploration in Excel and graduate to Power BI for the production deliverable."