Data Visualization
visualization** — NOT on algorithms or web development.
🧠 In plain terms: Python is the data analyst's Swiss Army knife. Loading data, cleaning it, analyzing it, and plotting graphs all happen in one language. At DecisionTree, Pandas and Seaborn are used daily. They will NOT ask about DSA or web development at all; they will ask only about data manipulation and analysis.
CHAPTER 1: PYTHON BASICS (Quick Review)
🧠 If you don't know Python at all: practice this chapter 2-3 times first. If you already know a little, jump straight to Chapter 3 (Pandas), because that is what gets asked in the interview.
1.1 Variables & Data Types
# Variables — no need to declare type (Python figures it out)
name = "DecisionTree" # str (text)
employees = 160 # int (whole number)
revenue = 45.5 # float (decimal)
is_profitable = True # bool (True/False)
cities = ["Delhi", "Mumbai"] # list (ordered, changeable)
info = {"name": "DT", "year": 2004} # dict (key-value pairs)
| Type | Example | Mutable? | Ordered? |
|---|---|---|---|
| str | "hello" | ❌ | ✅ |
| int | 42 | ❌ | N/A |
| float | 3.14 | ❌ | N/A |
| bool | True/False | ❌ | N/A |
| list | [1, 2, 3] | ✅ | ✅ |
| tuple | (1, 2, 3) | ❌ | ✅ |
| set | {1, 2, 3} | ✅ | ❌ |
| dict | {"a": 1} | ✅ | ✅ (Python 3.7+) |
Mutable = can be changed after it is created (like a list). Immutable = cannot be changed once created (like a string or tuple). Interviewers ask this very often!
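A minimal sketch of the difference, using illustrative variable names: an in-place change works on a list, while the same attempt on a tuple raises `TypeError`, and string methods return a new string instead of modifying the original.

```python
# Mutable: a list can be changed in place after creation
cities = ["Delhi", "Mumbai"]
cities[0] = "Pune"          # allowed: replaces the first item
cities.append("Chennai")    # allowed: grows the same list object

# Immutable: a tuple rejects in-place changes
point = (1, 2, 3)
try:
    point[0] = 99           # raises TypeError
    tuple_changed = True
except TypeError:
    tuple_changed = False

# Immutable: string methods return a NEW string
name = "hello"
upper_name = name.upper()   # `name` itself is untouched

print(cities, tuple_changed, name, upper_name)
```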
1.2 Lists — The Most Common Data Structure
fruits = ["apple", "banana", "cherry"]
# Accessing elements (0-indexed)
fruits[0] # "apple" (first)
fruits[-1] # "cherry" (last)
fruits[1:3] # ["banana", "cherry"] (slice: start inclusive, end exclusive)
# Modifying
fruits.append("date") # Add to end → ["apple", "banana", "cherry", "date"]
fruits.insert(1, "avocado") # Insert at position 1
fruits.remove("banana") # Remove by value
fruits.pop(0) # Remove by index, returns removed item
# Useful operations
len(fruits) # Number of items
sorted(fruits) # Returns new sorted list
fruits.sort() # Sorts in-place
"apple" in fruits # True/False — membership test
1.3 Dictionaries — Key-Value Lookup
customer = {
    "name": "Rajesh",
    "city": "Delhi",
    "orders": 5
}
# Access
customer["name"] # "Rajesh"
customer.get("phone", "N/A") # "N/A" (safe access — no error if key missing)
# Modify
customer["orders"] = 6 # Update value
customer["email"] = "r@dt.com" # Add new key
# Loop through
for key, value in customer.items():
    print(f"{key}: {value}")
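One common use of the safe `.get()` access is counting occurrences. The sketch below uses made-up sample data (`orders` is a hypothetical list of cities) and builds a frequency dictionary by hand.

```python
# Hypothetical sample data: the city of each order
orders = ["Delhi", "Mumbai", "Delhi", "Pune", "Delhi"]

counts = {}
for city in orders:
    # .get(city, 0) returns 0 the first time a city is seen,
    # so no KeyError is raised for a missing key
    counts[city] = counts.get(city, 0) + 1

for city, n in counts.items():
    print(f"{city}: {n}")
```

In Pandas, `value_counts()` does the same job on a column.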
1.4 Loops
# For loop — iterate over a sequence
for i in range(5): # 0, 1, 2, 3, 4
    print(i)
for city in ["Delhi", "Mumbai", "Bangalore"]:
    print(city)
# While loop: repeat until condition is false
count = 0
while count < 5:
    print(count)
    count += 1 # Don't forget to update!
# List comprehension: a loop that builds a list in one line (used VERY often!)
# Think: "take the square of every x, for x from 1 to 5"
squares = [x**2 for x in range(1, 6)] # [1, 4, 9, 16, 25]
# With a condition: "give the squares of even numbers only"
even_squares = [x**2 for x in range(1, 11) if x % 2 == 0] # [4, 16, 36, 64, 100]
# 🧠 Remember: [RESULT for ITEM in LIST if CONDITION]
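To see what the comprehension pattern replaces, here is the same even-squares calculation written as an ordinary loop. The name `even_squares_loop` is only for illustration.

```python
# Long form: explicit loop, condition, and append
even_squares_loop = []
for x in range(1, 11):
    if x % 2 == 0:
        even_squares_loop.append(x**2)

# Short form: [RESULT for ITEM in LIST if CONDITION]
even_squares = [x**2 for x in range(1, 11) if x % 2 == 0]

print(even_squares_loop == even_squares)  # both build the same list
```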
1.5 Functions
def calculate_profit(revenue, cost):
    """Calculate profit and profit percentage (relative to cost)."""
    profit = revenue - cost
    profit_pct = (profit / cost) * 100
    return profit, profit_pct # Return multiple values as a tuple
# Call the function
p, pct = calculate_profit(10000, 8000)
print(f"Profit: ₹{p}, Profit%: {pct}%") # Profit: ₹2000, Profit%: 25.0%
Lambda Functions (One-line functions)
# Lambda = a small anonymous function that does its job in one line
double = lambda x: x * 2 # "take x, return x*2"
double(5) # 10
# 🧠 Lambdas are used A LOT in Pandas:
# df['revenue'].apply(lambda x: x / 1000) → divides every value by 1000
# For a small one-off task, defining a full function is overkill; use a lambda
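Lambdas also show up outside Pandas, most often as the `key` argument when sorting. The sketch below uses invented sample records (`customers` and `revenue` are hypothetical) and only the standard library.

```python
# Hypothetical sample records
customers = [
    {"name": "Rajesh", "orders": 5},
    {"name": "Priya", "orders": 9},
    {"name": "Amit", "orders": 2},
]

# Sort by number of orders, highest first; the lambda picks the sort key
by_orders = sorted(customers, key=lambda c: c["orders"], reverse=True)
top_names = [c["name"] for c in by_orders]

# Apply a small transformation to every value (same idea as .apply in Pandas)
revenue = [45500, 12000, 8000]
in_thousands = list(map(lambda x: x / 1000, revenue))

print(top_names, in_thousands)
```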
1.6 Exception Handling
try:
    result = 10 / 0
except ZeroDivisionError:
    print("Cannot divide by zero!")
except Exception as e:
    print(f"Error: {e}")
finally:
    print("This always runs")
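Exception handling becomes useful when combined with a function like the profit calculation from section 1.5, which fails when cost is zero. The helper below, `safe_profit_pct`, is a hypothetical name, and returning `None` for a zero cost is just one possible design choice.

```python
def safe_profit_pct(revenue, cost):
    """Return profit percentage relative to cost, or None when cost is zero."""
    try:
        return (revenue - cost) / cost * 100
    except ZeroDivisionError:
        # Assumption: a missing value is preferable to crashing the analysis
        return None

print(safe_profit_pct(10000, 8000))
print(safe_profit_pct(100, 0))
```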
CHAPTER 2: NumPy — Numerical Computing
2.1 What is NumPy?
NumPy (Numerical Python) is the foundation for numerical computing in Python. Pandas is built on top of it.
import numpy as np
# Creating arrays
arr = np.array([1, 2, 3, 4, 5])
zeros = np.zeros(5) # [0., 0., 0., 0., 0.] (floats by default)
ones = np.ones((3, 4)) # 3×4 matrix of 1s
sequence = np.arange(0, 10, 2) # [0, 2, 4, 6, 8]
2.2 Why NumPy Over Lists?
| Feature | Python List | NumPy Array |
|---|---|---|
| Speed | Slow for numeric loops | Often 10–100× faster for vectorized numeric work |
| Element-wise operations | ❌ Need loops | ✅ Vectorized |
| Memory | More | Less |
| Math operations | Manual | Built-in |
# Element-wise operations (plain lists need a loop; list * 2 just repeats the list)
arr = np.array([10, 20, 30, 40, 50])
arr * 2 # [20, 40, 60, 80, 100]
arr + 5 # [15, 25, 35, 45, 55]
arr > 25 # [False, False, True, True, True]
arr[arr > 25] # [30, 40, 50] — boolean filtering
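A short side-by-side sketch of the list versus array behavior described in the table, assuming NumPy is installed. The variable names are illustrative.

```python
import numpy as np

prices = [10, 20, 30, 40, 50]

# Plain list: * means repetition, so element-wise math needs a loop
repeated = prices * 2                      # 10 items, the list twice over
doubled_loop = [p * 2 for p in prices]

# NumPy array: * is applied to every element (vectorized)
arr = np.array(prices)
doubled_vec = arr * 2

# Boolean filtering: the comparison builds a True/False mask,
# and indexing with the mask keeps only the True positions
mask = arr > 25
filtered = arr[mask]

print(len(repeated), doubled_vec, filtered)
```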
2.3 Essential NumPy Functions for Data Analysis
data = np.array([23, 45, 12, 67, 34, 56, 78, 89, 21, 43])
np.mean(data) # 46.8 — Average
np.median(data) # 44.0 — Middle value
np.std(data) # ≈ 24.23, standard deviation (population, ddof=0 by default)
np.var(data) # 587.16, variance (population, ddof=0 by default)
np.min(data) # 12
np.max(data) # 89
np.sum(data) # 468
np.percentile(data, 75) # 75th percentile (Q3)
np.sort(data) # Sorted array
np.unique(data) # Unique values
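One detail worth knowing for interviews: `np.std` and `np.var` compute the population version by default (`ddof=0`), while the Pandas `.std()` and `.var()` methods default to the sample version (`ddof=1`). The sketch below, assuming NumPy is installed, shows both on the same data and adds the interquartile range as a typical use of `np.percentile`.

```python
import numpy as np

data = np.array([23, 45, 12, 67, 34, 56, 78, 89, 21, 43])

# Population statistics (divide by n): NumPy's default
pop_var = np.var(data)
pop_std = np.std(data)

# Sample statistics (divide by n - 1): pass ddof=1
sample_var = np.var(data, ddof=1)
sample_std = np.std(data, ddof=1)

# Quartiles and interquartile range (default linear interpolation)
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

print(pop_var, sample_var, q1, q3, iqr)
```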
CHAPTER 3: PANDAS — The Heart of Data Analysis
🧠 THIS IS THE MOST IMPORTANT CHAPTER! About 70% of the Python questions in the interview come from Pandas. Study it carefully and practice.
3.1 What is Pandas?
Pandas is a Python library that provides DataFrames: 2D tables (like Excel sheets) on which you can do powerful data manipulation.
🧠 Think of a DataFrame as an Excel sheet: it has rows and columns, and you can filter, sort, and apply formulas. The only difference is that you write Python instead of clicking around in Excel.
import pandas as pd # pd is the short alias; by convention it is always imported this way
# Creating a DataFrame from a dictionary (you may be asked to do this in the interview)
data = {
'name': ['Rajesh', 'Priya', 'Amit', 'Sneha', 'Vikram'],
'city': ['Delhi', 'Mumbai', 'Ahm