Data Visualization

visualization** — NOT on algorithms or web development.

🧠 In plain terms: Python is the data analyst's Swiss Army knife. Loading data, cleaning it, analyzing it, and plotting graphs all happen in one language. At DecisionTree, Pandas and Seaborn are used daily. They will NOT ask about DSA or web development, only data manipulation and analysis.


CHAPTER 1: PYTHON BASICS (Quick Review)

🧠 If you don't know Python at all: practice this chapter 2-3 times first. If you know a little, go straight to Chapter 3 (Pandas), since that is what gets asked in the interview.

1.1 Variables & Data Types

# Variables — no need to declare type (Python figures it out)
name = "DecisionTree" # str (text)
employees = 160 # int (whole number)
revenue = 45.5 # float (decimal)
is_profitable = True # bool (True/False)
cities = ["Delhi", "Mumbai"] # list (ordered, changeable)
info = {"name": "DT", "year": 2004} # dict (key-value pairs)
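
| Type  | Example    | Mutable? | Ordered?         |
|-------|------------|----------|------------------|
| str   | "hello"    | ❌       | ✅               |
| int   | 42         | ❌       | N/A              |
| float | 3.14       | ❌       | N/A              |
| bool  | True/False | ❌       | N/A              |
| list  | [1, 2, 3]  | ✅       | ✅               |
| tuple | (1, 2, 3)  | ❌       | ✅               |
| set   | {1, 2, 3}  | ✅       | ❌               |
| dict  | {"a": 1}   | ✅       | ✅ (Python 3.7+) |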

Mutable = can be changed after it is created (like a list). Immutable = once created, it cannot be changed (like a string or tuple). This is a very common interview question!
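A minimal sketch of the difference, using made-up values: changing a list in place works, while the same kind of assignment on a tuple raises `TypeError`, and string methods return a new string instead of modifying the original.

```python
# Mutable: a list can be changed after creation
cities = ["Delhi", "Mumbai"]
cities[0] = "Pune"          # works, the list is modified in place
cities.append("Chennai")    # works

# Immutable: a tuple cannot be changed after creation
point = (10, 20)
try:
    point[0] = 99           # raises TypeError
    tuple_changed = True
except TypeError:
    tuple_changed = False

# Strings are immutable too: methods return a NEW string
name = "decisiontree"
upper_name = name.upper()   # name itself is unchanged

print(cities)            # ['Pune', 'Mumbai', 'Chennai']
print(tuple_changed)     # False
print(name, upper_name)  # decisiontree DECISIONTREE
```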

1.2 Lists — The Most Common Data Structure

fruits = ["apple", "banana", "cherry"]

# Accessing elements (0-indexed)
fruits[0] # "apple" (first)
fruits[-1] # "cherry" (last)
fruits[1:3] # ["banana", "cherry"] (slice: start inclusive, end exclusive)

# Modifying
fruits.append("date") # Add to end → ["apple", "banana", "cherry", "date"]
fruits.insert(1, "avocado") # Insert at position 1
fruits.remove("banana") # Remove by value
fruits.pop(0) # Remove by index, returns removed item

# Useful operations
len(fruits) # Number of items
sorted(fruits) # Returns new sorted list
fruits.sort() # Sorts in-place
"apple" in fruits # True/False — membership test

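The difference between `sorted()` and `.sort()` trips up many beginners, so here is a small sketch with made-up scores. It also shows that a slice returns a new list.

```python
scores = [40, 10, 30]

new_list = sorted(scores)   # returns a new sorted list
print(new_list)             # [10, 30, 40]
print(scores)               # [40, 10, 30] (original untouched)

result = scores.sort()      # sorts in place and returns None
print(result)               # None
print(scores)               # [10, 30, 40]

# Slices return new lists; negative indexes count from the end
last_two = scores[-2:]
print(last_two)             # [30, 40]
```

A common bug is writing `scores = scores.sort()`, which replaces the list with `None`.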
1.3 Dictionaries — Key-Value Lookup

customer = {
    "name": "Rajesh",
    "city": "Delhi",
    "orders": 5
}

# Access
customer["name"] # "Rajesh"
customer.get("phone", "N/A") # "N/A" (safe access — no error if key missing)

# Modify
customer["orders"] = 6 # Update value
customer["email"] = "r@dt.com" # Add new key

# Loop through
for key, value in customer.items():
    print(f"{key}: {value}")
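One place where `get()` with a default is handy is counting. The sketch below uses a hypothetical list of order cities (not from any real dataset) and counts how many orders each city has.

```python
# Hypothetical order records: one city name per order
order_cities = ["Delhi", "Mumbai", "Delhi", "Pune", "Delhi"]

orders_per_city = {}
for city in order_cities:
    # get() returns 0 when the city has not been seen yet
    orders_per_city[city] = orders_per_city.get(city, 0) + 1

print(orders_per_city)  # {'Delhi': 3, 'Mumbai': 1, 'Pune': 1}

# Square brackets raise KeyError for a missing key; get() returns None
missing = orders_per_city.get("Chennai")
print(missing)  # None
```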

1.4 Loops

# For loop — iterate over a sequence
for i in range(5): # 0, 1, 2, 3, 4
    print(i)

for city in ["Delhi", "Mumbai", "Bangalore"]:
    print(city)

# While loop — repeat until condition is false
count = 0
while count < 5:
    print(count)
    count += 1 # Don't forget to update!

# List comprehension: a loop that builds a list in one line (used VERY often!)
# Think: "square every x, for x from 1 to 5"
squares = [x**2 for x in range(1, 6)] # [1, 4, 9, 16, 25]

# With a condition: "give the squares of even numbers only"
even_squares = [x**2 for x in range(1, 11) if x % 2 == 0] # [4, 16, 36, 64, 100]
# 🧠 Remember: [RESULT for ITEM in LIST if CONDITION]
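To see that a comprehension is only a compact loop, the sketch below writes the same filter-and-transform both ways. The revenue figures are made up for illustration.

```python
# Hypothetical revenue figures
revenues = [1200, 450, 3000, 800]

# Loop version: keep revenues of 1000 or more, converted to thousands
large_loop = []
for r in revenues:
    if r >= 1000:
        large_loop.append(r / 1000)

# Comprehension version: same result in one line
large_comp = [r / 1000 for r in revenues if r >= 1000]

print(large_loop)  # [1.2, 3.0]
print(large_comp)  # [1.2, 3.0]

# The same pattern builds a dict: {KEY: VALUE for ITEM in LIST}
name_lengths = {city: len(city) for city in ["Delhi", "Mumbai"]}
print(name_lengths)  # {'Delhi': 5, 'Mumbai': 6}
```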

1.5 Functions

def calculate_profit(revenue, cost):
    """Calculate profit and profit percentage."""
    profit = revenue - cost
    profit_pct = (profit / cost) * 100
    return profit, profit_pct # Return multiple values as a tuple

# Call the function
p, pct = calculate_profit(10000, 8000)
print(f"Profit: ₹{p}, Profit%: {pct}%") # Profit: ₹2000, Profit%: 25.0%

Lambda Functions (One-line functions)

# Lambda = a small function without a name that does its job in one line
double = lambda x: x * 2 # "take x, return x*2"
double(5) # 10

# 🧠 Lambdas are used A LOT in Pandas:
# df['revenue'].apply(lambda x: x / 1000) → divides every value by 1000
# When the task is small, defining a full function is overkill; use a lambda
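Outside Pandas, a lambda most often appears as the `key=` argument when sorting. The sketch below sorts hypothetical (name, revenue) records by revenue and shows the equivalent named function for comparison.

```python
# Hypothetical (name, revenue) records
customers = [("Rajesh", 5000), ("Priya", 12000), ("Amit", 800)]

# key= tells sorted() what to compare; here, the revenue at position 1
by_revenue = sorted(customers, key=lambda c: c[1], reverse=True)
print(by_revenue)
# [('Priya', 12000), ('Rajesh', 5000), ('Amit', 800)]

# The equivalent named function gives the same ordering
def get_revenue(c):
    return c[1]

same_result = sorted(customers, key=get_revenue, reverse=True)
print(same_result == by_revenue)  # True
```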

1.6 Exception Handling

try:
    result = 10 / 0
except ZeroDivisionError:
    print("Cannot divide by zero!")
except Exception as e:
    print(f"Error: {e}")
finally:
    print("This always runs")
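As a sketch of how this applies to real calculations: the profit percentage from section 1.5 divides by cost, so a cost of zero would crash it. The helper below, `profit_pct`, is a hypothetical name introduced for illustration; it returns `None` instead of raising.

```python
def profit_pct(revenue, cost):
    """Return profit percentage on cost, or None when cost is zero."""
    try:
        pct = (revenue - cost) / cost * 100
    except ZeroDivisionError:
        return None   # no meaningful percentage without a cost
    else:
        return pct    # the else branch runs only when no exception was raised

print(profit_pct(10000, 8000))  # 25.0
print(profit_pct(500, 0))       # None
```

Catching the specific `ZeroDivisionError` rather than a bare `Exception` keeps unrelated bugs (such as passing a string) visible.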

CHAPTER 2: NumPy — Numerical Computing

2.1 What is NumPy?

NumPy (Numerical Python) is the foundation for numerical computing in Python. Pandas is built on top of it.

import numpy as np

# Creating arrays
arr = np.array([1, 2, 3, 4, 5])
zeros = np.zeros(5) # [0., 0., 0., 0., 0.] (floats by default)
ones = np.ones((3, 4)) # 3×4 matrix of 1s
sequence = np.arange(0, 10, 2) # [0, 2, 4, 6, 8]
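Two attributes worth checking on any new array are `shape` and `dtype`. The sketch below inspects the arrays created above and adds a `reshape` call, which is an extra illustration not covered in the original text.

```python
import numpy as np

zeros = np.zeros(5)
ones = np.ones((3, 4))
sequence = np.arange(0, 10, 2)

print(zeros.dtype)        # float64 (zeros and ones default to floats)
print(ones.shape)         # (3, 4): 3 rows, 4 columns
print(sequence.tolist())  # [0, 2, 4, 6, 8]

# reshape turns a flat array into a grid with the same number of elements
grid = np.arange(6).reshape(2, 3)
print(grid.shape)         # (2, 3)
```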

2.2 Why NumPy Over Lists?

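| Feature                 | Python List   | NumPy Array     |
|-------------------------|---------------|-----------------|
| Speed                   | Slow          | 10–100× faster  |
| Element-wise operations | ❌ Need loops | ✅ Vectorized   |
| Memory                  | More          | Less            |
| Math operations         | Manual        | Built-in        |
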
# Element-wise operations (plain lists need a loop or comprehension for these)
arr = np.array([10, 20, 30, 40, 50])
arr * 2 # [20, 40, 60, 80, 100]
arr + 5 # [15, 25, 35, 45, 55]
arr > 25 # [False, False, True, True, True]
arr[arr > 25] # [30, 40, 50] — boolean filtering
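The contrast with plain lists is easy to check. In the sketch below (made-up prices), `*` on a list repeats the list, while on an array it multiplies every element. It also shows how to combine two boolean conditions.

```python
import numpy as np

prices = [10, 20, 30]

# With a plain list, * repeats the list instead of multiplying values
print(prices * 2)                # [10, 20, 30, 10, 20, 30]

# Doubling each value needs a loop or comprehension
print([p * 2 for p in prices])   # [20, 40, 60]

# A NumPy array applies the operation to every element at once
arr = np.array(prices)
doubled = arr * 2
print(doubled.tolist())          # [20, 40, 60]

# Boolean masks combine with & (and) and | (or); parentheses are required
mid_range = arr[(arr > 10) & (arr < 30)]
print(mid_range.tolist())        # [20]
```

The same `&` and `|` syntax carries over to filtering rows in Pandas.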

2.3 Essential NumPy Functions for Data Analysis

data = np.array([23, 45, 12, 67, 34, 56, 78, 89, 21, 43])

np.mean(data) # 46.8 — Average
np.median(data) # 44.0 — Middle value
np.std(data) # ≈ 24.23 (population standard deviation, ddof=0)
np.var(data) # ≈ 587.16 (population variance, ddof=0)
np.min(data) # 12
np.max(data) # 89
np.sum(data) # 468
np.percentile(data, 75) # 75th percentile (Q3)
np.sort(data) # Sorted array
np.unique(data) # Unique values
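One detail worth knowing for interviews: `np.std` computes the population standard deviation by default, and passing `ddof=1` gives the sample version. The sketch below compares the two on the same `data` array and computes both quartiles in one call.

```python
import numpy as np

data = np.array([23, 45, 12, 67, 34, 56, 78, 89, 21, 43])

pop_std = np.std(data)             # population std (ddof=0, NumPy default)
sample_std = np.std(data, ddof=1)  # sample std (divides by n - 1)

print(round(pop_std, 2))     # 24.23
print(round(sample_std, 2))  # 25.54

# percentile accepts a list, so Q1 and Q3 come back together
q1, q3 = np.percentile(data, [25, 75])
print(q1, q3)  # 25.75 64.25
```

Pandas' `Series.std()` uses `ddof=1` by default, so NumPy and Pandas can report different values for the same column unless `ddof` is set explicitly.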

CHAPTER 3: PANDAS — The Heart of Data Analysis

🧠 THIS IS THE MOST IMPORTANT CHAPTER! 70% of the Python questions in the interview come from Pandas. Study it carefully and practice.

3.1 What is Pandas?

Pandas is a Python library that provides DataFrames: 2D tables (like Excel sheets) on which you can do powerful data manipulation.

🧠 Think of a DataFrame as an Excel sheet: it has rows and columns, and you can filter, sort, and apply formulas. The only difference is that you write Python instead of clicking in Excel.

import pandas as pd  # pd = short alias; this is the standard way to import it

# Creating a DataFrame from a dictionary (you may be asked to do this in the interview)
data = {
'name': ['Rajesh', 'Priya', 'Amit', 'Sneha', 'Vikram'],
'city': ['Delhi', 'Mumbai', 'Ahm