Algorithms: Thinking About Efficiency
This is Post 4 in the Computer Science Series. The previous post covered operating systems. Now we get to the heart of computer science: algorithms.
An algorithm is a step-by-step recipe for solving a problem. The question is never just “does it work?” — it’s “how fast does it work, and does it get slow as the problem gets bigger?”
A bad algorithm can take thousands of years on a fast computer. A good algorithm can solve the same problem in milliseconds.
The Big Picture
╔════════════════════════════════════════════════════════════════════════╗
║ Algorithm Complexity ║
╠════════════════════════════════════════════════════════════════════════╣
║ ║
║ n = input size. How many operations does the algorithm need? ║
║ ║
║ O(1) constant — same speed regardless of input size ║
║ O(log n) logarithmic — doubles input → 1 more step ║
║ O(n) linear — doubles input → 2x more steps ║
║ O(n log n) linearithmic— sorting (merge sort, heap sort) ║
║ O(n²) quadratic — nested loops, bubble sort ║
║ O(2ⁿ) exponential — doubles with every 1-element increase ║
║ ║
║ For n = 1,000,000: ║
║ O(1) → 1 op ║
║ O(log n) → 20 ops ║
║ O(n) → 1,000,000 ops ║
║ O(n log n) → 20,000,000 ops ║
║ O(n²) → 1,000,000,000,000 ops ← takes ~16 minutes at 1GHz ║
║ O(2ⁿ) → way more than atoms in the observable universe ║
╠════════════════════════════════════════════════════════════════════════╣
║ Key Algorithm Families ║
║ ║
║ Sorting: merge sort O(n log n), quicksort O(n log n) avg ║
║ Searching: linear O(n), binary search O(log n) ║
║ Graphs: BFS/DFS O(V+E), Dijkstra O(E log V) ║
║ DP: avoid recomputing → exponential → polynomial ║
║ Greedy: local best choice → often global best (not always!) ║
╚════════════════════════════════════════════════════════════════════════╝
1. Big O Notation — Measuring Speed
Before comparing algorithms, we need a language. Big O notation describes how an algorithm’s running time grows as the input size grows.
We ignore constants and small terms — we care about the shape of growth, not the exact number.
f(n) = 3n² + 10n + 5 → O(n²) (quadratic dominates)
f(n) = 100n → O(n) (ignore the constant 100)
f(n) = log n + 1000 → O(log n)
Intuition for each complexity class:
O(1) — constant time. Looking up a value by index in an array. Doesn’t matter if the array has 10 or 10 million elements.
O(log n) — logarithmic. Each step cuts the problem in half. Finding a word in a dictionary by opening to the middle, then the middle of one half, etc.
O(n) — linear. Reading every element once. Finding a specific page in a book by reading page by page.
O(n log n) — most efficient sorting algorithms. Unavoidable for comparison-based sorting.
O(n²) — quadratic. Comparing every element to every other element. Slow for large inputs.
O(2ⁿ) — exponential. The algorithm’s work doubles for every additional element. Fine for n=20, impossible for n=60.
2. Searching — Finding a Needle
Linear Search — O(n)
The simplest approach: check every element until you find it.
def linear_search(arr, target):
for i, value in enumerate(arr):
if value == target:
return i
return -1 # not found
In the worst case (target is last or not present), you check every element. For a million elements: a million checks.
Binary Search — O(log n)
Works on sorted data. Each step cuts the search space in half.
def binary_search(arr, target):
left, right = 0, len(arr) - 1
while left <= right:
mid = (left + right) // 2
if arr[mid] == target:
return mid
elif arr[mid] < target:
left = mid + 1 # target is in the right half
else:
right = mid - 1 # target is in the left half
return -1
For a million elements: at most 20 checks (log₂(1,000,000) ≈ 20). This is the power of O(log n).
Dictionary analogy: to find “python” in a dictionary, you open to the middle (M), see M comes before P, so you look in the second half. Open to that midpoint (S), P comes before S, look in the first half. You find P in a few steps — not by reading every word.
3. Sorting — Putting Things in Order
Sorting is one of the most-studied problems in CS. Many operations become much faster on sorted data (binary search, deduplication, merging).
Bubble Sort — O(n²) — The Slow One
Compare adjacent pairs. Swap if out of order. Repeat until nothing swaps.
def bubble_sort(arr):
n = len(arr)
for i in range(n):
for j in range(n - i - 1):
if arr[j] > arr[j + 1]:
arr[j], arr[j + 1] = arr[j + 1], arr[j]
Pass 1: [5, 3, 8, 1] → [3, 5, 1, 8] (8 "bubbled" to end)
Pass 2: [3, 5, 1, 8] → [3, 1, 5, 8]
Pass 3: [3, 1, 5, 8] → [1, 3, 5, 8]
Simple to understand, terrible performance. Never use for large datasets.
Merge Sort — O(n log n) — Divide and Conquer
Split the array in half, sort each half recursively, merge the two sorted halves.
def merge_sort(arr):
if len(arr) <= 1:
return arr
mid = len(arr) // 2
left = merge_sort(arr[:mid])
right = merge_sort(arr[mid:])
return merge(left, right)
def merge(left, right):
result = []
i = j = 0
while i < len(left) and j < len(right):
if left[i] <= right[j]:
result.append(left[i]); i += 1
else:
result.append(right[j]); j += 1
return result + left[i:] + right[j:]
[5, 3, 8, 1]
↓ split
[5, 3] [8, 1]
↓ ↓
[3,5] [1,8]
↓ merge
[1, 3, 5, 8]
Each merge step takes O(n). There are O(log n) levels of splitting. Total: O(n log n).
Quicksort — O(n log n) average
Pick a “pivot” element. Partition: all smaller elements go left, larger go right. Recursively sort each side.
def quicksort(arr):
if len(arr) <= 1:
return arr
pivot = arr[len(arr) // 2]
left = [x for x in arr if x < pivot]
middle = [x for x in arr if x == pivot]
right = [x for x in arr if x > pivot]
return quicksort(left) + middle + quicksort(right)
Quicksort is typically faster in practice than merge sort (better cache behaviour), even though both are O(n log n). Python’s built-in sorted() uses Timsort — a clever hybrid of merge sort and insertion sort.
4. Recursion — Functions That Call Themselves
Many algorithms are naturally recursive: the algorithm for a problem of size n calls itself on a smaller problem.
def factorial(n):
if n == 0:
return 1 # base case — stops the recursion
return n * factorial(n - 1) # recursive case
factorial(5) = 5 × factorial(4)
= 5 × 4 × factorial(3)
= 5 × 4 × 3 × factorial(2)
= 5 × 4 × 3 × 2 × factorial(1)
= 5 × 4 × 3 × 2 × 1 × factorial(0)
= 5 × 4 × 3 × 2 × 1 × 1
= 120
Every recursive function needs:
- A base case that stops the recursion
- A recursive case that makes progress toward the base case
Without a base case, you get infinite recursion — a stack overflow.
5. Dynamic Programming — Remember Your Work
Some problems have overlapping subproblems — you compute the same thing over and over.
Example: Fibonacci numbers (each number = sum of the previous two)
# Naive recursive — O(2ⁿ) — exponential!
def fib(n):
if n <= 1: return n
return fib(n - 1) + fib(n - 2)
# fib(5) calls fib(3) twice, fib(2) three times, fib(1) five times...
The problem: fib(3) is computed over and over. For fib(50), the naive version makes 2^50 calls — more than a trillion.
Dynamic programming fixes this by caching results:
# Memoized — O(n) — much better!
memo = {}
def fib(n):
if n <= 1: return n
if n in memo: return memo[n] # already computed!
memo[n] = fib(n - 1) + fib(n - 2)
return memo[n]
# Or bottom-up (even simpler):
def fib(n):
a, b = 0, 1
for _ in range(n):
a, b = b, a + b
return a
DP turns exponential problems into polynomial ones. It’s used everywhere:
- Spell checkers (edit distance)
- GPS navigation (shortest path)
- DNA sequence alignment
- Game AI (optimal move sequences)
6. Greedy Algorithms — Always Take the Best Step
A greedy algorithm makes the locally best choice at each step, hoping this leads to a globally best solution.
Example: Making change with the fewest coins
With coins [25¢, 10¢, 5¢, 1¢], what’s the minimum coins to make 41¢?
Greedy: always use the largest coin that fits
41¢ → use 25¢ → 16¢ remaining
16¢ → use 10¢ → 6¢ remaining
6¢ → use 5¢ → 1¢ remaining
1¢ → use 1¢ → done
Result: 4 coins [25, 10, 5, 1] ✓ optimal!
Greedy works here. But it doesn’t always work. With coins [4¢, 3¢, 1¢], to make 6¢:
Greedy: 4¢ + 1¢ + 1¢ = 3 coins
Optimal: 3¢ + 3¢ = 2 coins ← greedy failed!
Greedy algorithms are fast (usually O(n log n)) but only correct when the problem has the greedy choice property — local best choices always lead to a global best.
7. Graph Algorithms
Graphs model relationships: cities connected by roads, users connected by friendships, tasks connected by dependencies.
A graph has vertices (nodes) and edges (connections).
A ── B
| |
C ── D
Vertices: {A, B, C, D}
Edges: {A-B, A-C, B-D, C-D}
Breadth-First Search (BFS) — O(V + E)
Explore all neighbors at the current distance before going further. Finds the shortest path (in hops) between two nodes.
from collections import deque
def bfs(graph, start, target):
queue = deque([[start]])
visited = {start}
while queue:
path = queue.popleft()
node = path[-1]
if node == target:
return path # found it!
for neighbor in graph[node]:
if neighbor not in visited:
visited.add(neighbor)
queue.append(path + [neighbor])
Analogy: ripples expanding from a stone dropped in water — you explore all nodes 1 step away, then 2 steps, then 3.
Depth-First Search (DFS) — O(V + E)
Go as deep as possible before backtracking. Used for maze solving, cycle detection, topological sorting.
def dfs(graph, node, visited=None):
if visited is None: visited = set()
visited.add(node)
for neighbor in graph[node]:
if neighbor not in visited:
dfs(graph, neighbor, visited)
return visited
Dijkstra’s Algorithm — O(E log V)
Finds the shortest weighted path between nodes (where edges have costs, like road distances).
Graph with weights:
A →3→ B →2→ D
A →2→ C →4→ D
Shortest A→D: A→B→D = 3+2 = 5
A→C→D = 2+4 = 6
Dijkstra picks: A→B→D (cost 5)
Dijkstra’s algorithm is used in GPS navigation, network routing (OSPF), and game pathfinding (A*).
8. Correctness — Proving an Algorithm Works
An algorithm isn’t just a guess. We need to prove it’s correct.
Loop invariant: a property that’s true before every loop iteration.
For binary search:
- Invariant: “the target is within
arr[left..right]if it exists” - Initialization: true at start (whole array)
- Maintenance: true after each step (we correctly cut in half)
- Termination: when
left > right, the target is not in the array
Proving loop invariants shows the algorithm is correct — not just “it seems to work on these test cases”.
Summary
Problem size n → which algorithm fits?
n ≤ 20: almost anything works (even exponential)
n ≤ 1000: O(n²) is fine
n ≤ 10⁶: need O(n) or O(n log n)
n ≤ 10⁹: need O(log n) or O(1)
Choosing the right algorithm is often the most important decision you make. A well-chosen algorithm on slow hardware beats a bad algorithm on the fastest hardware.
Key takeaways:
- Binary search beats linear search if data is sorted
- Merge/quicksort is the go-to for sorting
- Dynamic programming when you see overlapping subproblems
- Greedy when local best choices lead to global best
- BFS/DFS for exploring graphs; Dijkstra for shortest weighted paths
In the next post, we’ll look at Data Structures — the containers that store data in ways that make algorithms fast.
Back to the series: Welcome to the Computer Science Series
Comments