Data Structures: The Right Container for the Right Job

This is Post 5 in the Computer Science Series. The previous post explained how to measure algorithm efficiency. Now we look at data structures — the containers that store data in ways that make algorithms fast.

The same algorithm on the wrong data structure can be 1000× slower than on the right one. Choosing wisely is one of the most practical skills in software engineering.

Data Structures Overview

The Big Picture

╔══════════════════════════════════════════════════════════════════════════════════╗
║                    Data Structures: Trade-offs at a Glance                       ║
╠══════════════════════════════════════════════════════════════════════════════════╣
║                                                                                  ║
║  Structure       │ Access │ Search │ Insert │ Delete │ Best for                  ║
║  ────────────────┼────────┼────────┼────────┼────────┼────────────────────────   ║
║  Array           │  O(1)  │  O(n)  │  O(n)  │  O(n)  │ index by position         ║
║  Linked List     │  O(n)  │  O(n)  │  O(1)* │  O(1)* │ frequent insert/delete    ║
║  Hash Table      │  O(1)* │  O(1)* │  O(1)* │  O(1)* │ key-value lookup          ║
║  Binary Search T.│  O(log n) for all, if balanced    │ sorted data, range query  ║
║  Heap            │  O(1)  │  O(n)  │  O(log n)│ O(log n)│ min/max quickly        ║
║  Graph           │  O(1)  │  O(V+E)│  O(1)  │  O(E)  │ relationships             ║
║                                                                                  ║
║  * amortized or average                                                          ║
╚══════════════════════════════════════════════════════════════════════════════════╝

1. Arrays — The Foundation

An array is a sequence of elements stored in contiguous memory — one right after another.

Index:  [0]  [1]  [2]  [3]  [4]
Value:  [10] [20] [30] [40] [50]
Memory: 1000 1004 1008 1012 1016  (each int = 4 bytes)

Because elements are contiguous, random access is O(1): arr[3] takes one calculation: base_address + 3 × element_size = 1000 + 12 = 1012. Done.

Strengths:

O(1) access by index
Cache-friendly — elements sit next to each other in memory
Simple and predictable

Weaknesses:

Fixed size (in low-level languages) or expensive to resize
Inserting/deleting in the middle: must shift all elements after → O(n)

Insert 15 at index 1:
Before: [10, 20, 30, 40, 50]
Shift:  [10, 20, 20, 30, 40, 50]  ← 20, 30, 40, 50 all moved right
Write:  [10, 15, 20, 30, 40, 50]  ← O(n) work

Dynamic Arrays (Python lists, Java ArrayLists)

Dynamic arrays solve the fixed-size problem. When full, they allocate a new array ~2× the size and copy everything over.

Array is full (size 4):  [1, 2, 3, 4]
Append 5 → allocate new array of size 8 → copy → add 5
New array: [1, 2, 3, 4, 5, _, _, _]

Copying takes O(n), but it happens rarely (only when doubling). Amortized over many appends, each append costs O(1).

2. Linked Lists — Flexible Chains

A linked list stores elements as nodes, where each node holds a value and a pointer to the next node.

[10|→] → [20|→] → [30|→] → [40|null]
  head

Nodes can be scattered anywhere in memory — they don’t have to be contiguous.

Strengths:

O(1) insert or delete if you have a pointer to the node (just update pointers)
No wasted space for empty slots

Weaknesses:

O(n) to access element at index k (must follow pointers from head)
Poor cache behaviour — nodes scattered in memory
Extra memory for pointers

Insert 15 between 10 and 20:
1. Create new node [15|→]
2. Point 15's next → 20
3. Point 10's next → 15
Done! O(1) — no shifting.

[10|→] → [15|→] → [20|→] → [30|→] → [40|null]

When to use: when you need fast inserts/deletes and don’t care about random access. Common in implementing queues, undo history, and music playlists.

3. Stacks and Queues

These are built on top of arrays or linked lists, with restricted operations.

Stack — Last In, First Out (LIFO)

Like a stack of plates: you can only add or remove from the top.

push(1) → [1]
push(2) → [1, 2]
push(3) → [1, 2, 3]
pop()   → 3   (top removed)
pop()   → 2

Operations: push (add to top), pop (remove from top), peek (look at top).

Used for:

Function call stack (when f() calls g(), g is pushed; when g returns, it’s popped)
Undo/redo in text editors
Parsing expressions (matching parentheses)
DFS graph traversal

Queue — First In, First Out (FIFO)

Like a checkout line: first person in line is first served.

enqueue(A) → [A]
enqueue(B) → [A, B]
enqueue(C) → [A, B, C]
dequeue()  → A (front removed)
dequeue()  → B

Operations: enqueue (add to back), dequeue (remove from front).

Used for:

BFS graph traversal
Print queues, task queues
Message queues between services (Kafka, RabbitMQ)

4. Hash Tables — O(1) Lookup Magic

A hash table (also called hash map or dictionary) stores key-value pairs with near-instant lookup.

phone_book = {
    "Alice": "555-1234",
    "Bob":   "555-5678",
    "Carol": "555-9012"
}
phone_book["Alice"]  # O(1) lookup — instant!

How It Works

Take the key (“Alice”)
Run it through a hash function → get a number (e.g., 42)
Store the value at index 42 in an array
To look up: hash the key again, go to that index

hash("Alice") → 42 → array[42] = "555-1234"
hash("Bob")   → 17 → array[17] = "555-5678"
hash("Carol") → 91 → array[91] = "555-9012"

lookup("Alice") → hash → 42 → return array[42]  ← one step!

Collisions

Two keys can hash to the same index — a collision. Solutions:

Chaining: each array slot holds a linked list of all keys that mapped there.

array[42] → [(Alice, 555-1234), (Dave, 555-0000)]  ← two keys collided at 42

Open addressing: if slot 42 is taken, try 43, 44, … until empty.

A good hash function spreads keys evenly to minimize collisions. In practice, lookup is O(1) average.

Real-World Use

Hash tables are everywhere:

Python dict and set
Database indexes on non-ordered columns
Caches (URL → cached page)
Counting word frequencies
Detecting duplicates

5. Trees — Hierarchical Structure

A tree is a hierarchical structure with a root node, where each node has zero or more children.

         A          ← root
        / \
       B   C        ← children of A
      / \   \
     D   E   F      ← leaves (no children)

Trees model: file systems (folders within folders), HTML (DOM), company org charts, and much more.

Binary Search Trees (BST)

A BST is a binary tree (each node has at most 2 children) with one rule: left child < node < right child.

        8
       / \
      3   10
     / \    \
    1   6    14
       / \
      4   7

This rule means search is O(log n) for a balanced tree:

def search(node, target):
    if node is None: return None
    if target == node.value: return node
    if target < node.value:  return search(node.left,  target)
    else:                    return search(node.right, target)

At each step, you eliminate half the tree. After log₂(n) steps, you find it or confirm it’s absent.

Problem: if you insert sorted data, the tree becomes a line:

1 → 2 → 3 → 4 → 5   (degenerate BST)

Now search is O(n), not O(log n). This is why self-balancing trees exist.

Balanced Trees (AVL, Red-Black)

Self-balancing trees automatically rotate nodes to maintain balance after inserts and deletes. The result: guaranteed O(log n) for search, insert, and delete.

Red-black trees are used in Java’s TreeMap, Python’s SortedList, and many database index structures.

6. Heaps — Find the Min/Max Fast

A heap is a tree where every parent is ≤ its children (min-heap) or ≥ its children (max-heap). The root is always the minimum (or maximum).

Min-heap:
        1
       / \
      3   2
     / \ / \
    7  4 5  6

Operations:

find_min(): O(1) — the root is always minimum
insert(x): O(log n) — add at bottom, bubble up
extract_min(): O(log n) — remove root, restore heap property

Used for:

Priority queues (process scheduling, Dijkstra’s algorithm)
Heap sort
Finding the k smallest/largest elements

import heapq
pq = []
heapq.heappush(pq, 3)
heapq.heappush(pq, 1)
heapq.heappush(pq, 2)
heapq.heappop(pq)   # returns 1 — always the minimum

7. Graphs — Relationships Between Things

A graph is the most general structure: a set of nodes (vertices) with connections (edges).

Undirected graph:         Directed graph (digraph):
  A --- B                   A → B
  |     |                   ↑   ↓
  C --- D                   C ← D

Graphs represent: roads (cities connected by roads), social networks (people connected by friendships), dependencies (task A must happen before B), the internet (routers connected by links).

Representing Graphs

Adjacency list: for each vertex, store a list of neighbors. Good for sparse graphs (few edges).

graph = {
    'A': ['B', 'C'],
    'B': ['A', 'D'],
    'C': ['A', 'D'],
    'D': ['B', 'C']
}

Adjacency matrix: a 2D array where matrix[i][j] = 1 means there’s an edge from i to j. Good for dense graphs (many edges), but O(V²) space.

     A  B  C  D
A  [ 0  1  1  0 ]
B  [ 1  0  0  1 ]
C  [ 1  0  0  1 ]
D  [ 0  1  1  0 ]

Graph algorithms (BFS, DFS, Dijkstra) were covered in the Algorithms post.

8. Choosing the Right Structure

Problem	Best structure
Access by integer index	Array
Lookup by key	Hash table
LIFO (undo stack)	Stack
FIFO (task queue, BFS)	Queue
Frequently insert/delete in middle	Linked list
Always need the minimum/maximum	Heap
Sorted data, range queries	Balanced BST
Model relationships	Graph
Hierarchical data	Tree

Real programs combine multiple structures. A web browser’s tab history uses a stack. Its network request queue uses a queue. Its DNS cache uses a hash table. Its rendering tree uses, well, a tree.

Summary

Data structures are not just CS theory — every line of code you write uses them. Python’s list is a dynamic array. dict is a hash table. set is a hash table with no values. heapq is a heap.

Knowing what’s underneath lets you:

Choose the right tool (O(1) lookup vs O(n))
Predict performance (why is this slow?)
Debug unexpected behaviour (why is my dict using so much memory?)

Need fast access by key?         → hash table
Need sorted order?               → balanced BST
Need fast min/max?               → heap
Need to model relationships?     → graph
Need cache-friendly iteration?   → array

In the next post, we’ll zoom out and look at Networks & Security — how computers talk to each other over the internet, and how we keep that communication safe.

Back to the series: Welcome to the Computer Science Series

The Big Picture

1. Arrays — The Foundation

Dynamic Arrays (Python lists, Java ArrayLists)

2. Linked Lists — Flexible Chains

3. Stacks and Queues

Stack — Last In, First Out (LIFO)

Queue — First In, First Out (FIFO)

4. Hash Tables — O(1) Lookup Magic

How It Works

Collisions

Real-World Use

5. Trees — Hierarchical Structure

Binary Search Trees (BST)

Balanced Trees (AVL, Red-Black)

6. Heaps — Find the Min/Max Fast

7. Graphs — Relationships Between Things

Representing Graphs

8. Choosing the Right Structure

Summary

Comments