ds · reference
Pick the right structure
Walk the questions top-down. The first answer that matches your workload picks the structure. Real-world workloads usually combine structures — note the hybrids at the bottom.
Workload-first flowchart
- Do you need to look up values by an arbitrary key, with no order requirement?→ Hash table.
O(1)average lookup. Mind the load factor and the DoS-resistance question (use SipHash-keyed implementations on untrusted input). - Need ordered iteration, range queries (
keys between A and B), or "min greater than X"?→ Balanced tree (AVL / Red-Black) for in-memory, or a B+-tree for on-disk.O(log n)guaranteed. - Only ever need the minimum (or maximum) and want repeated
O(log n)extracts?→ Heap / priority queue. Smaller constant factors than a balanced tree, no in-order traversal. - Order is "newest in, newest out" (last-in-first-out)?→ Stack. Backed by an array. Used by every recursive algorithm, expression parser, and undo history.
- Order is "first in, first out" (oldest waits longest)?→ Queue (ring buffer for bounded; linked or
VecDequefor unbounded). The backbone of BFS, message brokers, ROS topics. - Data is fixed in size, dense, accessed randomly by index?→ Array. The fastest thing on a modern CPU. Sequential access is cache-prefetched and ~50–100× faster than equivalent linked structures.
- Need
O(1)insert/delete in the middle and you'll already have a live reference to the affected node? (LRU cache, OS process scheduler.) - Modeling relationships, paths, dependencies, or anything network-shaped?→ Graph. Choose BFS for shortest path on unweighted, Dijkstra for non-negative weights, A* if you have a heuristic, Bellman-Ford if edges can be negative.
Cost reference — at a glance
| Structure | Access | Search | Insert | Delete | Ordered? |
|---|---|---|---|---|---|
| Array | O(1) | O(n) | O(n) mid · O(1)* end | O(n) mid | by index |
| Linked list | O(n) | O(n) | O(1)† at known node | O(1)† at known node | by insertion |
| Stack | O(1) top | — | O(1) push | O(1) pop | LIFO |
| Queue | O(1) head | — | O(1) enqueue | O(1) dequeue | FIFO |
| Heap | O(1) peek | — | O(log n) | O(log n) root | by priority |
| BST (avg) | O(log n) | O(log n) | O(log n) | O(log n) | in-order |
| AVL / Red-Black | O(log n) | O(log n) | O(log n) | O(log n) | in-order |
| Hash table | — | O(1)* | O(1)* | O(1)* | no |
| Graph (adj-list) | — | BFS/DFS O(V+E) | edge O(1) | edge O(deg) | — |
* amortized · † requires a live reference
to the affected node (the famous LRU-cache pattern combines a hash
table + doubly-linked list to get this).
The hybrids you'll actually ship
- LRU cache
- Hash table → doubly-linked list.
O(1)lookup +O(1)move-to-front on every hit. Every CPU cache, every database query plan cache, and every web browser tab uses this shape. - Priority queue with decrease-key
- Heap + hash table (id → heap index). The hash table lets you find a node in the heap by external id in
O(1), then percolate it up after lowering its priority. Dijkstra and A* assume this. - Union-find / disjoint set
- Array-backed forest. Used by Kruskal's MST, image segmentation, network connectivity.
O(α(n))per operation — effectively constant. - Trie (prefix tree)
- Tree where each edge is labeled with a character. Hash-table-of-children at each node. Powers autocomplete, IP routing tables, DNA pattern matching. Not in the core 10 lessons but built from BSTs and hash tables.
- Skip list
- Linked list with extra "express lanes" at each level. Probabilistic
O(log n)lookup. Used by Redis sorted sets and LevelDB. - Bloom filter
- Bit array + multiple hash functions.
O(1)"is this key possibly in the set?" with a tunable false-positive rate. Used by CDN cache layers and biology pipelines.
Common picks people get wrong
- Linked list when an array would be 50× faster. If you're iterating sequentially or doing index access, the cache locality of an array dominates. Reach for a linked list only when you genuinely need
O(1)splice with a live reference. - BST when you really wanted a heap. If you only ever ask "what's the smallest?", a heap is smaller, faster, and simpler.
- Hash table when you need range queries. Hash tables have no order. "Show me keys between 100 and 200" is
O(n)in a hash table,O(log n + k)in a balanced tree. - Naive recursion when you should iterate with an explicit stack. Deep recursion overflows the call stack. Tree/graph traversals that might go more than a few thousand levels deep should keep their own stack array.
- Adjacency matrix for a sparse graph. An
n=10,000matrix is100Mentries. Use an adjacency list unless the graph is genuinely dense. - Dijkstra on negative weights. Quietly wrong, doesn't crash. Use Bellman-Ford.