ds · reference

Pick the right structure

Walk the questions top-down. The first answer that matches your workload picks the structure. Real-world workloads usually combine structures — note the hybrids at the bottom.

Workload-first flowchart

  1. Do you need to look up values by an arbitrary key, with no order requirement?
    Hash table. O(1) average lookup. Mind the load factor and the DoS-resistance question (use SipHash-keyed implementations on untrusted input).
  2. Need ordered iteration, range queries (keys between A and B), or "min greater than X"?
    Balanced tree (AVL / Red-Black) for in-memory, or a B+-tree for on-disk. O(log n) guaranteed.
  3. Only ever need the minimum (or maximum) and want repeated O(log n) extracts?
    Heap / priority queue. Smaller constant factors than a balanced tree, no in-order traversal.
  4. Order is "newest in, newest out" (last-in-first-out)?
    Stack. Backed by an array. Used by every recursive algorithm, expression parser, and undo history.
  5. Order is "first in, first out" (oldest waits longest)?
    Queue (ring buffer for bounded; linked or VecDeque for unbounded). The backbone of BFS, message brokers, ROS topics.
  6. Data is fixed in size, dense, accessed randomly by index?
    Array. The fastest thing on a modern CPU. Sequential access is cache-prefetched and ~50–100× faster than equivalent linked structures.
  7. Need O(1) insert/delete in the middle and you'll already have a live reference to the affected node? (LRU cache, OS process scheduler.)
    Doubly linked list. Often paired with a hash table for O(1) lookup-by-key + O(1) splice.
  8. Modeling relationships, paths, dependencies, or anything network-shaped?
    Graph. Choose BFS for shortest path on unweighted, Dijkstra for non-negative weights, A* if you have a heuristic, Bellman-Ford if edges can be negative.

Cost reference — at a glance

Structure Access Search Insert Delete Ordered?
Array O(1) O(n) O(n) mid · O(1)* end O(n) mid by index
Linked list O(n) O(n) O(1)† at known node O(1)† at known node by insertion
Stack O(1) top O(1) push O(1) pop LIFO
Queue O(1) head O(1) enqueue O(1) dequeue FIFO
Heap O(1) peek O(log n) O(log n) root by priority
BST (avg) O(log n) O(log n) O(log n) O(log n) in-order
AVL / Red-Black O(log n) O(log n) O(log n) O(log n) in-order
Hash table O(1)* O(1)* O(1)* no
Graph (adj-list) BFS/DFS O(V+E) edge O(1) edge O(deg)

* amortized · requires a live reference to the affected node (the famous LRU-cache pattern combines a hash table + doubly-linked list to get this).

The hybrids you'll actually ship

LRU cache
Hash table → doubly-linked list. O(1) lookup + O(1) move-to-front on every hit. Every CPU cache, every database query plan cache, and every web browser tab uses this shape.
Priority queue with decrease-key
Heap + hash table (id → heap index). The hash table lets you find a node in the heap by external id in O(1), then percolate it up after lowering its priority. Dijkstra and A* assume this.
Union-find / disjoint set
Array-backed forest. Used by Kruskal's MST, image segmentation, network connectivity. O(α(n)) per operation — effectively constant.
Trie (prefix tree)
Tree where each edge is labeled with a character. Hash-table-of-children at each node. Powers autocomplete, IP routing tables, DNA pattern matching. Not in the core 10 lessons but built from BSTs and hash tables.
Skip list
Linked list with extra "express lanes" at each level. Probabilistic O(log n) lookup. Used by Redis sorted sets and LevelDB.
Bloom filter
Bit array + multiple hash functions. O(1) "is this key possibly in the set?" with a tunable false-positive rate. Used by CDN cache layers and biology pipelines.

Common picks people get wrong