Lessons · #10 of 11

Balanced Trees

Rotations keep height ≈ log n, no matter the insert order

AVL tree · self-balancing rotations

size0
avl height0
naïve bst h.0
init

Try it: Press insert 1..15 to force a degenerate sequence. The naïve BST height jumps to 15; the AVL height stays at 4 — because every imbalance triggers a rotation, flashed on the canvas.

Why it matters

Regular BSTs can degrade to O(n). Balanced trees guarantee O(log n) by automatically rebalancing after insertions and deletions. AVL and Red-Black trees are the workhorses behind most language standard libraries.

The idea

Remember how a BST can become a linked list if you insert sorted data? Balanced trees prevent this by restructuring after operations.

AVL Trees: After every insert/delete, check if any node's subtrees differ in height by more than 1. If so, rotate to fix it.

Rotations are the magic move:

The result: height is always O(log n), so all operations stay O(log n) guaranteed.

Red-Black Trees use a different trick: color nodes red or black with rules that limit how unbalanced the tree can get. Less strictly balanced than AVL, but fewer rotations on average.

Key takeaways

Invariants

These trees inherit the BST invariant and add a balance rule on top. The balance rule is what guarantees O(log n) height.

AVL invariant
For every node, |height(left) − height(right)| ≤ 1. Strictly enforced after every insert and delete via rotations.
Consequence: height h < 1.44 · log₂(n + 2) — within 44% of the theoretical minimum.
Red-Black invariants — all four must hold
1. Every node is colored either red or black.
2. The root is black. (Sometimes phrased: leaves — the NULL sentinels — are also black.)
3. A red node has only black children. (No two reds in a row along any root-to-leaf path.)
4. Every root-to-NULL path passes through the same number of black nodes. This is the black-height; it's the structural invariant that bounds the tree's height.
Consequence: height h ≤ 2 · log₂(n + 1). Looser than AVL but easier to maintain.

Going deeper

AVL vs Red-Black — which to pick

| Property | AVL | Red-Black | |---|---|---| | Worst-case height | ≤ 1.44 log₂ n | ≤ 2 log₂ n | | Rotations per insert | up to 2 | up to 2 | | Rotations per delete | up to O(log n) | up to 3 | | Lookup speed | slightly faster (more balanced) | slightly slower | | Insert/delete speed | slower | faster | | Memory per node | 1–2 bits (balance factor) | 1 bit (color) | | Typical use | databases, read-heavy | language stdlib std::map, TreeMap, Linux kernel rbtree.h |

Rule of thumb: if you read way more than you write, AVL. If your workload is mixed or write-heavy, Red-Black. C++'s std::map, Java's TreeMap, and the Linux kernel's process scheduler all use Red-Black for this reason.

B-trees — the disk-page generalization

A B-tree generalizes the BST in one direction: each node holds many keys (say 100 to 1000), not just one. A node has k + 1 children if it has k keys, and the children's key-ranges interleave with the parent's keys (left of key₀ → child₀, between key₀ and key₁ → child₁, …).

Why so many keys per node? Disk seeks. On a spinning disk, reading 1 byte and reading 16 KB take essentially the same time — the cost is the head moving, not the bytes transferred. Database engines fill each B-tree node to fit exactly one disk page (typically 4 KB or 16 KB). One disk read fetches ~256–1024 keys at once, so navigating a database of billions of rows takes only 3–4 page reads. The same logic applies to SSDs (where the seek cost is dominated by page-aligned I/O) and to filesystem inodes.

Every relational database index, every modern filesystem (ext4, NTFS, APFS, XFS), and most key-value stores (RocksDB, LevelDB) use B-trees or their close cousin the B+-tree (where leaf nodes form a linked list for fast range scans). B-trees are why SELECT * WHERE id BETWEEN 1000 AND 2000 is sub-millisecond on a billion-row table.

Real-world cost numbers

For a balanced tree of n = 1,000,000 keys:

Compare with an array sorted by key:

So if your dataset is read-only and fits in memory, a sorted array beats a balanced tree on both speed (cache locality) and memory. Balanced trees pay their overhead to keep O(log n) inserts.

Math details

AVL Balance Factor
BF(n) = height(n.left) - height(n.right)
Must be in 1
Time Complexity (guaranteed)
Search: O(log n)
Insert: O(log n)
Delete: O(log n)
Height bounds
AVL: h < 1.44 log_2(n+2)
Red-Black: h leq 2 log_2(n+1)

Implementation

Rotation Concept

// Right rotation when left subtree too tall
//       y                x
//      / \              / \
//     x   C   --->     A   y
//    / \                  / \
//   A   B                B   C

fn rotate_right(y: Node) -> Node {
    let x = y.left.take();
    y.left = x.right;
    x.right = Some(y);
    x
}

When to Use

Practice

The companion check to every self-balancing-tree implementation:

Browse all practice problems