Balanced Trees
Rotations keep height ≈ log n, no matter the insert order
AVL tree · self-balancing rotations
init
Try it: Press insert 1..15 to force a degenerate sequence. The naïve BST height jumps to 15; the AVL height stays at 4 — because every imbalance triggers a rotation, flashed on the canvas.
Why it matters
Regular BSTs can degrade to O(n). Balanced trees guarantee O(log n) by automatically rebalancing after insertions and deletions. AVL and Red-Black trees are the workhorses behind most language standard libraries.
The idea
Remember how a BST can become a linked list if you insert sorted data? Balanced trees prevent this by restructuring after operations.
AVL Trees: After every insert/delete, check if any node's subtrees differ in height by more than 1. If so, rotate to fix it.
Rotations are the magic move:
- Right rotation: When left subtree is too tall
- Left rotation: When right subtree is too tall
- Double rotations: For zig-zag cases
The result: height is always O(log n), so all operations stay O(log n) guaranteed.
Red-Black Trees use a different trick: color nodes red or black with rules that limit how unbalanced the tree can get. Less strictly balanced than AVL, but fewer rotations on average.
Key takeaways
- Guarantee O(log n) operations regardless of insertion order
- Rotations are O(1) operations that preserve BST property
- AVL: stricter balance (height difference ≤ 1)
- Red-Black: looser balance, fewer rotations
Invariants
These trees inherit the BST invariant and add a balance rule on top. The balance rule is what guarantees O(log n) height.
- AVL invariant
- For every node,
|height(left) − height(right)| ≤ 1. Strictly enforced after every insert and delete via rotations. - Consequence: height
h < 1.44 · log₂(n + 2)— within 44% of the theoretical minimum. - Red-Black invariants — all four must hold
- 1. Every node is colored either red or black.
- 2. The root is black. (Sometimes phrased: leaves — the NULL sentinels — are also black.)
- 3. A red node has only black children. (No two reds in a row along any root-to-leaf path.)
- 4. Every root-to-NULL path passes through the same number of black nodes. This is the black-height; it's the structural invariant that bounds the tree's height.
- Consequence: height
h ≤ 2 · log₂(n + 1). Looser than AVL but easier to maintain.
Going deeper
AVL vs Red-Black — which to pick
| Property | AVL | Red-Black |
|---|---|---|
| Worst-case height | ≤ 1.44 log₂ n | ≤ 2 log₂ n |
| Rotations per insert | up to 2 | up to 2 |
| Rotations per delete | up to O(log n) | up to 3 |
| Lookup speed | slightly faster (more balanced) | slightly slower |
| Insert/delete speed | slower | faster |
| Memory per node | 1–2 bits (balance factor) | 1 bit (color) |
| Typical use | databases, read-heavy | language stdlib std::map, TreeMap, Linux kernel rbtree.h |
Rule of thumb: if you read way more than you write, AVL. If your workload is mixed or write-heavy, Red-Black. C++'s std::map, Java's TreeMap, and the Linux kernel's process scheduler all use Red-Black for this reason.
B-trees — the disk-page generalization
A B-tree generalizes the BST in one direction: each node holds many keys (say 100 to 1000), not just one. A node has k + 1 children if it has k keys, and the children's key-ranges interleave with the parent's keys (left of key₀ → child₀, between key₀ and key₁ → child₁, …).
Why so many keys per node? Disk seeks. On a spinning disk, reading 1 byte and reading 16 KB take essentially the same time — the cost is the head moving, not the bytes transferred. Database engines fill each B-tree node to fit exactly one disk page (typically 4 KB or 16 KB). One disk read fetches ~256–1024 keys at once, so navigating a database of billions of rows takes only 3–4 page reads. The same logic applies to SSDs (where the seek cost is dominated by page-aligned I/O) and to filesystem inodes.
Every relational database index, every modern filesystem (ext4, NTFS, APFS, XFS), and most key-value stores (RocksDB, LevelDB) use B-trees or their close cousin the B+-tree (where leaf nodes form a linked list for fast range scans). B-trees are why SELECT * WHERE id BETWEEN 1000 AND 2000 is sub-millisecond on a billion-row table.
Real-world cost numbers
For a balanced tree of n = 1,000,000 keys:
- Lookup: ~20 comparisons (log₂ 1M ≈ 20)
- Memory: ~24 MB (assuming 24 B per node — 3 pointers + 4-byte key)
Compare with an array sorted by key:
- Lookup: ~20 comparisons (binary search, same log₂ n)
- Memory: ~4 MB (just the keys, no pointers)
- Insert:
O(n)— array shift
So if your dataset is read-only and fits in memory, a sorted array beats a balanced tree on both speed (cache locality) and memory. Balanced trees pay their overhead to keep O(log n) inserts.
Math details
- AVL Balance Factor
BF(n) = height(n.left) - height(n.right)Must be in 1- Time Complexity (guaranteed)
Search: O(log n)Insert: O(log n)Delete: O(log n)- Height bounds
AVL: h < 1.44 log_2(n+2)Red-Black: h leq 2 log_2(n+1)
Implementation
Rotation Concept
// Right rotation when left subtree too tall
// y x
// / \ / \
// x C ---> A y
// / \ / \
// A B B C
fn rotate_right(y: Node) -> Node {
let x = y.left.take();
y.left = x.right;
x.right = Some(y);
x
}
When to Use
- Need guaranteed O(log n) performance
- Data arrives in sorted/nearly-sorted order
- Implementing maps, sets, priority queues
- Database indexes
Practice
The companion check to every self-balancing-tree implementation:
- Is Binary Tree Height-Balanced? — easy · the O(n) post-order pass with sentinel-based early-exit; what you assert after every AVL / Red-Black rotation