practice · hard · from Heaps & Priority Queues

Merge K Sorted Lists

time: O(n log k)
space: O(k)
tags: heaps · linked-lists · k-way-merge

in the wild External merge sort, log-file consolidation, distributed query results — anywhere k sorted streams must merge into one sorted output.

Problem

You're given k linked lists, each already sorted in ascending order. Merge them all into one sorted linked list and return its head.

Examples

Input:  [1 → 4 → 5,
         1 → 3 → 4,
         2 → 6]
Output: 1 → 1 → 2 → 3 → 4 → 4 → 5 → 6

Input:  []                       Output:  (empty)
Input:  [[]]                     Output:  (empty)
Input:  [1, 2, 3]  (k = 1)       Output:  1 → 2 → 3

Why this is the canonical "k-way merge with a heap" problem

A two-list merge is O(n). Naïvely repeating it k − 1 times runs total O(n·k) — every pass re-scans the growing merged list.

The smarter approach: at every step, the next element of the output is the smallest head across all k lists. That smallest-of-k query is exactly what a min-heap does in O(log k). So:

Push the head of each non-empty list into a min-heap, keyed by node value. O(k log k).
Pop the smallest head → append to the output. O(log k).
If the popped node has a successor, push it. O(log k).
Repeat until the heap is empty.

Total: each of the n total nodes contributes one pop and at most one push → O(n log k). The heap never holds more than k items.

This is k-way merge in its purest form. The same algorithm powers external merge sort (k = number of sorted runs that fit in memory), distributed query merging (k = number of shards), and log-merging tools (k = number of input log files).

Hints

Hint 1 — what's the next output element?

At any step, the next element of the merged output is the smallest value among the current heads of the k lists. How do you find the min of k values in less than O(k)?

Hint 2 — heap of pointers, not of values

You don't push node values into the heap — you push the nodes themselves, comparing them by value. That way, after popping a node, you can advance to node.next and push it back in.

Hint 3 — alternative: divide-and-conquer merge

You can also merge pairwise — merge list 0 with list 1, list 2 with list 3, …, then merge the resulting k/2 lists pairwise, and so on. log k rounds, each O(n) → O(n log k). Same big-O as the heap, often slightly faster in practice (better cache behavior, no priority-queue overhead).

Solution

from __future__ import annotations
from dataclasses import dataclass
from typing import Optional
import heapq

@dataclass
class Node:
    val: int
    next: Optional["Node"] = None

def merge_k(lists: list[Optional[Node]]) -> Optional[Node]:
    # heapq compares tuples lexicographically; second slot is a tiebreaker
    # so we never compare two Node objects directly (they have no __lt__).
    heap: list[tuple[int, int, Node]] = []
    for i, head in enumerate(lists):
        if head:
            heapq.heappush(heap, (head.val, i, head))

    dummy = Node(0)
    tail  = dummy
    seq   = len(lists)                # monotonically increasing tiebreaker
    while heap:
        _, _, node = heapq.heappop(heap)
        tail.next  = node
        tail       = node
        if node.next:
            heapq.heappush(heap, (node.next.val, seq, node.next))
            seq += 1
    return dummy.next

# Helpers
def list_of(values: list[int]) -> Optional[Node]:
    dummy = Node(0); tail = dummy
    for v in values: tail.next = Node(v); tail = tail.next
    return dummy.next
def values(head: Optional[Node]) -> list[int]:
    out = []
    while head: out.append(head.val); head = head.next
    return out

cases = [
    ([[1, 4, 5], [1, 3, 4], [2, 6]], [1, 1, 2, 3, 4, 4, 5, 6]),
    ([],                              []),
    ([[]],                            []),
    ([[1, 2, 3]],                     [1, 2, 3]),
]
for lists, expected in cases:
    heads = [list_of(xs) for xs in lists]
    got   = values(merge_k(heads))
    mark  = "✓" if got == expected else "✗"
    print(f"{mark} merge_k({lists}) = {got}")

class Node {
  constructor(val, next = null) { this.val = val; this.next = next; }
}

// Min-heap of Node pointers, compared by node.val.
class MinHeap {
  constructor() { this.h = []; }
  size() { return this.h.length; }
  push(n) {
    this.h.push(n);
    let i = this.h.length - 1;
    while (i > 0) {
      const p = (i - 1) >> 1;
      if (this.h[p].val <= this.h[i].val) break;
      [this.h[p], this.h[i]] = [this.h[i], this.h[p]]; i = p;
    }
  }
  pop() {
    const top = this.h[0], last = this.h.pop();
    if (this.h.length === 0) return top;
    this.h[0] = last;
    const n = this.h.length;
    for (let i = 0;;) {
      let l = 2 * i + 1, r = 2 * i + 2, m = i;
      if (l < n && this.h[l].val < this.h[m].val) m = l;
      if (r < n && this.h[r].val < this.h[m].val) m = r;
      if (m === i) break;
      [this.h[m], this.h[i]] = [this.h[i], this.h[m]]; i = m;
    }
    return top;
  }
}

function mergeK(lists) {
  const h = new MinHeap();
  for (const head of lists) if (head) h.push(head);
  const dummy = new Node(0); let tail = dummy;
  while (h.size()) {
    const n = h.pop();
    tail.next = n; tail = n;
    if (n.next) h.push(n.next);
  }
  return dummy.next;
}

const listOf = vs => { const d = new Node(0); let t = d; for (const v of vs) { t.next = new Node(v); t = t.next; } return d.next; };
const values = h => { const r = []; while (h) { r.push(h.val); h = h.next; } return r; };

const cases = [
  [[[1, 4, 5], [1, 3, 4], [2, 6]], [1, 1, 2, 3, 4, 4, 5, 6]],
  [[],                              []],
  [[[]],                            []],
  [[[1, 2, 3]],                     [1, 2, 3]],
];
for (const [lists, expected] of cases) {
  const heads = lists.map(listOf);
  const got   = values(mergeK(heads));
  const ok    = JSON.stringify(got) === JSON.stringify(expected);
  console.log(`${ok ? "✓" : "✗"} mergeK(${JSON.stringify(lists)}) = ${JSON.stringify(got)}`);
}

// Rust's owned linked-list comes with the usual Box<Node> dance. The merge
// algorithm is identical to other languages; the borrow checker just makes
// ownership transfer explicit.
use std::cmp::Reverse;
use std::collections::BinaryHeap;

pub struct Node {
    pub val:  i32,
    pub next: Option<Box<Node>>,
}

pub fn merge_k(mut lists: Vec<Option<Box<Node>>>) -> Option<Box<Node>> {
    // Heap of (value, seq) → list-index. We pop the smallest value, take the
    // head off that list, and re-push its successor.
    let mut heap: BinaryHeap<Reverse<(i32, usize, usize)>> = BinaryHeap::new();
    let mut seq = 0usize;
    for (i, head) in lists.iter().enumerate() {
        if let Some(h) = head { heap.push(Reverse((h.val, seq, i))); seq += 1; }
    }

    let mut dummy = Box::new(Node { val: 0, next: None });
    let mut tail: &mut Box<Node> = &mut dummy;
    while let Some(Reverse((_, _, i))) = heap.pop() {
        let mut node = lists[i].take().unwrap();
        lists[i] = node.next.take();
        if let Some(ref nxt) = lists[i] {
            heap.push(Reverse((nxt.val, seq, i))); seq += 1;
        }
        tail.next = Some(node);
        tail      = tail.next.as_mut().unwrap();
    }
    dummy.next
}

fn list_of(vs: &[i32]) -> Option<Box<Node>> {
    let mut head: Option<Box<Node>> = None;
    for &v in vs.iter().rev() {
        head = Some(Box::new(Node { val: v, next: head }));
    }
    head
}
fn values(mut h: Option<Box<Node>>) -> Vec<i32> {
    let mut out = Vec::new();
    while let Some(n) = h { out.push(n.val); h = n.next; }
    out
}

fn main() {
    let cases: Vec<(Vec<Vec<i32>>, Vec<i32>)> = vec![
        (vec![vec![1, 4, 5], vec![1, 3, 4], vec![2, 6]], vec![1, 1, 2, 3, 4, 4, 5, 6]),
        (vec![],                                          vec![]),
        (vec![vec![]],                                    vec![]),
        (vec![vec![1, 2, 3]],                             vec![1, 2, 3]),
    ];
    for (lists, expected) in cases {
        let heads: Vec<_> = lists.iter().map(|xs| list_of(xs)).collect();
        let got = values(merge_k(heads));
        let mark = if got == expected { "✓" } else { "✗" };
        println!("{mark} merge_k(...) = {:?}", got);
    }
}

#include <iostream>
#include <queue>
#include <vector>

struct Node {
    int   val;
    Node* next;
    Node(int v, Node* n = nullptr) : val(v), next(n) {}
};

Node* merge_k(std::vector<Node*>& lists) {
    auto cmp = [](Node* a, Node* b) { return a->val > b->val; };   // min-heap
    std::priority_queue<Node*, std::vector<Node*>, decltype(cmp)> pq(cmp);
    for (Node* h : lists) if (h) pq.push(h);

    Node dummy(0);
    Node* tail = &dummy;
    while (!pq.empty()) {
        Node* n = pq.top(); pq.pop();
        tail->next = n;
        tail       = n;
        if (n->next) pq.push(n->next);
    }
    tail->next = nullptr;
    return dummy.next;
}

static Node* list_of(const std::vector<int>& xs) {
    Node dummy(0); Node* t = &dummy;
    for (int v : xs) { t->next = new Node(v); t = t->next; }
    return dummy.next;
}
static void show(Node* h) {
    std::cout << "[";
    while (h) { std::cout << h->val << (h->next ? "," : ""); h = h->next; }
    std::cout << "]\n";
}

int main() {
    std::vector<std::vector<int>> in_a = {{1, 4, 5}, {1, 3, 4}, {2, 6}};
    std::vector<Node*> a; for (auto& xs : in_a) a.push_back(list_of(xs));
    std::cout << "✓ merge_k({{1,4,5},{1,3,4},{2,6}}) = "; show(merge_k(a));

    std::vector<Node*> b;
    std::cout << "✓ merge_k({}) = "; show(merge_k(b));

    std::vector<Node*> c = { list_of({1, 2, 3}) };
    std::cout << "✓ merge_k({{1,2,3}}) = "; show(merge_k(c));
}

// Hand-rolled min-heap of Node* pointers, compared by ->val.
#include <stdio.h>
#include <stdlib.h>

typedef struct Node {
    int          val;
    struct Node* next;
} Node;

// Min-heap as a flat array.
static Node* heap[1024];
static int   hn = 0;

static void sift_up(int i) {
    while (i > 0) {
        int p = (i - 1) / 2;
        if (heap[p]->val <= heap[i]->val) break;
        Node* t = heap[p]; heap[p] = heap[i]; heap[i] = t;
        i = p;
    }
}
static void sift_down(int i) {
    while (1) {
        int l = 2*i+1, r = 2*i+2, m = i;
        if (l < hn && heap[l]->val < heap[m]->val) m = l;
        if (r < hn && heap[r]->val < heap[m]->val) m = r;
        if (m == i) break;
        Node* t = heap[m]; heap[m] = heap[i]; heap[i] = t;
        i = m;
    }
}
static void hpush(Node* n) { heap[hn++] = n; sift_up(hn - 1); }
static Node* hpop(void) {
    Node* top = heap[0];
    heap[0] = heap[--hn];
    if (hn > 0) sift_down(0);
    return top;
}

Node* merge_k(Node** lists, int k) {
    hn = 0;
    for (int i = 0; i < k; i++) if (lists[i]) hpush(lists[i]);
    Node  dummy = {0, NULL};
    Node* tail  = &dummy;
    while (hn > 0) {
        Node* n = hpop();
        tail->next = n; tail = n;
        if (n->next) hpush(n->next);
    }
    tail->next = NULL;
    return dummy.next;
}

static Node* mk(int v, Node* nxt) { Node* n = malloc(sizeof(Node)); n->val = v; n->next = nxt; return n; }
static Node* list_of(const int* xs, int n) {
    Node* head = NULL;
    for (int i = n - 1; i >= 0; i--) head = mk(xs[i], head);
    return head;
}
static void show(Node* h) {
    printf("[");
    while (h) { printf("%d%s", h->val, h->next ? "," : ""); h = h->next; }
    printf("]\n");
}

int main(void) {
    int a1[] = {1, 4, 5}, a2[] = {1, 3, 4}, a3[] = {2, 6};
    Node* a[] = {list_of(a1, 3), list_of(a2, 3), list_of(a3, 2)};
    printf("✓ merge_k({{1,4,5},{1,3,4},{2,6}}) = "); show(merge_k(a, 3));

    Node* b[] = {NULL};                  // k = 0 equivalent
    printf("✓ merge_k({}) = "); show(merge_k(b, 0));

    int c1[] = {1, 2, 3};
    Node* c[] = {list_of(c1, 3)};
    printf("✓ merge_k({{1,2,3}}) = "); show(merge_k(c, 1));
    return 0;
}

idle

The trick that makes Rust hard here is that you can't borrow two list heads at once for comparison. The standard workaround: drive everything through indices into a Vec, so the heap holds (value, list_index) and the actual mutation happens via the index.

Complexity

Time: O(n log k) — total n nodes; each contributes one heap pop + at most one heap push, both O(log k).
Space: O(k) — heap holds at most one node per list. The output list itself is allocated separately and isn't counted in "auxiliary space".
Divide-and-conquer alternative: Same O(n log k), often faster in practice (no priority-queue overhead, better cache behavior).

In the wild

External merge sort. Sort 100 GB on a 16 GB machine: produce many sorted runs that fit in memory, then k-way merge them. This algorithm is the merge step.
Distributed sorted queries. Each shard returns its own sorted result; the coordinator merges them via k-way merge before returning to the client.
Log file aggregation. sort -m file1 file2 ... and tools like logrotate + tail -F use this.
The merge stage of LSM-tree compaction (Cassandra, RocksDB, LevelDB). Each level's sorted files are merged into the next level.

Variations

Merge K sorted arrays (not linked lists). Same algorithm; heap holds (value, array index, position-in-array) triples.
Merge K sorted iterators (the streaming / lazy version). Identical pattern; never materialize input or output in full.
K closest elements. Push the input around a target, but emit only the closest K — combine with a max-heap cap.
Kth smallest in K sorted lists. Variant of kth-smallest-bst: start the k-way merge, count to k, stop.