Top K Frequent Elements
in the wild Top-K trending topics, hot URL leaderboard, top-K product views — anywhere a streaming or batch system needs the heaviest hitters.
Problem
Given an integer array nums and an integer k, return the k most frequent elements. Order of the output doesn't matter. You may assume k is valid (1 ≤ k ≤ number of distinct elements).
Examples
nums = [1, 1, 1, 2, 2, 3], k = 2→[1, 2]nums = [1], k = 1→[1]nums = [4, 1, -1, 2, -1, 2, 3], k = 2→[-1, 2]
Why this is the canonical "top-K with a heap" problem
Sorting by frequency is O(n log n). The win: you only care about the top K, so you never need the rest in order. Maintain a min-heap of size K:
- Count frequencies →
O(n)with a hash table. - Walk the (key, count) pairs. Push each into a min-heap. If the heap size exceeds
K, pop the minimum. - End state: the heap holds the K largest by frequency. Pop them out.
Each push/pop is O(log K) (heap size capped at K), and there are at most n distinct keys → O(n log K). When K ≪ n (e.g. "top 10 of a billion") the savings are enormous.
The same pattern solves "K largest in a stream", "top K shortest distances", "K closest points to origin", and the "K" in any "K-something" problem where you want a rolling shortlist without sorting the whole input.
Hints
Hint 1 — count first
You can't compute "top K by frequency" without knowing each element's frequency. A hash table maps element → count in one O(n) pass.
Hint 2 — min-heap, not max-heap
Counterintuitively, a min-heap is the right tool for "top K largest". Keep the heap at exactly K elements; the smallest one (= the heap's root) is the threshold. When a new candidate's value is larger than the root, evict the root.
Hint 3 — alternative: bucket sort
If you don't mind the constant factor, you can do this in O(n) time. Create buckets indexed by frequency (since frequency is ≤ n). Walk the buckets from the top downward, collecting until you have K items. Useful when K ≈ n.
Solution
from collections import Counter
import heapq
def top_k_frequent(nums: list[int], k: int) -> list[int]:
counts = Counter(nums)
# heapq is a min-heap. We push (count, key) tuples and cap size at k.
heap: list[tuple[int, int]] = []
for key, cnt in counts.items():
heapq.heappush(heap, (cnt, key))
if len(heap) > k:
heapq.heappop(heap) # evict the current smallest
return [key for cnt, key in heap]
cases = [
([1, 1, 1, 2, 2, 3], 2, {1, 2}),
([1], 1, {1}),
([4, 1, -1, 2, -1, 2, 3], 2, {-1, 2}),
]
for nums, k, expected in cases:
got = set(top_k_frequent(nums, k))
mark = "✓" if got == expected else "✗"
print(f"{mark} top_k_frequent({nums}, k={k}) = {sorted(got)} (expected {sorted(expected)})")// A minimal binary min-heap of (count, key) pairs.
class MinHeap {
constructor() { this.h = []; }
size() { return this.h.length; }
peek() { return this.h[0]; }
push(v) {
this.h.push(v);
for (let i = this.h.length - 1; i > 0; ) {
const p = (i - 1) >> 1;
if (this.h[p][0] <= this.h[i][0]) break;
[this.h[p], this.h[i]] = [this.h[i], this.h[p]];
i = p;
}
}
pop() {
const top = this.h[0], last = this.h.pop();
if (this.h.length === 0) return top;
this.h[0] = last;
const n = this.h.length;
for (let i = 0;;) {
let l = 2 * i + 1, r = 2 * i + 2, m = i;
if (l < n && this.h[l][0] < this.h[m][0]) m = l;
if (r < n && this.h[r][0] < this.h[m][0]) m = r;
if (m === i) break;
[this.h[m], this.h[i]] = [this.h[i], this.h[m]];
i = m;
}
return top;
}
}
function topKFrequent(nums, k) {
const counts = new Map();
for (const x of nums) counts.set(x, (counts.get(x) ?? 0) + 1);
const h = new MinHeap();
for (const [key, cnt] of counts) {
h.push([cnt, key]);
if (h.size() > k) h.pop();
}
return h.h.map(([_, key]) => key);
}
const cases = [
[[1, 1, 1, 2, 2, 3], 2, [1, 2]],
[[1], 1, [1]],
[[4, 1, -1, 2, -1, 2, 3], 2, [-1, 2]],
];
for (const [nums, k, expected] of cases) {
const got = topKFrequent(nums, k).sort((a, b) => a - b);
const ok = JSON.stringify(got) === JSON.stringify(expected.slice().sort((a, b) => a - b));
console.log(`${ok ? "✓" : "✗"} topKFrequent(${JSON.stringify(nums)}, k=${k}) = ${JSON.stringify(got)}`);
}use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};
pub fn top_k_frequent(nums: &[i32], k: usize) -> Vec<i32> {
// Step 1 — count.
let mut counts: HashMap<i32, usize> = HashMap::new();
for &x in nums { *counts.entry(x).or_insert(0) += 1; }
// BinaryHeap in Rust is a MAX-heap. Wrap counts in `Reverse` to flip it
// into a min-heap of (count, key) pairs.
let mut heap: BinaryHeap<Reverse<(usize, i32)>> = BinaryHeap::new();
for (&key, &cnt) in &counts {
heap.push(Reverse((cnt, key)));
if heap.len() > k { heap.pop(); }
}
heap.into_iter().map(|Reverse((_, k))| k).collect()
}
fn main() {
let cases: &[(&[i32], usize, &[i32])] = &[
(&[1, 1, 1, 2, 2, 3], 2, &[1, 2]),
(&[1], 1, &[1]),
(&[4, 1, -1, 2, -1, 2, 3], 2, &[-1, 2]),
];
for (nums, k, expected) in cases {
let mut got = top_k_frequent(nums, *k);
got.sort();
let mut exp = expected.to_vec();
exp.sort();
let mark = if got == exp { "✓" } else { "✗" };
println!("{mark} top_k_frequent({:?}, k={}) = {:?}", nums, k, got);
}
}#include <iostream>
#include <queue>
#include <unordered_map>
#include <vector>
std::vector<int> top_k_frequent(const std::vector<int>& nums, int k) {
std::unordered_map<int, int> counts;
for (int x : nums) ++counts[x];
// Min-heap of (count, key) — std::greater inverts the default max-heap.
using P = std::pair<int, int>;
std::priority_queue<P, std::vector<P>, std::greater<P>> heap;
for (auto& [key, cnt] : counts) {
heap.push({cnt, key});
if ((int)heap.size() > k) heap.pop();
}
std::vector<int> out;
while (!heap.empty()) { out.push_back(heap.top().second); heap.pop(); }
return out;
}
int main() {
struct Case { std::vector<int> nums; int k; std::vector<int> expected; };
std::vector<Case> cases = {
{ {1, 1, 1, 2, 2, 3}, 2, {1, 2} },
{ {1}, 1, {1} },
{ {4, 1, -1, 2, -1, 2, 3}, 2, {-1, 2}},
};
for (auto& c : cases) {
auto got = top_k_frequent(c.nums, c.k);
std::sort(got.begin(), got.end());
std::sort(c.expected.begin(), c.expected.end());
std::cout << (got == c.expected ? "✓" : "✗") << " top_k_frequent(.., k=" << c.k << ") = {";
for (size_t i = 0; i < got.size(); ++i)
std::cout << got[i] << (i + 1 < got.size() ? "," : "");
std::cout << "}\n";
}
}// For C, the "build a heap by hand" tax is high enough that the bucket-sort
// O(n) variant is the clean choice. We count, then walk frequency buckets
// top-down until we have k items.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// Simple open-address hash; tuned for the small demos.
#define CAPACITY 1024
static int hkeys [CAPACITY];
static int hcnts [CAPACITY];
static int hused [CAPACITY];
static void counter_reset(void) { memset(hused, 0, sizeof(hused)); }
static void counter_inc(int x) {
unsigned h = (unsigned)(x * 2654435761u) % CAPACITY;
while (hused[h] && hkeys[h] != x) h = (h + 1) % CAPACITY;
if (!hused[h]) { hused[h] = 1; hkeys[h] = x; hcnts[h] = 0; }
hcnts[h]++;
}
// Returns a malloc'd array of k results.
int* top_k_frequent(const int* nums, int n, int k) {
counter_reset();
for (int i = 0; i < n; i++) counter_inc(nums[i]);
// Buckets indexed by frequency (1..n).
int** buckets = calloc(n + 1, sizeof(int*));
int* blens = calloc(n + 1, sizeof(int));
int* bcaps = calloc(n + 1, sizeof(int));
for (int i = 0; i < CAPACITY; i++) {
if (!hused[i]) continue;
int c = hcnts[i];
if (blens[c] == bcaps[c]) {
bcaps[c] = bcaps[c] ? bcaps[c] * 2 : 4;
buckets[c] = realloc(buckets[c], bcaps[c] * sizeof(int));
}
buckets[c][blens[c]++] = hkeys[i];
}
int* out = malloc(k * sizeof(int));
int out_n = 0;
for (int f = n; f >= 1 && out_n < k; f--)
for (int i = 0; i < blens[f] && out_n < k; i++)
out[out_n++] = buckets[f][i];
for (int i = 0; i <= n; i++) free(buckets[i]);
free(buckets); free(blens); free(bcaps);
return out;
}
int main(void) {
int a[] = {1, 1, 1, 2, 2, 3}; int* r = top_k_frequent(a, 6, 2);
printf("✓ top_k_frequent({1,1,1,2,2,3}, k=2) = {%d, %d}\n", r[0], r[1]); free(r);
int b[] = {1}; r = top_k_frequent(b, 1, 1);
printf("✓ top_k_frequent({1}, k=1) = {%d}\n", r[0]); free(r);
int c[] = {4, 1, -1, 2, -1, 2, 3}; r = top_k_frequent(c, 7, 2);
printf("✓ top_k_frequent({4,1,-1,2,-1,2,3}, k=2) = {%d, %d}\n", r[0], r[1]); free(r);
return 0;
}In Python and C++ the language gives you a heap for free; in JS we hand-roll one. The C version goes a different route entirely — bucket-by-frequency — because rolling a heap by hand in C is enough boilerplate that the O(n) bucket variant is the cleaner trade for a demo. Knowing both is the actual lesson.
Complexity
- Heap approach
- Time:
O(n log k)— n inserts/maybe-pops on a heap of size ≤ k. Space:O(n)for the count map plusO(k)for the heap. - Bucket-sort approach
- Time:
O(n)— one count pass plus one bucket walk. Space:O(n). Wins when K is close to N. - Quickselect approach
- Time:
O(n)average,O(n²)worst case. Used by C++std::nth_element. Beats both above on huge K, hurt by bad pivots.
In the wild
- Trending topic dashboards. Top-K tweeted hashtags or top-K visited URLs every minute — exactly this algorithm on a sliding window.
- Log analytics. "What were the top-K error codes in the last hour?" Same shape, often computed in a streaming engine (Flink, Spark Streaming).
- Recommender systems. Top-K nearest neighbors in a vector space — distances replace counts, otherwise identical.
- Robot perception. Top-K best object detections from a model that emits hundreds of candidates per frame.
Variations
- Top K largest values (not by frequency). Skip the counting step; heap directly.
- K closest points to the origin. Heap by Euclidean distance.
- Streaming top-K with sliding window. Use a Lossy Counting / Space-Saving algorithm — exact heap doesn't fit when the window is enormous.
- Top K with ties broken consistently. Push
(count, -key)(or wrap key in a tie-breaker) so the heap order is fully defined.