Knowledge base

data-structures

trees

databases

fundamentals

· 12 min read

How trees work — a novice's field guide

From binary search trees to B-trees and LSM trees. Why sorted data structures exist and where each one shines.

TL;DR

A tree is a way to organize data so you can find things fast even when there's a LOT of it. The trick is that the data stays in order — so you can halve the search space at each step. Hash tables scatter data randomly (instant lookup of one specific thing, useless for 'give me everything between X and Y'). Trees keep data sorted (slightly slower lookup, but 'top 100' and 'date range' queries become natural and fast).

Start with a phone book

Imagine you have a printed phone book with a million names, sorted alphabetically. You want to find 'Priya Shah'.

You don't scan from the first page. You open roughly in the middle — say you land on 'Mendoza'. Priya comes after. You jump to the middle of the second half — land on 'Ramirez'. Still past it. Keep halving.

After about 20 jumps, you've found her. Out of a million names.

If the phone book were shuffled randomly, you'd have to read every single page to find Priya. That's 'O(n)' — tens of thousands of times slower at this size. Sorted order is a superpower.

A tree is a data structure that gives you this for free

A binary search tree (BST) is the simplest tree. Each node holds a value and has up to two children.

        [10]
        /  \
     [5]    [15]
    /  \    /  \
  [2] [8] [12] [20]

A binary search tree with 7 nodes

The rule: everything in the LEFT subtree is smaller than the node, everything in the RIGHT subtree is larger. That rule is enforced every time you insert — and it's what makes searching fast.

To find 12: start at 10. 12 > 10, so go right. You're now at 15. 12 < 15, so go left. You've found 12. Three steps instead of checking all seven nodes.

In code

function insert(node, value) {
  if (!node) return { value, left: null, right: null };
  if (value < node.value) {
    node.left = insert(node.left, value);
  } else if (value > node.value) {
    node.right = insert(node.right, value);
  }
  return node;
}

function find(node, value) {
  if (!node) return null;
  if (value === node.value) return node;
  if (value < node.value) return find(node.left, value);
  return find(node.right, value);
}

Basic BST insertion and search. The recursion mirrors the 'halving the phone book' idea.

The magic: range queries

Here's where trees beat hash tables badly. Say you want 'everything between 8 and 20'.

On a hash table, you'd have to check EVERY SINGLE KEY. There's no concept of 'nearby' — keys are scattered randomly by design.

On a tree, you just walk the tree in sorted order starting at 8 and stop as soon as you pass 20. Because sorted order is baked into the structure.

function range(node, low, high, results = []) {
  if (!node) return results;
  // Only recurse left if something there could be ≥ low
  if (low < node.value) range(node.left, low, high, results);
  // Collect if we're inside the range
  if (low <= node.value && node.value <= high) results.push(node.value);
  // Only recurse right if something there could be ≤ high
  if (high > node.value) range(node.right, low, high, results);
  return results;
}

Range query on a BST — only visits branches that could contain answers.

Range query on a binary search tree

Pick a low and high bound and watch the traversal. Nodes in range turn green; whole branches that can't contain any answer are pruned in gray. This pruning is the whole reason range queries are fast on trees.

Interactive

lowhigh

visitingin range (collected)pruned / skippeduntouchedcollected: 0 · pruned: 0

Traversal log

Pick a range and click “Run query”.

What else could I use for range queries?

Trees aren't the ONLY way to answer 'give me everything between X and Y'. Here's the full menu, roughly ordered from 'naive' to 'production-grade':

Linear scan of unsorted data — O(n). Check every item. Fine for small lists, catastrophic at scale.
Hash table + filter — also O(n). Hashing buys you nothing here; hash tables don't preserve order, so you still have to inspect every bucket. Common novice trap: 'but my hash table is fast!' — it is, for point lookups. Not for ranges.
Sort on demand — take your unsorted data, sort it, then binary-search into the sorted array. Great for one-off batch jobs. Terrible if data changes often, because you re-sort on every insert.
Pre-sorted array — keep data sorted at all times. Reads are O(log n + k). But inserts become O(n) because you have to shift elements. Fine for read-mostly workloads.
Tree (BST, AVL, red-black, skip list) — O(log n) insert AND O(log n + k) range reads. This is the sweet spot. Both operations stay cheap as the data grows.
B-tree / B+ tree — a tree optimized for disk. Same algorithmic properties as a balanced BST but each node is a disk page with hundreds of keys, so random I/O is minimized. This is what relational databases use.
LSM tree — sorted, but writes are optimized. Reads cost a little more (multiple levels to check). Good when writes dominate.
Specialized ordered indexes — Redis sorted sets (skip list internally), Elasticsearch (Lucene's BKD tree for numeric ranges + inverted index for text), ClickHouse's sparse indexes on sorted data. All tree-flavored under the hood.

Why 'tree' specifically wins

The reason trees come up over and over is that they hit a rare sweet spot: they keep BOTH operations cheap. Inserts stay O(log n). Range reads stay O(log n + k). Nothing else gives you both.

Linear scans are cheap to insert into but slow to query.
Pre-sorted arrays are cheap to query but slow to insert into.
Hash tables are cheap for everything EXCEPT range queries.
Trees maintain order without paying the array-insert cost. That's the magic.

Balance matters (a lot)

A binary tree works great IF it stays balanced — roughly the same depth on both sides. If it gets lopsided, your 'log n' performance collapses.

Watch what happens if you insert [1, 2, 3, 4, 5] in order into a plain BST:

A degenerate BST — every insert went right. Finding 5 now takes 5 steps, same as a linked list.

This is no better than scanning a shuffled list. The solution is self-balancing trees — data structures that rearrange themselves as you insert to keep both sides roughly equal.

AVL trees — strict balance, slightly slower inserts, faster reads.
Red-black trees — looser balance, what Java's TreeMap and C++'s std::map use.
Skip lists — not technically a tree, but achieves similar log-n behavior using probability. Redis uses these for sorted sets.

But real databases don't use binary trees

Binary trees are perfect for teaching but terrible for databases. Here's why.

When a database has a billion rows, a balanced binary tree is ~30 levels deep. Looking up a row means potentially 30 disk reads. Disks (even SSDs) are WAY slower at random reads than sequential reads — each level of the tree could be a different disk page. That's ~30 random I/Os for ONE lookup.

The fix: make each tree node BIGGER. Instead of 2 children per node, have hundreds.

Enter the B-tree

A B-tree (the 'B' is just a letter — don't read into it) is a tree where each node holds MANY keys and has MANY children. A typical B-tree node in a database might hold 100-500 keys.

             [10 | 20 | 30]
          /      |    |    \
  [2,5,8]  [12,15]  [22,25]  [32,35,40]

A (tiny) B-tree node holds multiple keys. Real B-tree nodes typically match a disk page — 4 KB to 16 KB.

Finding 25 in this tree: look at root → 25 is between 20 and 30 → follow that pointer → now at [22,25] → found. Two hops instead of many.

With a branching factor of 500, finding ONE record in a BILLION items takes only about 4 hops (log base 500 of a billion ≈ 3.4). Four disk reads instead of thirty.

Why the branching factor matters so much

Fewer levels = fewer disk reads per lookup.
Each node matches a disk page, so one I/O brings in hundreds of keys.
Range scans walk sideways through leaf nodes (in B+ trees, the leaves are linked) — very cache-friendly.
Trade-off: inserts occasionally need to 'split' a full node, which means writing two pages. Real engines buffer this.

When writes are crazier than reads: LSM trees

B-trees handle balanced read/write workloads beautifully. But what if you have insane write traffic — a logging system, a metrics database, a write-heavy IoT pipeline taking a million writes per second?

Every B-tree write might need to read a page off disk, modify it, and write it back. If the page is full, split it. That's a lot of I/O per write.

LSM (Log-Structured Merge) trees take a completely different approach:

All writes go into a small in-memory buffer (called a 'memtable' — often a skip list internally).
When the memtable fills up, it's sorted and flushed to disk as a single immutable file, called an SSTable ('Sorted String Table').
New SSTables keep stacking up. A background process periodically merges smaller older SSTables into larger ones — this is called compaction.
Reads check the memtable first, then each level of SSTables in turn. Bloom filters tell the engine to skip files that definitely don't contain the key.

The upside: writes are almost always just a memory write plus an occasional sequential disk flush. Sequential I/O is WAY faster than random I/O. Writes go brrr.

The downside: reads might have to check multiple files (memtable + several SSTable levels). Compaction uses CPU and disk in the background. Read amplification is the main cost you pay.

Cheat sheet: when to use what

Point lookup only ('give me user 42') — hash table wins. O(1) is unbeatable.
Range queries ('users 100-200') — tree. Sortedness is the whole game.
Top-N queries ('top 100 by score') — tree / sorted index.
Sorted iteration ('iterate alphabetically') — tree. Hash tables lose ALL ordering info.
Write-heavy, ok with read amplification — LSM tree.
Balanced read/write, ok with write amplification — B-tree.
In-memory, smallest possible lookup — hash table beats both.

How this connects to the 'Hash vs Tree' question

Now the architectural trade-offs in the system-design question should feel concrete:

'Hash partitioning is fast for point reads' — because hash → partition is arithmetic, no tree traversal or sorted lookup involved.
'Hash partitioning destroys locality' — because the whole point of hashing is that adjacent keys go to different places. You can't do a range scan without talking to every shard.
'Range-partitioned tree needs coordination' — because to find which node owns key K, you have to walk (or look up) a metadata tree. And if the design is strongly consistent, the owner node serializes reads through its leader.
'Redis sorted sets', 'Elasticsearch', 'LSM tree' in the hybrid option — these are all ordered structures that specialize in range/top-N queries. The hybrid design pairs a hash-based primary (fast point reads) with one of these ordered secondaries (fast range queries).