Secondary indexes — and why you often need a separate system

Why one database can't do every kind of query, and how two systems stay in sync through change streams.

TL;DR

Your primary database stores data one way (usually by primary key). When you need to query it a DIFFERENT way — by a range of values, full-text search, geographic proximity, or a non-key field — you need a secondary index. Small cases: an index inside the same database. Large cases: a completely separate specialized system, kept in sync via change streams. The trade-off is operational complexity and eventual consistency, in exchange for query flexibility and scale.

What an index actually is

Think of a textbook. The table of contents is organized by chapter order. If you want 'Chapter 7', it's fast. But if you want 'every page that mentions mitosis', the TOC is useless — you'd read the whole book.

That's why textbooks have an INDEX at the back — a separate, differently-sorted view of the same content. The chapters stay in their original order; the index organizes by topic. It's a second lookup structure on top of the main text.

Database indexes are exactly this. The primary storage organizes data one way (usually sorted by primary key). An index is an additional structure that organizes the SAME data by a different field, so queries on that other field don't have to scan everything.

Indexes inside the database

Postgres, MySQL, SQL Server — all of them let you add indexes with a single SQL command:

CREATE INDEX idx_users_email ON users (email);

One line. The database builds and maintains the index for you.

Now SELECT * FROM users WHERE email = 'priya@example.com' is a log-n tree lookup instead of a full scan. The index is a B+ tree (see the Trees article) keyed on email, pointing back at the primary row.

Trade-offs that come for free with every index:

Every write to the table now has to ALSO update every index. More indexes = slower writes.
Indexes cost disk space — sometimes 10-30% of table size each.
The query planner might still ignore your index if it thinks a scan is cheaper (e.g., on tiny tables or when you're fetching most of the rows anyway).

When one database can't do it all

In-database indexes work beautifully until one of these walls shows up:

Scale mismatch: you do a billion primary-key lookups/day but only thousands of full-text searches. Paying for index maintenance on the primary hurts.
Data model mismatch: your primary store is a hash-partitioned KV (like DynamoDB or Redis). You CAN'T build a global range index on it — the data model doesn't support it.
Query shape mismatch: your primary is a relational DB, but you need full-text search with relevance ranking, fuzzy matching, and synonyms. Postgres has some of this; Elasticsearch is a specialty.
Failure-domain mismatch: you want leaderboard queries to still work even if the primary is having a bad day (and vice versa).

When any of these walls shows up, you build the secondary index in a SEPARATE system that's purpose-built for that query shape.

How they stay in sync: Change Data Capture (CDC)

The hard part of running two systems is keeping them consistent. The wrong way is to have the application write to both:

// DON'T DO THIS — dual writes
await primaryDB.write(userId, data);
await elasticsearch.index(userId, data);  // what if this fails?

Dual-write anti-pattern. If the second write fails, the two systems silently disagree forever.

The right way is Change Data Capture. The primary system is the single source of truth. Every change it makes emits an event onto a log, and a separate process consumes that log to update the secondary:

  App → Primary DB → CDC log (Kafka/Kinesis/DynamoDB Streams)
                              ↓
                         Secondary indexer
                              ↓
                         Secondary index (Elasticsearch, sorted set, ...)

One source of truth. The secondary is always downstream.

If the secondary indexer crashes, it resumes from where it left off using the log offset.
If you want to rebuild the secondary from scratch, you replay the log from the beginning.
The primary never waits for the secondary — it just writes its local storage and emits to the log.
The secondary is eventually consistent with the primary. Lag is usually milliseconds to low seconds.

Real-world examples

DynamoDB (primary KV) + Elasticsearch (secondary text/range search) via DynamoDB Streams — Amazon retail runs variants of this pattern.
Postgres (primary) + Algolia (search UX index) via logical replication or triggers — very common in SaaS.
Cassandra (primary, LSM-tree KV) + Solr / Elasticsearch (search) — what many analytics and messaging platforms use.
Redis cluster (primary sessions) + Redis sorted sets in a separate deployment (leaderboards) — the exact shape behind the 'Hash vs Tree' question.
Transactional DB (Postgres/MySQL) + OLAP warehouse (Snowflake, BigQuery, ClickHouse) via Debezium or Fivetran — for analytical queries that would destroy the transactional DB.

What 'ordered secondary index' means specifically

When the Hash vs Tree question said 'distributed ordered index — an external ranked store or search service', it was describing this exact pattern, with the secondary specifically being an ORDERED structure (a tree-like system). Examples:

Redis sorted sets — use a skip list internally; O(log n) insert + range scan.
Elasticsearch / OpenSearch — Lucene under the hood, with inverted indexes for text and BKD trees for numeric ranges.
A separate deployment of RocksDB (LSM tree) tuned for range-heavy reads.
Specialized 'ranking service' built on top of any of the above.

These are systems whose core feature is 'store sorted data, answer top-N and range queries fast.' They're the other half of the hybrid architecture — paired with a hash-partitioned primary, each system does exactly one job well.

The operational trade-off (honest version)

You now run two storage systems. Two oncalls. Two capacity plans. Two failure modes.
You need CDC infrastructure (Kafka / Kinesis / Debezium / cloud-native streams) — more moving parts.
Consistency story is more nuanced. 'I wrote, why don't I see it in the leaderboard yet?' is now a legitimate question.
But: failure isolation is an underrated win. If the secondary is degraded, point reads still work. If the primary has a bad day, cached leaderboards often keep loading.

Connecting back

In the Hash vs Tree question, Option C is this pattern made concrete: a primary hash-partitioned store for session lookups + an async-maintained ordered secondary index for leaderboards, connected by CDC. The per-option analysis in that question should now read differently: you're not picking 'one data structure', you're picking an architecture that uses two specialists.

The reason the hybrid wins is that the workload is asymmetric: 'millions of point reads + frequent range queries' is two different jobs. Asymmetric workloads want asymmetric systems.