Anyone who has read anything about database indexes (specifically a B-tree index) has probably seen the analogy to an index at the back of a book. It helps you jump to the exact entry you’re looking for. A Block Range Index (BRIN) works differently. Imagine you’re looking at an old stack of encyclopedias at home: instead of a detailed index, you just have the letter ranges printed on the spine. One volume covers A–B, another C–D, and so on. That doesn’t tell you exactly where a term is, but it does tell you which books you can safely ignore.

What is a BRIN index?

So how does it work? A BRIN index divides a table into page ranges (groups of physically adjacent pages in storage) and stores summary information for each one, such as the minimum and maximum values. To continue the encyclopedia analogy, each volume corresponds to a page range, and the letter range on the spine is like the summary stored for that range.

When a query filters on that column, Postgres can use the BRIN index in a Bitmap Index Scan. It checks each page range’s summary against the query condition. If the summary rules the range out, Postgres skips it entirely. Otherwise, it marks the range as a candidate match. A Bitmap Heap Scan then reads the marked pages, fetches the tuples they contain, and rechecks the condition against each one, discarding any false positives.

At this point you might reasonably ask: if a B-tree can point directly to matching rows, why use a BRIN index at all? The answer is compactness. A BRIN index stores one summary per page range rather than one entry per row, so it is dramatically smaller than a B-tree on the same column. That small footprint can still translate into real performance gains. If the summaries are selective enough, Postgres can skip large parts of the table entirely, reducing I/O during scans.

There is a catch, though. This only works well when values that are physically close together in the table also tend to be close in value. In the encyclopedia analogy, the volumes only help if the entries are arranged roughly in order. If each volume contained words from every letter of the alphabet, the labels on the spine would be useless. BRIN indexes behave the same way. If the values in a page range span too wide a range, the summary becomes too broad to filter effectively. In practice, BRIN works best when the indexed column is strongly correlated with the table’s physical storage order, because that produces tighter summaries and makes page skipping much more effective.

Let’s illustrate this with a sample table containing time-series-like data:

CREATE TABLE events (
    event_id bigint GENERATED ALWAYS AS IDENTITY,
    event_time timestamp NOT NULL,
    random_val bigint,
    metadata text
);
-- Populate 50M rows so event_id/event_time are physically ordered on disk.
INSERT INTO events (event_time, random_val, metadata)
SELECT
    '2020-01-01 00:00:00'::timestamp + (n - 1) * interval '1 minute' AS event_time,
    floor(random() * 50000000)::bigint AS random_val,
    'event-' || n::text AS metadata
FROM generate_series(1, 50000000) AS n;

ANALYZE events;

The data has been created so that event_time and event_id values are perfectly correlated with the physical storage order. The random_val column, as its name suggests, has no correlation. We can verify this using the pg_stats view.

SELECT
    attname,
    correlation
FROM pg_stats
WHERE tablename = 'events';

--   attname   |  correlation
-- ------------+---------------
--  event_id   |             1
--  event_time |             1
--  random_val | -0.0032407606
--  metadata   |      0.323909

Now, let’s create a BRIN index and check its size:

CREATE INDEX idx_brin_event_id ON events USING brin (event_id);
SELECT pg_size_pretty(pg_relation_size('idx_brin_event_id'));
-- 128 kB

Compare that to a B-tree index on the same column:

CREATE INDEX idx_btree_event_id ON events USING btree (event_id);
SELECT pg_size_pretty(pg_relation_size('idx_btree_event_id'));
-- 1071 MB

A BRIN index only stores metadata for page ranges rather than per row, which makes its footprint orders of magnitude smaller.
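If you want to compare every index on the table side by side, a catalog query like this works (assuming the indexes created above exist):

SELECT
    indexrelid::regclass AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_index
WHERE indrelid = 'events'::regclass;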

The impact of correlation with physical storage order

To demonstrate how much this correlation matters, let’s take a look at two queries.

First, query the event_id column, which perfectly correlates with physical storage order:

EXPLAIN ANALYZE
SELECT *
FROM events
WHERE event_id BETWEEN 10000000 AND 15000000;

                                   QUERY PLAN
---------------------------------------------------------------------------------
 Bitmap Heap Scan on events  (cost=1356.35..1115127.63 rows=5150384 width=38) (actual time=471.274..996.312 rows=5000001 loops=1)
   Recheck Cond: ((event_id >= 10000000) AND (event_id <= 15000000))
   Rows Removed by Index Recheck: 7359
   Heap Blocks: lossy=41728
   ->  Bitmap Index Scan on idx_brin_event_id  (cost=0.00..68.75 rows=5159705 width=0) (actual time=7.554..7.554 rows=417280 loops=1)
         Index Cond: ((event_id >= 10000000) AND (event_id <= 15000000))

Looking at the Bitmap Heap Scan node, the key line is Heap Blocks: lossy=41728. This indicates that Postgres only needed to read 41,728 pages from the events table. Given the table has around 416,667 pages total (check with SELECT relpages FROM pg_class WHERE relname = 'events';), the BRIN index allowed the query to skip reading approximately 90% of the table.

Now let’s examine the opposite case by creating an index on our random column. By default, Postgres’s optimizer will likely choose a sequential scan for this query, recognizing that the BRIN index won’t be helpful. To see why, we can temporarily force the use of the index by disabling sequential scans. The snippet below uses pg_hint_plan, though you can also disable sequential scans using SET enable_seqscan = off; (remember to reset this afterwards).

CREATE INDEX idx_brin_random ON events USING brin (random_val);
-- Force the BRIN index to show how poor correlation degrades its usefulness.
/*+Set(enable_seqscan off)*/
EXPLAIN ANALYZE
SELECT *
FROM events
WHERE random_val BETWEEN 10000000 AND 15000000;
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Bitmap Heap Scan on events  (cost=1392.15..1168059.15 rows=4919825 width=38) (actual time=86.566..9715.593 rows=5003107 loops=1)
   Recheck Cond: ((random_val >= 10000000) AND (random_val <= 15000000))
   Rows Removed by Index Recheck: 44996893
   Heap Blocks: lossy=416667
   ->  Bitmap Index Scan on idx_brin_random  (cost=0.00..162.19 rows=50000000 width=0) (actual time=37.904..37.943 rows=4166670 loops=1)
         Index Cond: ((random_val >= 10000000) AND (random_val <= 15000000))

The BRIN index on random_val is essentially useless. Because the values are scattered, the summary information for almost every page range indicated a potential match. The Bitmap Index Scan therefore builds a bitmap covering nearly all heap pages, and the Bitmap Heap Scan ends up reading the entire table, scanning 50 million rows and discarding nearly 45 million of them.

Tuning BRIN indexes

So far we’ve glossed over how many pages are grouped together in a BRIN index. The default is 128, controlled by the pages_per_range storage parameter. Larger ranges keep the index small but produce coarser summaries, so Postgres ends up reading more candidate pages. Smaller ranges give you tighter filtering at the cost of a larger index.

CREATE INDEX idx_brin_event_id_64 ON events USING brin (event_id) WITH (pages_per_range = 64);
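To see the cost side of that tradeoff, compare the new index’s size with the 128 kB we measured earlier. Halving pages_per_range roughly doubles the number of summary entries, so expect a size in the ballpark of twice the original:

SELECT pg_size_pretty(pg_relation_size('idx_brin_event_id_64'));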

Implementation details

The physical layout of a BRIN index is relatively straightforward. To see how it looks on disk, let’s install the pageinspect extension:

CREATE EXTENSION pageinspect;

A BRIN index consists of three main components: a metadata page, a range map (revmap), and the summary pages. The metadata page stores a few key values, including the last revmap page and the number of pages in each range.

SELECT *
FROM brin_metapage_info(get_raw_page('idx_brin_event_id', 0));
-[ RECORD 1 ]--+-----------
magic | 0xA8109CFA
version | 1
pagesperrange | 128
lastrevmappage | 3

The range map acts as an indirection layer. It maintains a mapping between page ranges and their corresponding summary entries. You can think of the revmap as an array of pointers to the actual index rows.
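pageinspect can show this indirection directly. Assuming the layout above, where the metadata page is block 0 and the revmap starts at block 1, brin_revmap_data lists the TID of the summary tuple for each page range:

SELECT *
FROM brin_revmap_data(get_raw_page('idx_brin_event_id', 1))
LIMIT 5;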

We can inspect the actual summary information stored for each range using brin_page_items:

SELECT *
FROM brin_page_items(get_raw_page('idx_brin_event_id', 6), 'idx_brin_event_id')
ORDER BY blknum, attnum
LIMIT 5;
itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
------------+--------+--------+----------+----------+-------------+------------------
197 | 0 | 1 | f | f | f | {1 .. 15361}
198 | 128 | 1 | f | f | f | {15362 .. 30721}
199 | 256 | 1 | f | f | f | {30722 .. 46081}
200 | 384 | 1 | f | f | f | {46082 .. 61441}
201 | 512 | 1 | f | f | f | {61442 .. 76801}

Each row in the summary pages contains the minimum and maximum values for a specific page range. The total number of entries equals the number of pages in the table divided by pages_per_range, rounded up.
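We can sanity-check that arithmetic against the catalog. With the roughly 416,667 pages reported earlier and the default of 128 pages per range, the index should hold about 3,256 summary entries:

SELECT ceil(relpages::numeric / 128) AS expected_ranges
FROM pg_class
WHERE relname = 'events';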

Operator classes

The default operator class for BRIN indexes uses the minmax strategy, storing only the minimum and maximum values for each page range. This strategy has one drawback: susceptibility to outliers. A single extreme value within a page range can widen the stored min/max bounds enough to make the summary ineffective for filtering.

To reduce the impact of outliers, you can use a different operator class. For example, the minmax-multi strategy stores multiple min/max pairs per range, which limits how much a single outlier can widen the effective range.

You specify the operator class during index creation:

CREATE INDEX idx_event_id_multi ON events USING brin (event_id int8_minmax_multi_ops);
📖 More on BRIN operator classes

This blog post explains how outliers degrade BRIN indexes with the default operator class and how minmax-multi mitigates the problem.
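The minmax-multi opclasses also take a storage parameter of their own, values_per_range (default 32), which caps how many boundary values are stored per page range. It is set on the operator class itself; the index name here is just an illustration:

CREATE INDEX idx_event_id_multi_16 ON events
    USING brin (event_id int8_minmax_multi_ops(values_per_range = 16));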

When should you use BRIN indexes?

BRIN indexes are a good fit for large tables where the queried column correlates with physical storage order. Time-series data is the canonical case: in an append-only event table, timestamps line up with storage layout because newer rows are appended at the end. The same holds for log tables with sequential IDs, or any column whose logical order matches its physical layout. The pg_stats.correlation values shown earlier are the best signal — columns close to ±1.0 are ideal candidates.
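One way to shortlist candidate columns is to query pg_stats directly; the 0.95 threshold here is an arbitrary starting point, not a hard rule:

SELECT tablename, attname, correlation
FROM pg_stats
WHERE schemaname = 'public'
  AND abs(correlation) > 0.95
ORDER BY tablename, attname;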

The same idea in other databases

The idea of summarizing data pages to skip I/O isn’t unique to Postgres. Analytical databases lean on similar techniques: