Typically, database indexes are known for quickly locating specific rows. Block Range Indexes (BRIN), however, operate differently. Rather than enabling quick lookups of individual rows, their strength lies in efficiently skipping large portions of a table that don’t contain relevant data based on summary information. Let’s take a look at how they work.

What is a BRIN index?

A BRIN index divides a table into page ranges - sets of physically adjacent pages in storage. For each range, the index maintains summary information, for instance the minimum and maximum values of the indexed column found within that range.

When a query with a condition on the indexed column executes, PostgreSQL can use a Bitmap Index Scan operation with the BRIN index. For each page range:

  1. The executor checks if the summary information (e.g., min/max values) for the range is consistent with the query conditions.
  2. If the summary indicates that no matching tuples are possible within that range, the entire range of pages is skipped – it doesn’t need to be read from disk.
  3. If the summary is consistent with the query conditions, the pages within that range might contain matching tuples. The index scan marks these pages in a bitmap.

Subsequently, a Bitmap Heap Scan operation retrieves the pages marked in the bitmap. It fetches the individual tuples from these pages and rechecks the query condition against each one, discarding tuples that don’t actually match.

BRIN indexes offer two advantages:

  1. Compactness: They store only one summary per range of pages, unlike B-Trees which index individual tuples, resulting in a much smaller footprint.
  2. Reduced I/O: They allow the database to skip reading large portions of the table during scans, potentially saving significant I/O.

However, realizing the I/O savings depends entirely on the index’s ability to effectively filter out page ranges. This requires the summary information (like min/max values) for each range to be sufficiently “narrow” to distinguish relevant ranges from irrelevant ones for a given query. Wide summary ranges offer little filtering power. Consequently, the effectiveness of a BRIN index relies heavily on a strong correlation between the indexed column’s values and their physical storage order. High correlation produces the narrow, precise summaries needed to maximize page skipping.

Let’s illustrate this with a sample table containing time-series-like data:

create table events (
  event_id bigint generated always as identity,
  event_time timestamp not null,
  random_val bigint,
  metadata text
);

-- insert 50 million rows of sample data
insert into events (event_time, random_val, metadata)
select
  generate_series('2020-01-01 00:00:00'::timestamp, '2124-03-14 10:59:00'::timestamp, '1 minute') as event_time,
  floor(random() * 50000000)::bigint,
  'event-' || generate_series(1, 50000000)::text as metadata
limit 50000000;

analyze events;
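For context on the size comparisons below, it helps to check the heap size first (exact numbers depend on your build; on a typical setup this table comes out to roughly three gigabytes):

```sql
-- On-disk size of the heap and its page count; the index sizes
-- below can be compared against these figures.
select pg_size_pretty(pg_relation_size('events')) as table_size,
       relpages
from pg_class
where relname = 'events';
```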

The data has been created so that event_time and event_id values are perfectly correlated with the physical storage order. The random_val column, as its name suggests, has no correlation. We can verify this using the pg_stats view.

select attname, correlation from pg_stats where tablename = 'events';
--    attname   |  correlation
-- -------------+---------------
--  event_id    |             1
--  event_time  |             1
--  random_val  | -0.0032407606
--  metadata    |      0.323909

Now, let’s create a BRIN index and check its size:

create index idx_brin_event_id on events using brin (event_id);
select pg_size_pretty(pg_relation_size('idx_brin_event_id'));
-- 128 kB

Compare this to a B-Tree index on the same column:

create index idx_btree_event_id on events using btree (event_id);
select pg_size_pretty(pg_relation_size('idx_btree_event_id'));
-- 1071 MB

As the BRIN index only stores metadata for ranges of pages rather than for each individual row, its footprint is significantly smaller.

The impact of correlation with physical storage order

To demonstrate how critical this correlation is for the effectiveness of a BRIN index, let’s take a look at two different queries.

First, let’s query the event_id column, which perfectly correlates with physical storage order:

explain analyze select * from events where event_id between 10000000 and 15000000;
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Bitmap Heap Scan on events  (cost=1356.35..1115127.63 rows=5150384 width=38) (actual time=471.274..996.312 rows=5000001 loops=1)
   Recheck Cond: ((event_id >= 10000000) AND (event_id <= 15000000))
   Rows Removed by Index Recheck: 7359
   Heap Blocks: lossy=41728
   ->  Bitmap Index Scan on idx_brin_event_id  (cost=0.00..68.75 rows=5159705 width=0) (actual time=7.554..7.554 rows=417280 loops=1)
         Index Cond: ((event_id >= 10000000) AND (event_id <= 15000000))

Looking at the Bitmap Heap Scan node, the crucial line is Heap Blocks: lossy=41728. This indicates that PostgreSQL only needed to read 41,728 pages from the events table. Given the table has around 416,667 pages total (check with SELECT relpages FROM pg_class WHERE relname = 'events';), the BRIN index allowed the query to skip reading approximately 90% of the table.
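To put the lossy block count in perspective, a quick helper query (not part of the original run; the literal 41728 is taken from the plan above) computes the fraction of heap pages actually visited:

```sql
-- Fraction of heap pages the BRIN-driven scan had to read:
-- 41728 lossy blocks out of relpages total pages (~10%).
select relpages,
       round(100.0 * 41728 / relpages, 1) as pct_pages_read
from pg_class
where relname = 'events';
```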

Now let’s examine the opposite case by creating an index on our random_val column. By default, PostgreSQL’s optimizer will likely choose a sequential scan for this query, recognizing that the BRIN index won’t be helpful. To see why, we can temporarily force the use of the index by disabling sequential scans. The snippet below uses pg_hint_plan, though you can also disable sequential scans with set enable_seqscan = off; (just remember to reset it afterwards).

create index idx_brin_random on events using brin (random_val);

/*+ Set(enable_seqscan off) */
explain analyze select * from events where random_val between 10000000 and 15000000;

The BRIN index on the random_val column is essentially useless. Because the values are scattered randomly across the table, the summary information for almost every page range indicated a potential match, causing the Bitmap Index Scan to build a bitmap flagging nearly all heap pages. The subsequent Bitmap Heap Scan then had to read every page, fetch all 50 million rows, and discard the nearly 45 million that failed the recheck.

 Bitmap Heap Scan on events  (cost=1392.15..1168059.15 rows=4919825 width=38) (actual time=86.566..9715.593 rows=5003107 loops=1)
   Recheck Cond: ((random_val >= 10000000) AND (random_val <= 15000000))
   Rows Removed by Index Recheck: 44996893
   Heap Blocks: lossy=416667
   ->  Bitmap Index Scan on idx_brin_random  (cost=0.00..162.19 rows=50000000 width=0) (actual time=37.904..37.943 rows=4166670 loops=1)
         Index Cond: ((random_val >= 10000000) AND (random_val <= 15000000))

Tuning BRIN indexes

So far we’ve glossed over how many pages are grouped together in a BRIN index. As you might expect, this value can impact query performance. Larger ranges result in a smaller index footprint but produce less precise summaries, potentially causing more irrelevant pages to be scanned. Conversely, smaller ranges yield more precise summaries (improving filtering effectiveness) but increase the overall size of the BRIN index.

The default range size is 128 pages. It can be configured at index creation time via the pages_per_range storage parameter.

create index idx_brin_event_id_64 on events using brin (event_id) with (pages_per_range = 64);
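Comparing the footprint of this index against the default one built earlier illustrates the trade-off; halving pages_per_range roughly doubles the number of summary entries, and therefore the index size (exact sizes depend on the table):

```sql
-- The 64-page-per-range index stores twice as many summaries
-- as the default 128-page-per-range index.
select pg_size_pretty(pg_relation_size('idx_brin_event_id'))    as size_128,
       pg_size_pretty(pg_relation_size('idx_brin_event_id_64')) as size_64;
```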

Implementation details

The physical layout of a BRIN index is relatively straightforward. To see what it looks like on disk, let’s activate the pageinspect extension:

create extension pageinspect;

A BRIN index consists of three main components: a metadata page, a reverse range map (revmap), and the summary pages. The metadata page stores a few pieces of metadata, including the location of the last revmap page and the number of pages in each range (matching the pages_per_range value used when the index was created).

SELECT * FROM brin_metapage_info(get_raw_page('idx_brin_event_id', 0));
-[ RECORD 1 ]--+-----------
magic | 0xA8109CFA
version | 1
pagesperrange | 128
lastrevmappage | 3

The range map acts like an indirection layer. It maintains a mapping between page ranges and their corresponding summary entries. You can think of the revmap as an array of pointers to the actual index rows.
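pageinspect can show this indirection directly. brin_revmap_data lists, for each page range, the TID of its summary tuple; revmap pages start at block 1, right after the metapage (block 0):

```sql
-- Each row is a (page, offset) pointer to the summary index tuple
-- for one page range; unsummarized ranges show up as (0,0).
select * from brin_revmap_data(get_raw_page('idx_brin_event_id', 1)) limit 5;
```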

We can inspect the actual summary information stored for each range using brin_page_items:

select * from brin_page_items(get_raw_page('idx_brin_event_id', 6), 'idx_brin_event_id') order by blknum, attnum limit 5;
itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | value
------------+--------+--------+----------+----------+-------------+------------------
197 | 0 | 1 | f | f | f | {1 .. 15361}
198 | 128 | 1 | f | f | f | {15362 .. 30721}
199 | 256 | 1 | f | f | f | {30722 .. 46081}
200 | 384 | 1 | f | f | f | {46082 .. 61441}
201 | 512 | 1 | f | f | f | {61442 .. 76801}

Each row in the summary pages contains the minimum and maximum values for a specific page range. The total number of entries equals the number of pages in the table divided by pages_per_range, rounded up.
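We can sanity-check that entry count against the table’s page count with a small helper query (assuming the default pages_per_range of 128):

```sql
-- Number of summary entries = heap pages / pages_per_range, rounded up.
select relpages,
       ceil(relpages / 128.0) as expected_brin_entries
from pg_class
where relname = 'events';
```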

Operator classes

The default operator class for BRIN indexes uses the minmax strategy, storing only the minimum and maximum values for each page range. This strategy has one significant drawback: susceptibility to outliers. A single, extreme value within a page range can dramatically widen the stored min/max bounds, rendering the summary ineffective for filtering out that range for many queries.

To reduce the impact of outliers, you can use a different operator class. For example, the minmax-multi strategy stores multiple min/max pairs per range, offering more resilience against outliers.

You specify the operator class during index creation:

create index idx_event_id_multi on events using brin (event_id int8_minmax_multi_ops);
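The minmax-multi operator class (available since PostgreSQL 14) also accepts an opclass parameter controlling how many boundary values are kept per range. As a sketch, the index name here is illustrative:

```sql
-- Keep up to 16 boundary values per page range instead of the
-- default 32: a more compact but coarser summary.
create index idx_event_id_multi16 on events
  using brin (event_id int8_minmax_multi_ops(values_per_range = 16));
```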
📖 More on BRIN operator classes

This blog post is an excellent read to understand how outliers can impact BRIN indexes when using the default operator class and how switching operator classes can help alleviate the issue.

When should you use BRIN indexes?

As we’ve seen, BRIN indexes are especially useful in situations where queries on very large tables filter on columns that are well-correlated with the physical storage order.

The effectiveness hinges on this correlation. Time-series data is a classic example: in append-only event tables, timestamps naturally correlate with the storage layout since newer events are appended at the end. The same applies to log tables with sequential IDs, or any scenario where the data’s logical order matches its physical storage pattern.

You can evaluate whether a column is suitable for a BRIN index by checking its correlation with physical storage:

select attname, correlation
from pg_stats
where tablename = 'your_table';

Columns with correlation values close to 1.0/-1.0 make ideal candidates. Without this correlation, the page-range summary information becomes ineffective, forcing PostgreSQL to scan most of the table.

The concept of summarizing data pages to reduce I/O isn’t unique to PostgreSQL. Because these techniques pay off most on large data volumes, similar ideas appear in many analytical database systems, often under names like zone maps or data-skipping indexes. The implementations vary, but they share the core idea of using stored metadata to avoid reading unnecessary data pages.