Why Your Index Isn't Being Used
Simply put, there are two reasons why the database doesn’t use your index:
- Your index isn’t suitable
- The query optimizer thinks not using the index is faster
To create suitable indexes, see my previous article Understanding How Databases Use Indexes.
Let’s explore why the query optimizer might think not using an index is faster, even when you’ve created a suitable index.
How Databases Read Data
Before diving in, let’s understand how databases read data. There are two ways:
- Using an index: Get the file positions of matching rows through the index, then read them from the file sequentially. Since these rows are at different positions in the file, each read requires jumping to a different file position. This is called random read.
- Not using an index: Sequentially read all rows from the table file, filtering out matching ones. This is called sequential read.
Sequential reads are much faster than random reads. Especially on mechanical hard drives, where moving the disk head is very slow. Although modern SSDs have much faster random read speeds, sequential reads are still faster.
Why Not Using an Index Is Faster
Reading Many Rows
Suppose we have a table with 100 rows. Random read takes 4 time units per row, while sequential read takes 1 time unit per row. Here’s the time needed to read n rows with and without an index:
- Using index:
4 * n - Not using index:
1 * 100
It’s easy to see that when n < 25, using the index is faster; when n > 25, not using the index is faster. That is, when the query returns a large proportion of the table, not using an index is faster. This proportion depends on the relative speeds of random and sequential reads—it’s not a fixed value, but we typically consider it to be between 10% and 30%.
💡 If you’re using a faster SSD, you can configure a lower random read cost for the database, making the query optimizer more inclined to use indexes. In PostgreSQL, you can set
random_page_costto 1.1.
Outdated Statistics
How does the query optimizer know what proportion of rows match the condition? The database maintains statistics about tables, including row counts, value distribution of indexed columns, etc. The optimizer estimates the cost of using vs. not using an index based on this information. If statistics are outdated, the optimizer may make wrong decisions.
For example, if matching rows actually account for 5%, but statistics show 50%, the optimizer will incorrectly think not using an index is faster.
💡 It’s recommended to run
ANALYZE TABLEto recalculate statistics after making large changes, or schedule regular recalculations.
Comments