Home

The Best Multi-Column Index Formula

This article supplements Understanding How Databases Use Indexes. I recommend reading that first.

x equality condition fields + 1 range condition field OR y sort fields + z other fields to retrieve. (x >= 0, y >= 0, z >= 0)

Example:

SELECT a, b, c, d, e FROM table WHERE a = 1 AND b = 2 ORDER BY c DESC, d ASC;

The best index is:

(   a,  b,     c DESC, d ASC,         e        )
  equality   |   sort fields    | other fields
  fields     |                  | to retrieve

Equality Condition Fields

Equality condition fields are those using = in the WHERE clause. For example, class in WHERE class = 2 is an equality condition field.

If there are multiple equality condition fields, their order is discussed in detail in Understanding How Databases Use Indexes. You should design indexes that work effectively across as many queries as possible, not just the current one.

Range Condition Fields

Range condition fields are those using >, <, >=, <=, etc. in the WHERE clause. For example, score in WHERE score > 90 is a range condition field.

Range conditions significantly impact query efficiency, so minimize their use. In the best multi-column index formula, there’s at most one range condition field, and it cannot coexist with sort fields. If sorting is needed, prioritize sorting and find ways to avoid range condition filtering. Here’s an example showing why range condition fields can’t coexist with sort fields, and how to avoid range condition filtering.

Suppose we have a films table with movie information including name, release_date, rating, country. We want to query Chinese films with ratings above 8.0, sorted by release date:

SELECT * FROM films WHERE rating > 8.0 AND country = 'China' ORDER BY release_date DESC;

With index (country, rating, release_date):

index on (country, rating, release_date)

                                                          |
                                                          v ------------->
+--------------++---------+---------+---------+-------+-------+-------+-------+-------+-------+-------+
| country      || America | America | America | China | China | China | China | Japan | Japan | Japan |
+--------------++---------+---------+---------+-------+-------+-------+-------+-------+-------+-------+
| rating       ||   6.0   |   8.5   |   9.0   |  7.0  |  8.5  |  9.0  |  9.5  |  8.0  |  8.5  |  9.0  |
+--------------++---------+---------+---------+-------+-------+-------+-------+-------+-------+-------+
| release_date ||  2024   |  2021   |  2023   | 2023  | 2015  | 2022  | 2016  | 2023  | 2013  | 2015  |
+--------------++---------+---------+---------+-------+-------+-------+-------+-------+-------+-------+
                                                         ✅       ✅      ✅

This index looks perfect, but results found through it aren’t sorted by release_date—the release_date in the index is ineffective. The database still needs to sort all matching rows by release_date, which is very time-consuming for large datasets. If we only need the first few rows, this sorting is wasteful. This is why range condition fields can’t coexist with sort fields, and why sorting should be prioritized.

The optimization approach is converting range conditions to equality conditions. This requires optimization based on actual needs. In our example, we want films rated above 8.0—we can define these as “high-rated films.” Add a field is_high_rating: 1 when rating > 8.0, 0 otherwise. Then change the index to (country, is_high_rating, release_date).

SELECT * FROM films WHERE is_high_rating = 1 AND country = 'China' ORDER BY release_date DESC;
index on (country, is_high_rating, release_date)

                                                                            |
                                                            <-------------- v
+----------------++---------+---------+---------+-------+-------+-------+-------+-------+-------+-------+
| country        || America | America | America | China | China | China | China | Japan | Japan | Japan |
+----------------++---------+---------+---------+-------+-------+-------+-------+-------+-------+-------+
| is_high_rating ||    0    |    1    |    1    |   0   |   1   |   1   |   1   |   1   |   1   |   1   |
+----------------++---------+---------+---------+-------+-------+-------+-------+-------+-------+-------+
| release_date   ||  2024   |  2021   |  2023   | 2023  | 2015  | 2016  | 2022  | 2013  | 2015  | 2023  |
+----------------++---------+---------+---------+-------+-------+-------+-------+-------+-------+-------+
                                                           ✅       ✅      ✅

If your database supports it, you can also use a functional index (country, IF(rating > 8, 1, 0), release_date), or use a virtual/computed column.

Note: !=, <>, IS NOT NULL and other inequality operators are also range conditions. If needed, they can also be optimized by converting to equality conditions.

Sort Fields

Sort fields are those in the ORDER BY clause. For example, release_date in ORDER BY release_date DESC is a sort field.

Sort field order should match ORDER BY order, and ascending/descending should be the same or completely opposite. For example, ORDER BY release_date DESC, rating ASC corresponds to index (release_date DESC, rating ASC) or (release_date ASC, rating DESC). The opposite works because databases can traverse indexes in reverse. For example, to get the first five results sorted by release_date descending and rating ascending:

SELECT * FROM films ORDER BY release_date DESC, rating ASC LIMIT 5;

With index (release_date DESC, rating ASC):

index on (release_date DESC, rating ASC)

                     |
                     v -------------------------------->
+--------------++---------+-------+-------+---------+-------+---------+-------+-------+-------+-------+
| release_date ||  2024   | 2023  | 2023  |  2023   | 2022  |  2021   | 2016  | 2015  | 2015  | 2013  |
+--------------++---------+-------+-------+---------+-------+---------+-------+-------+-------+-------+
| rating       ||   6.0   |  7.0  |  8.0  |   9.0   |  9.0  |   8.5   |  9.5  |  8.5  |  9.0  |  8.5  |
+--------------++---------+-------+-------+---------+-------+---------+-------+-------+-------+-------+
                     ✅       ✅      ✅       ✅        ✅

With index (release_date ASC, rating DESC):

index on (release_date ASC, rating DESC)

                                                                                                  |
                                                                 <------------------------------- v
+--------------++---------+-------+-------+---------+-------+---------+-------+-------+-------+-------+
| release_date ||  2013   | 2015  | 2015  |  2016   | 2021  |  2022   | 2023  | 2023  | 2023  | 2024  |
+--------------++---------+-------+-------+---------+-------+---------+-------+-------+-------+-------+
| rating       ||   8.5   |  9.0  |  8.5  |   9.5   |  8.5  |   9.0   |  9.0  |  8.0  |  7.0  |  6.0  |
+--------------++---------+-------+-------+---------+-------+---------+-------+-------+-------+-------+
                                                                ✅        ✅      ✅      ✅       ✅

But you can’t use (release_date DESC, rating DESC) because it doesn’t match the required sort order.

index on (release_date DESC, rating DESC)

                     |
                     v -------------------------------->
+--------------++---------+-------+-------+---------+-------+---------+-------+-------+-------+-------+
| release_date ||  2024   | 2023  | 2023  |  2023   | 2022  |  2021   | 2016  | 2015  | 2015  | 2013  |
+--------------++---------+-------+-------+---------+-------+---------+-------+-------+-------+-------+
| rating       ||   6.0   |  9.0  |  8.0  |   7.0   |  9.0  |   8.5   |  9.5  |  9.0  |  8.5  |  8.5  |
+--------------++---------+-------+-------+---------+-------+---------+-------+-------+-------+-------+
                     ✅       ✅      ✅       ✅        ✅

As you can see, results from the (release_date DESC, rating DESC) index don’t satisfy the ORDER BY release_date DESC, rating ASC requirement, requiring additional sorting.

Other Fields to Retrieve

Other fields to retrieve are those needed in SELECT that aren’t already in the index. If the index contains all fields to retrieve, the database can get data directly from the index without querying the table. This reduces I/O operations and improves query efficiency. However, if the index contains too many fields, it becomes too large, affecting insert, update, and delete performance, and using unnecessary memory. So putting all fields in the index isn’t always best—you need to make trade-offs based on actual circumstances.

Comments

Loading...