Using PostgreSQL Explain

Have you ever wanted to know how a database thinks or what steps it will take to find your data? Well there is a command for this, EXPLAIN. 

When a database receives a query, it decides how best to retrieve the data. The database creates a query plan based on different factors including the structure of the query and the properties of your data. As developers, we can use the EXPLAIN command to see what query plan the database is going to use for a query.

The EXPLAIN command is very simple:

  -- EXPLAIN statement
  EXPLAIN select * FROM users;

The output of this command will be the query plan, more specifically the output of this command will be a tree of plan nodes.

                        QUERY PLAN                        
 Seq Scan on users  (cost=0.00..10.40 rows=40 width=1672)

In the case of this basic query, the output is a single plan node.

Because there is no WHERE clause, the database has chosen todo a sequential scan (Seq Scan) on the entire table. As you might expect there are different scan nodes to describe different table access methods, for example, index scans and bitmap index scans.

Lets look at the numbers inside the parenthesis. The cost has two numbers, the first number (0.00) is the estimated startup cost. It basically tries to estimate how long it will take before the output phase of the query begins. For example, the time it takes to do the sorting in a sort node. The second number (10.40), is the estimated total cost.

Rows and width are somewhat self explanatory. Rows, is the estimated number of rows the output will produce. Finally, the width, which is the estimated average width of rows in bytes.

It is important to realize that all these numbers are estimates, including the cost numbers. The cost numbers are actually arbitrary units determined by the planner.

Lets take a look at a slightly more complicated output.

-- EXPLAIN SELECT * FROM users WHERE unique < 50;

                        QUERY PLAN                        
 Bitmap Heap Scan on users (cost=5.07..229.20, rows=101, width=0)
   Recheck Cond: (unique1 < 50)
   -> Bitmap Index Scan on users_unique1 (cost=0.00..5.04 rows=101 width=0)
     Index Cond: (unique1 < 50)

The database planner has decided to break up this query in two steps. First the database will visit the indexes to find the location of the rows matching the WHERE condition (Bitmap Index Scan), then it will fetch the actual rows (The Bitmap Heap Scan). The important thing to remember when you get a tree with multiple nodes, is that the top node will include the cost of all its child nodes. 

All these costs estimates are great, but wouldn't you like to see the actual time it takes for a query to run? Well we can do this with the ANALYZE option.

WARNING: Using (ANALYZE) runs the actual query.

                       QUERY PLAN                                               
 Seq Scan on agents  (cost=0.00..12.23 rows=123 width=2072) (actual time=0.099..0.399 rows=123 loops=1)
 Planning time: 3.135 ms
 Execution time: 0.487 ms

Notice the new information we get, actual time, Planning time and Execution time. It is important to know that actual time is measured in milliseconds while the cost is still estimates. The same goes for rows, one is an estimate and the other is actual rows. Planning time is the time it took for the database to generate the query plan.

This is just the very basic information on the EXPLAIN command and the best way to learn it is by playing, so open up your database console and query away. Notice all the different scan nodes you get, change your tables and see if you get a different plan. Remove an index, what happens?

Get curious!