A Deep Dive Into Unexpectedly Slow SQLite Queries

I’ve recently run into a couple behaviors which I found odd with respect to how SQLite decided to execute a particular query (and those with some similar characteristics). These oddities appear in both in how the query planner works and in the generated opcodes. I decided to take a look at what was going on, and soon found myself reading the source, tracing syscalls and library calls, and debugging the opcodes from the VM that SQLite implements. I learned an incredible amount, and had a lot of fun. Some of my findings are below.

The Query

This is the pared-down query that my discoveries center around, but my questions aren’t necessarily specific to this query (there ends up being a whole class of queries which result in the same “optimization”):

    SELECT a.v_id
    FROM a, l, v, m
    WHERE l.l_cid = 300
      AND l.l_flag_id = 5
      AND v.id = l.v_id
      AND a.v_id = v.id
    LIMIT 10;

With ANALYZE (and subsequent SELECT * FROM sqlite_stat1;, we see some stats for our tables:

m is 100 rows
l is 65,000,000 rows
v is 220,000 rows
a is 850,000 rows

The Query Plan

I ran the query with SQLite versions 3.8.5 and SQLite 3.7.13 and noticed a massive time difference between the two (where on 3.7.13 the time was on the order of seconds, and on 3.8.5 I let it run for 50 minutes before giving up hope and deciding to give my HD a break). Naturally, I started with EXPLAIN QUERY PLAN and got the below. Note that I have no indices at this point and have not yet run ANALYZE (i.e. sqlite_stat1 is empty) and that roughly explains the difference in query plans between the two versions.

From v3.7.13:

order from  detail
----- ----- --------------------------------------------------------------------
0     1     SCAN TABLE l (~10000 rows)
1     2     SEARCH TABLE v USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
2     0     SEARCH TABLE a USING AUTOMATIC COVERING INDEX (v_id=?) (~7 rows)
3     3     SCAN TABLE m (~1000000 rows)

From v3.8.5:

order  from  detail
----- ------ -------------------------------------------------------------------
0      1     SEARCH TABLE l
                    USING AUTOMATIC COVERING INDEX (l_cid=? AND l_flag_id=?)
2      3     SCAN TABLE m

We no longer see the estimates of table lengths in the newer version, and the query planner (QP) has determined that it makes sense to generate an automatic covering index. After running ANALYZE and generating the stats table, both versions agree on the query plan; 3.8.5 remains essentially the same, and 3.7.13 mirrors it. This also means that 3.7.13 no longer completes the query (in the ~50 minutes I gave it to run); the opcodes generated by both versions are very similar, and what’s going on under the hood doesn’t change (this at least is good; it means things are consistent!)

I’m not clear on why an automatic covering index is determined to be the most efficient thing to do here; the creation of such an index would take longer than the query itself (in fact, testing this by explicitly creating a covering index [with e.g. CREATE INDEX ON …] takes roughly 5 minutes). The query as run on 3.7.15 takes about 5 seconds with no indices. The big O back-of-the-napkin calculation confirms this; creating the index (O(NlogN)) and then using it (some ~O(logN)) ends up being asymptotically way more expensive than simply scanning the table (O(N)) (and, practically, WAY more expensive).

I set this aside for a bit while I looked into a more serious problem. Why does generating the automatic covering index take over 50 minutes when the CREATE INDEX query takes 5 minutes to run?

Deeper Still: The SQLite Virtual Machine

A reasonable place to start is to run EXPLAIN on the query. The generated opcodes are rather long, so I’ve included the full readout elsewhere. The relevant parts are below.

7   OpenAutoindex  4    10    0   k(10,nil,n..  00
8   Rewind         1    22    0                 00
9   Column         1    4     3                 00
10  Column         1    5     4                 00
11  Rowid          1    12    0                 00
12  MakeRecord     3    4     2                 00
13  IdxInsert      4    2     0                 10
14  Next           1    9     0                 03

The below is an extract of the opcodes generated when creating the index by hand with CREATE INDEX on the same two columns:

12  SorterOpen    3   0       0  k(3,nil,ni..   00
13  OpenRead      1   214248  0  9              00
14  Rewind        1   21      0                 00
15  Column        1   4       10                00
16  Column        1   5       11                00
17  Rowid         1   12      0                 00
18  MakeRecord    10  3       9                 00
19  SorterInsert  3   9       0                 00
20  Next          1   15      0                 00
21  OpenWrite     2   1       0  k(3,nil,ni..   03
22  SorterSort    3   26      0                 00
23  SorterData    3   9       0                 00
24  IdxInsert     2   9       1                 10
25  SorterNext    3   23      0                 00

The big difference here is that it’s using an AutoIndex (OpenAutoIndex), whereas the CREATE INDEX version uses a Sorter (SorterOpen). At first blush, this makes sense; AutoIndex is for an automatic index, after all.

Getting to the Bottom of It

But when you look at the code and at the traces of sys and lib calls spit out by DTrace (which is such a cool piece of software, by the way), things get real bad, real fast. Here are some samples from the automatic index creation, and here are some from the explicit index creation.

It turns out that AutoIndex is just directly creating a table on disk (a table which, for all intents and purposes, is destroyed at the end of the query). And then it’s inserting into the table the rows of the index. One by one. And since a table is a B tree, it needs to be balanced after every insert. And that rebalancing happens on disk. Because it’s inserting unsorted data, it must rebalance and write to disk a lot.

The trace for this shows many seeks, reads, and writes after nearly every B tree insert and B tree sort. That’s a ton of bad mojo happening on my SSD. And it’s absurdly slow.

Conversely, opening a Sorter basically adds these rows to a in-memory (spilling to disk) buffer, unsorted. Once the rows are in, an external merge sort takes place. Finally, these sorted records are appended to a B tree; since they are sorted, the tree does not need to be balanced (there is a flag set on the opcode which indicates this to the VM). This is a reasonable way to handle the situation, and consequently it happens in over an order of magnitude less time (or even less; I have not yet witnessed the end of the autoindex).

Now, to be clear, once there’s an index on this table, both 3.7.13 and 3.8.5 run the same, very snappy queries. Adding an index is the right thing to do regardless of how SQLite handles the above. There is also a pragma statement which disables the abominably slow autoindexer. The query above also clearly isn’t a useful one; I’ll just reiterate that it was based off of a much larger query that I pared down to exhibit the same behavior but resulting in fewer than 100 opcodes to step through & debug.

To summarize I ran into two main problems:

  1. An autoindex is suggested by the new QP, or by the old QP once the stats table has been created (with ANALYZE).
  2. The way an autoindex is created appears to be extraordinarily inefficient.

A potential solution to (1) would be to look at the big O time of the index creating and subsequently follow the results of that more closely. I’m not really sure what’s going on here, yet.

One potential solution to (2), if necessary, would be to use the same code that generated an index when one is explicitly created (that is, using a Sorter with an external sort before inserting into an index) instead of an inefficient AutoIndex.

I have to say, I love SQLite. The fact that it has a simple and fairly well-documented VM makes exploring it a joy. It was a lot of fun reading through the truly excellent documentation and the source was easy enough to read, especially given that I don’t write C these days.

I sent out a message to the SQLite mailing list with the above information, but never received a response. I would love to hear if and where I went wrong in the above analysis, or anything else relating to this post. As always, email (in the footer) is much appreciated.