how can I get faster FTS4 query results ordered by a field in another table?
Background
I'm implementing full-text search over a body of email messages stored in
SQLite, making use of its fantastic built-in FTS4 engine. I'm getting some
rather poor query performance, although not exactly where I would expect.
Let's take a look.
Representative schema
I'll give some simplified examples of the code in question, with links to
the full code where applicable.
We've got a MessageTable that stores the data about an email message (full
version spread out over several files here, here, and here):
CREATE TABLE MessageTable (
id INTEGER PRIMARY KEY,
internaldate_time_t INTEGER
);
CREATE INDEX MessageTableInternalDateTimeTIndex
ON MessageTable(internaldate_time_t);
The searchable text is added to an FTS4 table named MessageSearchTable
(full version here):
CREATE VIRTUAL TABLE MessageSearchTable USING fts4(
id INTEGER PRIMARY KEY,
body
);
The id in the search table acts as a foreign key to the message table.
I'll leave it as an exercise for the reader to insert data into these
tables (I certainly can't give out my private email). I have just under
26k records in each table.
Problem query
When we retrieve search results, we need them to be ordered descending by
internaldate_time_t so we can pluck out only the most recent few results.
Here's an example search query (full version here):
SELECT id
FROM MessageSearchTable
JOIN MessageTable USING (id)
WHERE MessageSearchTable MATCH 'a'
ORDER BY internaldate_time_t DESC
LIMIT 10 OFFSET 0
On my machine, with my email, that runs in about 150 milliseconds, as
measured via:
time sqlite3 test.db <<<"..." > /dev/null
150 milliseconds is no beast of a query, but for a simple FTS lookup and
indexed order, it's sluggish. If I omit the ORDER BY, it completes in 10
milliseconds, for example. Also keep in mind that the actual query has one
more sub-select, so there's a little more work going on in general: the
full version of the query runs in about 600 milliseconds, which is into
beast territory, and omitting the ORDER BY in that case shaves 500
milliseconds off the time.
If I turn on stats inside sqlite3 and run the query, I notice the line:
Sort Operations: 1
If my interpretation of the docs about those stats is correct, it looks
like the query is completely skipping using the
MessageTableInternalDateTimeTIndex. The full version of the query also has
the line:
Fullscan Steps: 25824
Sounds like it's walking the table somewhere, but let's ignore that for now.
What I've discovered
So let's work on optimizing that a little bit. I can rearrange the query
into a sub-select and force SQLite to use our index with the INDEXED BY
extension:
SELECT id
FROM MessageTable
INDEXED BY MessageTableInternalDateTimeTIndex
WHERE id IN (
SELECT id
FROM MessageSearchTable
WHERE MessageSearchTable MATCH 'a'
)
ORDER BY internaldate_time_t DESC
LIMIT 10 OFFSET 0
Lo and behold, the running time has dropped to around 100 milliseconds
(300 milliseconds in the full version of the query, a 50% reduction in
running time), and there are no sort operations reported. Note that with
just reorganizing the query like this but not forcing the index with
INDEXED BY, there's still a sort operation (though we've still shaved off
a few milliseconds oddly enough), so it appears that SQLite is indeed
ignoring our index unless we force it.
I've also tried some other things to see if they'd make a difference, but
they didn't:
Explicitly making the index DESC as described here, with and without
INDEXED BY
Explicitly adding the id column in the index, with and without
internaldate_time_t ordered DESC, with and without INDEXED BY
Probably several other things I can't remember at this moment
Questions
100 milliseconds here still seems awfully slow for what seems like it
should be a simple FTS lookup and indexed order.
What's going on here? Why is it ignoring the obvious index unless you
force its hand?
Am I hitting some limitation with combining data from virtual and regular
tables?
Why is it still so relatively slow, and is there anything else I can do to
get FTS matches ordered by a field in another table?
Thanks!
No comments:
Post a Comment