See the ticket (comment #2) for more information on the thinking and
strategy here, but in a nutshell this will remove from the Geary
database all emails no longer accessible via any folder and not seen
on the server in over 30 days. It also deletes those messages'
attachments and removes any empty directories in the attachment/
directory to prevent clutter. If enough messages are garbage
collected, Geary will vacuum the database at startup, which will
lower its disk footprint and reduce fragmentation, potentially
increasing performance.
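
A minimal sketch of the collection pass, written with Python's sqlite3
module. The schema here (a last_seen column, the vacuum threshold) is
hypothetical and only stands in for Geary's real tables; the 30-day
cutoff is the one described above.

```python
import sqlite3

THIRTY_DAYS = 30 * 24 * 60 * 60
VACUUM_THRESHOLD = 2  # hypothetical: vacuum once this many rows were collected


def collect_garbage(conn, now):
    """Delete messages that no folder references and that the server has
    not reported in over 30 days. Returns the number of rows removed."""
    cutoff = now - THIRTY_DAYS
    cur = conn.execute(
        "DELETE FROM MessageTable "
        "WHERE last_seen < ? "
        "AND id NOT IN (SELECT message_id FROM MessageLocationTable)",
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount


def maybe_vacuum(conn, removed):
    """VACUUM rewrites the whole file, so only run it after a large pass."""
    if removed >= VACUUM_THRESHOLD:
        conn.execute("VACUUM")
        return True
    return False
```

Deferring the vacuum to startup keeps the expensive rewrite away from
normal operation.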
This introduces a new full-text search algorithm that attempts to
curb the effects of overstemming in the Porter Snowball stemmer.
The FTS table will be regenerated with this update.
The crux of this new algorithm is a configurable heuristic that
reduces stemmed matching. The configuration is not available via the
UI (I suspect it will only confuse users) but can be changed by power
users via GSettings. More information is available at:
https://wiki.gnome.org/Apps/Geary/FullTextSearchStrategy
With the database improvements over this cycle, testing shows
that we can now switch to NORMAL synchronous mode with little to
no performance loss but better database safety.
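
For reference, a small sketch of switching modes from Python's sqlite3
module (the file path is a throwaway temp file, not Geary's database):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "example.db")
conn = sqlite3.connect(path)

# The setting is per-connection, so it has to be applied on every open.
conn.execute("PRAGMA synchronous = NORMAL")

# Reading the pragma back returns a number: 0 = OFF, 1 = NORMAL, 2 = FULL.
mode = conn.execute("PRAGMA synchronous").fetchone()[0]
```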
Attachments without Content-Disposition are now generated and shown
in the client. This requires a database upgrade as well as rescanning
all messages to generate the previously missing attachments.
In addition, this upgrade now stores the attachments' Content-ID in
the database. This makes it much easier for the client to associate
a particular MIME section in the RFC822 message with an attachment in
the database and on disk.
The code for distinguishing between a new database and an existing
one being upgraded was faulty, causing the progress monitor to be
started/stopped when it shouldn't have been.
1) Use docid instead of id in search table.
We had previously included an 'id INTEGER PRIMARY KEY' column in the
MessageSearchTable, assuming it would get the same rowid alias treatment
as it does in non-FTS tables. That assumption was wrong: it was being
created as a FTS column. This fixes it so we use docid everywhere.
To fix the old incorrect docid values, we simply blow away the search
table and let the natural search table population process, which now has
the correct docid insertion code, fix the problem.
This also removes the id column from the search table creation SQL, but
this will only affect new users. Upgraders will see an empty, vestigial
id column in their search table. Since SQLite doesn't easily let you
remove columns, it's just easier to ignore the column than go through
all the work to fix it.
2) Do as many rowid lookups as possible in batches, instead of doing
them individually in loops. This speeds up working with large sets of
email.
3) Rejigger indices on the MessageLocationTable to make certain queries
faster. This creates a new covering index in particular for the email
prefetcher, which previously had to sort using a temp table. The new
index should work in the general case too, as we should never be looking
at ordering without folder_id (and since folder_id comes first, it works
as an index on just folder_id, too).
4) For bonus measure, log all slow queries (> 1s execution time) to
debug output.
Closes: bgo #725929
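
The docid problem in item 1 can be reproduced in a few lines. This is
only an illustration using Python's sqlite3 module and assumes the
SQLite build has FTS4 enabled; the two-column table is a cut-down
stand-in for the real MessageSearchTable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# FTS tables ignore type names and constraints, so "id" below becomes an
# ordinary full-text column rather than an alias for the rowid.
conn.execute(
    "CREATE VIRTUAL TABLE MessageSearchTable "
    "USING fts4(id INTEGER PRIMARY KEY, body)"
)

# docid is the real rowid alias for an FTS table.
conn.execute(
    "INSERT INTO MessageSearchTable (docid, body) VALUES (?, ?)",
    (42, "hello world"),
)

row = conn.execute(
    "SELECT docid, id FROM MessageSearchTable WHERE body MATCH 'hello'"
).fetchone()
# row is (42, None): docid carries the key, the "id" column stays empty.
```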
We had a bug in our DateTime to time_t conversion logic where all
time_ts would end up in the year 3800. This fixes that, and repopulates
the internaldate_time_t column with the new, correct time_t values.
Closes: bgo #724335
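
As a sanity check for this kind of conversion, here is a small Python
sketch (not the Vala code that was fixed) that converts a timezone-aware
date to a time_t and confirms that the round trip lands in the same
year:

```python
import calendar
from datetime import datetime, timezone


def to_time_t(dt):
    """Seconds since 1970-01-01 00:00:00 UTC for a timezone-aware datetime."""
    return calendar.timegm(dt.astimezone(timezone.utc).timetuple())


stamp = to_time_t(datetime(2014, 1, 1, tzinfo=timezone.utc))
# stamp is 1388534400

round_trip_year = datetime.fromtimestamp(stamp, timezone.utc).year
# round_trip_year is 2014
```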
We've had numerous bugs due to improper MIME comparisons and dealing
with Content-Type and Content-Disposition (or their absence from
a message). Now the Engine offers MIME classes that better deal
with these issues without exporting the GMime structures, which
are not as easy to manage and lack some of the features whose absence
has bitten us in the past (such as case-insensitive comparisons).
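
To illustrate the kind of comparison meant here, a rough Python sketch
follows. The ContentType class and its methods are hypothetical and are
not the Engine's actual API:

```python
class ContentType:
    """Hypothetical wrapper that always compares MIME types without case."""

    def __init__(self, media_type, media_subtype):
        self.media_type = media_type
        self.media_subtype = media_subtype

    @classmethod
    def parse(cls, value):
        # Drop parameters such as "; charset=utf-8" before splitting.
        mime = value.split(";", 1)[0].strip()
        media_type, _, media_subtype = mime.partition("/")
        return cls(media_type, media_subtype)

    def is_type(self, media_type, media_subtype):
        return (
            self.media_type.lower() == media_type.lower()
            and self.media_subtype.lower() == media_subtype.lower()
        )


matches = ContentType.parse("TEXT/Html; charset=UTF-8").is_type("text", "html")
# matches is True
```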
This speeds up startup time immensely, probably due to it matching
the filesystem's or Linux memory management's page size. It's also
expected that this will improve database performance in other ways,
as the default was 1K, meaning potentially more I/O than necessary
for standard operations.
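
A sketch of changing the page size of an existing database with
Python's sqlite3 module. The 4096-byte target is an assumption made for
this example (a common filesystem and memory page size); outside of WAL
mode, a new page size only takes effect once the file is rebuilt with
VACUUM.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "example.db")
conn = sqlite3.connect(path)

# Start from the old 1K default so the change is visible.
conn.execute("PRAGMA page_size = 1024")
conn.execute("CREATE TABLE t (x)")
conn.commit()
before = conn.execute("PRAGMA page_size").fetchone()[0]

# Request the new size, then rebuild the file so it is applied.
conn.execute("PRAGMA page_size = 4096")  # assumed target size
conn.execute("VACUUM")
after = conn.execute("PRAGMA page_size").fetchone()[0]
```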
Db.VersionedDatabase.open_background() will do open() in background
thread. ImapDB.Database now uses upcalls to schedule progress
monitor updates and a polled callback to pump the event loop.
The database is now tested for corruption at startup. If a problem
is detected, one of three error messages is displayed in a dialog box
(corruption, permission problems, and "general error").
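
A rough sketch of such a startup check in Python. It assumes SQLite's
PRAGMA quick_check is used as the corruption test, and the mapping of
failures onto the three categories is illustrative only, not Geary's
actual logic:

```python
import os
import sqlite3


def check_database(path):
    """Return "ok", "corrupt", "permission" or "error" for the file."""
    if os.path.exists(path) and not os.access(path, os.R_OK | os.W_OK):
        return "permission"
    try:
        conn = sqlite3.connect(path)
        try:
            rows = conn.execute("PRAGMA quick_check").fetchall()
        finally:
            conn.close()
    except sqlite3.DatabaseError as err:
        # SQLite reports "file is not a database" or
        # "database disk image is malformed" for damaged files.
        text = str(err)
        if "not a database" in text or "malformed" in text:
            return "corrupt"
        return "error"
    # A healthy database yields a single row containing "ok".
    return "ok" if rows == [("ok",)] else "corrupt"
```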
Rather than attempt to be selective, there's enough changed here that we might as well
blow away the search index and let the indexer start afresh. Future tweaks to the
search index might need to be more selective.
Mass email creation is taking far longer than it should (since the
vector expansion causing it isn't writing email bodies or headers,
merely inserting a row into two tables and writing small metadata to
one of them). This patch breaks up vector expansion more than before
and turns off SYNCHRONOUS mode.
Conflicts:
sql/version-010.sql
src/client/folder-list/folder-list-folder-entry.vala
src/engine/rfc822/rfc822-message.vala
Also, I had to manually fix some compile errors introduced due to
interfaces changing on master.
This caps the search results at 1000 emails, due to our unfortunate
requirement of constructing an object for each search result. A better
way to proceed here would be to do the search only as items were loaded
in the SearchFolder, but that gets complicated when the search phrase
gets updated.
We use the list of preferred languages for the user at the time of
search table creation to pick the most relevant stemming algorithm for
our search tokenizer. If we don't find a stemmer that matches any
preferred language, we use the English stemming algorithm as the
default.
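
The selection logic amounts to something like the following Python
sketch. The language table is an illustrative subset of the Snowball
algorithms, and the function name is made up for this example:

```python
# Illustrative subset: language code -> Snowball stemmer name.
STEMMERS = {
    "da": "danish", "nl": "dutch", "en": "english", "fi": "finnish",
    "fr": "french", "de": "german", "it": "italian", "pt": "portuguese",
    "ru": "russian", "es": "spanish", "sv": "swedish",
}


def pick_stemmer(preferred_languages, default="english"):
    """Return the stemmer for the first preferred language that has one."""
    for locale_name in preferred_languages:
        # "pt_BR.UTF-8" -> "pt"; "C" and unknown codes fall through.
        code = locale_name.split(".")[0].split("@")[0]
        code = code.replace("-", "_").split("_")[0].lower()
        if code in STEMMERS:
            return STEMMERS[code]
    return default


first = pick_stemmer(["pt_BR.UTF-8", "en_US"])   # "portuguese"
fallback = pick_stemmer(["ja_JP", "C"])          # "english"
```

Because the choice is made when the search table is created, changing
the preferred languages later has no effect until the table is rebuilt.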
This is a limited implementation, so please back up your database before
running this search feature branch from now on as we may change things.
It's using a Unicode Snowball stemming tokenizer available from
https://github.com/littlesavage/sqlite3-unicodesn, also handily
available in src/sqlite3-unicodesn in Geary. If you want to look at the
search tables on the command line, cd into the unicodesn source folder,
run make and make install, then load sqlite3 like:

    sqlite3 -cmd '.load unicodesn.sqlext' /path/to/geary.db

This reverts the retry strategy back to SQLite's built-in handler,
but with one difference: concurrency is turned off on async
transactions by using only one thread. Concurrency with SQLite
will need to be readdressed later.
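
The single-thread idea can be sketched in Python as below. This only
illustrates how a one-worker pool serializes work; it is not the Vala
thread pool used in the Engine, and run_transaction is a made-up name.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One worker means async transactions are queued and run one at a time.
db_pool = ThreadPoolExecutor(max_workers=1)


def run_transaction(fn, *args):
    """Schedule fn on the single database thread and return a Future."""
    return db_pool.submit(fn, *args)


# Every job reports the same worker thread, so none can run concurrently.
thread_ids = {run_transaction(threading.get_ident).result() for _ in range(5)}
```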
Testing at home showed the prior commit still left a lot of locking
problems on a slow machine with a fragmented database. This locking
mechanism is a little better, counting down rather than up, giving a
lot of time initially for a write to commit, but the more the locked
transaction waits, the sooner it retries to get in ahead of later
transactions.
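
A minimal Python sketch of a count-down retry schedule. All of the
numbers (initial delay, step, floor, retry count) are placeholders
chosen for the example, not the values used in the commit:

```python
import time


def countdown_delays(initial_ms=500, step_ms=100, floor_ms=50, max_retries=8):
    """Yield retry delays that shrink the longer a transaction has waited."""
    delay = initial_ms
    for _ in range(max_retries):
        yield delay
        delay = max(floor_ms, delay - step_ms)


def run_with_retry(operation, is_busy, sleep=time.sleep):
    """Run operation(), sleeping and retrying while is_busy(error) is true."""
    for delay_ms in countdown_delays():
        try:
            return operation()
        except Exception as err:
            if not is_busy(err):
                raise
            sleep(delay_ms / 1000.0)
    return operation()  # last attempt: let any error propagate


schedule = list(countdown_delays())
# schedule is [500, 400, 300, 200, 100, 50, 50, 50]
```

A transaction that has already waited several rounds retries at the
floor interval, so it tends to get in ahead of newcomers that are still
on their long initial wait.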
Previously, we were taking folder names as they came off the wire.
Turns out IMAP specifies that folder names with 8-bit code points are
encoded in a crazy scheme unique to IMAP (modified UTF-7). Now, we
properly decode that scheme to the correct UTF-8 folder names to be
displayed to the user.
There's also now a database upgrade path that converts all existing
mailboxes to the decoded version, so your existing database should just
keep working.
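
For the curious, IMAP's mailbox encoding is the modified UTF-7 of
RFC 3501: "&" starts a base64 run of UTF-16BE text that ends at "-",
"," replaces "/" in the base64 alphabet, and "&-" is a literal
ampersand. A small Python sketch of a decoder (not the Vala code in
this commit):

```python
import base64


def decode_imap_utf7(name):
    """Decode an IMAP modified UTF-7 mailbox name to a Python string.

    Raises ValueError if an "&" run is never closed with "-".
    """
    out = []
    i = 0
    while i < len(name):
        ch = name[i]
        if ch != "&":
            out.append(ch)
            i += 1
            continue
        end = name.index("-", i)
        chunk = name[i + 1:end]
        if chunk == "":
            out.append("&")  # "&-" is an escaped ampersand
        else:
            b64 = chunk.replace(",", "/")
            b64 += "=" * (-len(b64) % 4)  # restore the stripped padding
            out.append(base64.b64decode(b64).decode("utf-16-be"))
        i = end + 1
    return "".join(out)


drafts = decode_imap_utf7("Entw&APw-rfe")   # "Entwürfe"
```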
The problem was that the database busy timeout was too short. Charles'
Geary folder required ~1700 updates, which took longer than the
1-second timeout allowed.
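
For reference, raising the busy timeout looks like this from Python's
sqlite3 module. The 60-second value is a placeholder for the example,
not necessarily the value chosen in this fix:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The value is in milliseconds; a locked database is retried for this
# long before SQLITE_BUSY is returned to the caller.
conn.execute("PRAGMA busy_timeout = 60000")  # placeholder value

timeout_ms = conn.execute("PRAGMA busy_timeout").fetchone()[0]
```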
It is done.
Initial implementation of the new database subsystem
These pieces represent the foundation for ticket #5034
Expanded transactions, added VersionedDatabase
Further expansions of the async code.
Moved async pool logic into Database, where it realistically
belongs.
Further improvements. Introduced geary-db-test.
Added SQL create and update files for Geary.Db
version-001 to version-003 are exact copies of the SQLHeavy scripts
to ensure no slight changes when migrating. version-004 upgrades
the database to remove the ImapFolderPropertiesTable and
ImapMessagePropertiesTable, now that the database code is pure
IMAP.
When we support other messaging systems (such as POP3), those
subsystems will need to code their own database layers OR rely on
the IMAP schema and simply ignore the IMAP-specific fields.
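
A sketch of how numbered upgrade scripts might be applied, in Python.
It assumes the schema version is tracked with PRAGMA user_version and
passes the scripts in as a dict instead of reading version-NNN.sql
files; both choices are assumptions made to keep the example
self-contained.

```python
import sqlite3


def upgrade(conn, scripts):
    """Apply, in order, every script newer than the stored schema version.

    scripts maps a version number to the SQL text of its upgrade file.
    Returns the schema version after upgrading.
    """
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version in sorted(scripts):
        if version > current:
            conn.executescript(scripts[version])
            conn.execute("PRAGMA user_version = %d" % version)
            conn.commit()
            current = version
    return current
```

Running upgrade() a second time with the same scripts is a no-op,
because every script's number is already at or below the stored
version.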
ImapDB.Account fleshed out
ImapDB.Folder is commented out, however. Need to port next.
ImapDB.Folder fleshed out
MessageTable, MessageLocationTable, and AttachementTable are now
handled inside ImapDB.Folder.
chmod -x imap-db-database.vala
OutboxEmailIdentifier/Properties -> SmtpOutboxEmailIdentifier/Properties
Moved SmtpOutboxFolderRoot into its own source file
SmtpOutboxFolder ported to new database code
Move Engine implementations to ImapDB.
Integration and cleanup of new database code with main source
This commit performs the final integration steps to move Geary
completely over to the new database model. This also cleans out
the old SQLHeavy-based code and fixes a handful of small bugs that
were detected during basic test runs.
Moved Outbox to ImapDB
As the Outbox is tied to the database that ImapDB runs, move the
Outbox code into that folder.
Outbox fixes and better parameter checking
Bumped Database thread pool count and made them exclusive
My reasoning is that there may be a need for a lot of threads at
once (when a big batch of commands comes in, especially at
startup). If performance looks ok, we might consider relaxing
this later.