The SQLite tokeniser does not handle scripts that do not use spaces to separate words (CJK, Thai, etc.), so searching in those languages does not work well. This adds a custom SQLite tokeniser based on ICU that breaks words for all languages supported by that library, and uses NFKC_Casefold to handle normalisation, case folding, and dropping of ignorable characters. Fixes #121
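To make the normalisation step concrete, here is a minimal Python sketch of what NFKC_Casefold does to a token. It is not the tokeniser's actual code (which uses ICU); it only approximates NFKC_Casefold with the standard library. The function name `nfkc_casefold_approx` is hypothetical, and dropping category `Cf` characters is an assumption standing in for the removal of default-ignorable code points.

```python
import unicodedata


def nfkc_casefold_approx(text: str) -> str:
    """Approximate NFKC_Casefold with the standard library only."""
    # Compatibility-normalise, case-fold, then normalise again because
    # case folding can produce sequences that are not in NFKC form.
    folded = unicodedata.normalize("NFKC", text).casefold()
    folded = unicodedata.normalize("NFKC", folded)
    # Assumption: format characters (category Cf, e.g. the soft hyphen)
    # stand in for the default-ignorable code points that ICU drops.
    return "".join(ch for ch in folded if unicodedata.category(ch) != "Cf")


# Full-width letters, sharp s, a soft hyphen, and a ligature.
for sample in ("Ｇｅａｒｙ", "Straße", "co\u00adoperate", "ﬁle"):
    print(repr(sample), "->", repr(nfkc_casefold_approx(sample)))
```

With this kind of folding applied at both index and query time, a search for "geary" can match text typed in full-width letters or in a different case.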
19 lines
303 B
SQL
--
-- Convert full-text search from FTS3/4 to FTS5
--

DROP TABLE IF EXISTS MessageSearchTable;

CREATE VIRTUAL TABLE MessageSearchTable USING fts5(
    body,
    attachments,
    subject,
    "from",
    receivers,
    cc,
    bcc,
    flags,

    tokenize="geary_tokeniser",
    prefix="2,4,6,8,10"
)
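As a rough illustration of how a table like this is queried, the following Python sketch creates a cut-down FTS5 table in memory and runs a prefix query and a column-filtered query. It assumes an SQLite build with FTS5 enabled. Because `geary_tokeniser` is only available once the application registers it, the built-in `unicode61` tokeniser is swapped in here as a stand-in. The sample rows and the `search` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Reduced column set; unicode61 stands in for the custom geary_tokeniser.
conn.execute(
    """
    CREATE VIRTUAL TABLE MessageSearchTable USING fts5(
        body,
        subject,
        "from",
        tokenize="unicode61",
        prefix="2,4,6,8,10"
    )
    """
)

rows = [
    (1, "Meeting notes attached", "Weekly meeting", "alice@example.org"),
    (2, "Lunch on Friday?", "Lunch plans", "bob@example.org"),
]
conn.executemany(
    'INSERT INTO MessageSearchTable (rowid, body, subject, "from") '
    "VALUES (?, ?, ?, ?)",
    rows,
)


def search(query: str) -> list[int]:
    """Return the rowids of messages matching an FTS5 query string."""
    cursor = conn.execute(
        "SELECT rowid FROM MessageSearchTable "
        "WHERE MessageSearchTable MATCH ? ORDER BY rowid",
        (query,),
    )
    return [row[0] for row in cursor]


print(search("meet*"))            # prefix query across all columns
print(search("subject : lunch"))  # query restricted to one column
```

The `prefix` option asks FTS5 to maintain extra indexes for prefixes of the listed lengths, which is intended to speed up prefix queries such as `meet*` at the cost of a larger index.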