[MlMt] Searching for text of URL links

Bill Cole mmlist-20120120 at billmail.scconsult.com
Fri Mar 15 17:55:10 EDT 2019


On 15 Mar 2019, at 17:05, Chris Newman wrote:

> The IMAP standard requires implementation of a pure substring search, 
> but in practice most search indexing software toolkits only do 
> word-based search and don't support efficient substring search (most 
> can do reasonably efficient prefix search but not efficient suffix 
> search). So particularly for body searches, you need to search for a 
> substring that counts as a word to whatever indexing software is used 
> on the IMAP server you're using. Also search for a stop word (e.g., 
> 'and') may not work either (not indexing those words reduces index 
> size). The IMAP server I work on can either do IMAP compliant search 
> brute-force or do word-based indexed body search quickly and it's up 
> to the server admin to choose which to use. Given that many clients do 
> body search by default now, most admins of larger sites choose to use 
> the indexed word-based search.
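
To make the distinction in the quoted paragraph concrete, here is a minimal Python sketch contrasting a word-based index with a brute-force substring scan. It is an illustration only: the function names, the tokenizing regex, and the sample messages are all hypothetical and are not taken from any particular IMAP server or from MailMate.

```python
import re
from collections import defaultdict

def build_word_index(docs):
    """Map each lowercase word to the set of message ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumed tokenizer: runs of ASCII letters and digits count as words.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index

def word_search(index, term):
    """Indexed lookup: matches only whole words that were indexed."""
    return set(index.get(term.lower(), set()))

def substring_search(docs, term):
    """Brute-force scan: matches the term anywhere in the text."""
    term = term.lower()
    return {doc_id for doc_id, text in docs.items() if term in text.lower()}

# Hypothetical sample messages.
docs = {
    1: "See https://example.com/downloads for details",
    2: "Nothing to see here",
}
index = build_word_index(docs)

whole_word = word_search(index, "downloads")      # hits message 1
partial_word = word_search(index, "ownload")      # no hit: not a whole word
partial_scan = substring_search(docs, "ownload")  # hits message 1
```

The indexed lookup is a single dictionary access, while the scan touches every message, which is the trade-off the server admin is choosing between.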

All true but not really relevant for MailMate users.

MailMate's search is entirely client-side, using a custom index in 
~/Library/Application Support/MailMate/Database.noindex/, which is also 
what makes the "Smart Folder" feature possible.

> While it's possible to implement efficient indexed pure substring 
> search, that requires a significantly larger search index than 
> word-based search technologies, and it's not clear "free" email 
> services would be willing to pay for that extra storage when they can 
> just ignore the standard and provide word-based search cheaper (and I 
> don't recall being asked to provide such a feature by any customer).
>
> Also search indexing software is likely to drop any markup. So if the 
> URL is an HTML link rather than actually in the text of the message, 
> it may not be indexed (or searchable) at all.

Testing confirms that MM will find arbitrary substrings of URLs that 
appear in plain-text mail or in the content text of HTML mail, but will 
NOT find any part of URLs that exist only in markup (i.e., in href= 
attribute values).
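
That behavior is what you would expect from any indexer that keeps only the text content of HTML and discards the tags. The following is a rough Python sketch of that idea using the standard library's html.parser. It is NOT MailMate's actual code, and the class name, function name, and sample HTML are made up for illustration.

```python
from html.parser import HTMLParser

class TextOnly(HTMLParser):
    """Collect only the text between tags; attributes are ignored."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for text content only, never for attribute values.
        self.chunks.append(data)

def visible_text(html):
    """Return the text content of an HTML fragment with markup dropped."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)

# Hypothetical message body: one URL only in href=, one in the text itself.
html = (
    '<p>Get it <a href="https://example.com/hidden/path">here</a> '
    "or at https://example.org/visible</p>"
)
text = visible_text(html)

found_in_text = "example.org/vis" in text   # True: URL is in content text
found_in_href = "example.com/hid" in text   # False: URL exists only in markup
```

A substring search over the extracted text finds any part of the second URL but nothing from the first, which matches the test results described above.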



-- 
Bill Cole
bill at scconsult.com or billcole at apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Available For Hire: https://linkedin.com/in/billcole

