[MlMt] Improving search performance?
Bill Cole
mmlist-20120120 at billmail.scconsult.com
Mon Jun 26 14:51:22 EDT 2023
On 2023-06-25 at 10:59:53 UTC-0400 (Sun, 25 Jun 2023 16:59:53 +0200)
Robert M. Münch <mailmate at lists.freron.com>
is rumored to have said:
[...]
>> One serious issue with indexing email is that email is highly
>> divergent in data structure, and while you can do a simple index for
>> basic standard mail metadata, "full text" and "all headers" search
>> for mail is a nightmare because real-world mail breaks almost every
>> rule theoretically governing it and it is not a simple matter to
>> determine what is or is not body text. Email typically arrives with
>> multiple alternative parts theoretically representing the same
>> message, possibly QP or B64 encoded and usually including one version
>> with HTML markup. And that markup can be bad, wrong, or even
>> intentionally malicious.
>
> Well, MM already handles all this, otherwise we couldn't use it as we
> do. Those parts are well known to MM.
I've had a bug open for quite a while regarding a MM parsing problem
with pure text messages generated by automated tools.
I don't know what the root cause of that is, but I am certain that Benny
does not have all the arcana handled.
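To make the "multiple alternative parts, possibly QP or B64 encoded" point concrete, here is a minimal sketch using Python's standard `email` package. It is not how MailMate does it; the sample message, the addresses, and the `body_text` helper are made up for illustration, and it only covers the well-formed case. Real-world mail that breaks the rules is exactly where this kind of code falls over.

```python
import email
from email import policy

# Hypothetical, well-formed sample: one plain-text part (quoted-printable)
# and one HTML part (base64), both claiming to be the same message.
RAW = """\
From: a@example.com
To: b@example.com
Subject: test
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="BOUND"

--BOUND
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Caf=C3=A9 at noon
--BOUND
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: base64

PHA+Q2Fmw6kgYXQgbm9vbjwvcD4=
--BOUND--
"""


def body_text(raw):
    """Return the decoded text of the preferred body part, or "" if none."""
    msg = email.message_from_string(raw, policy=policy.default)
    # Prefer text/plain, fall back to text/html if no plain part exists.
    part = msg.get_body(preferencelist=("plain", "html"))
    if part is None:
        return ""
    # get_content() undoes the transfer encoding and the charset.
    return part.get_content()


if __name__ == "__main__":
    print(body_text(RAW).strip())
```

Even in this tidy case an indexer has to decide which alternative counts as "the" body text before it can tokenize anything.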
>> Very large mail stores are inherently tough to search.
>
> After pre-processing all the mail mess, I don't think so. Searching in
> Gmail is fast. MM is already much better than other clients.
I haven't looked in a long while, but last I checked, Gmail could not
search on arbitrary headers. Have they fixed that?
That's a huge part of the scaling problem. There are not a lot of people
who really use that feature, but we do value it highly. The extremely
long tail of headers and full-text tokens that only appear in a small
number of messages makes mail particularly hard to search efficiently if
you include all the garbage spam full of 'hashbusters' and such.
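A toy illustration of that long tail, as a sketch only: the messages, header names, and the `build_index` and `singleton_fraction` helpers below are invented for the example and say nothing about how MailMate or any other client indexes mail. It builds a naive inverted index over every header and measures how many index entries point at exactly one message.

```python
from collections import defaultdict


def build_index(messages):
    """Map each (header name, token) pair to the set of message ids containing it."""
    index = defaultdict(set)
    for msg_id, headers in messages.items():
        for name, value in headers.items():
            for token in value.lower().split():
                index[(name.lower(), token)].add(msg_id)
    return index


def singleton_fraction(index):
    """Fraction of index entries that match exactly one message."""
    if not index:
        return 0.0
    singles = sum(1 for ids in index.values() if len(ids) == 1)
    return singles / len(index)


# Hypothetical data: two ordinary messages and one spam with random tokens.
MESSAGES = {
    1: {"Subject": "meeting notes", "X-Mailer": "ExampleMailer"},
    2: {"Subject": "meeting agenda", "X-Mailer": "ExampleMailer"},
    3: {"Subject": "cheap pills xq7zk9", "X-Spam-Token": "hb-93af1c"},
}

if __name__ == "__main__":
    idx = build_index(MESSAGES)
    print(len(idx), singleton_fraction(idx))
```

Even with three messages most entries are one-offs; every new arbitrary header or random token adds another entry that will almost never be hit by a search.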
> IMO the use-case search *1+ million emails as fast as possible* is
> just not in scope for most of the clients.
Right, because most users do not need that. I don't know that any mail
client does it as well as MM with the same search capabilities.
--
Bill Cole
bill at scconsult.com or billcole at apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire