[MlMt] ideas for analysing message subjects?

Steven M. Bellovin smb at cs.columbia.edu
Wed Nov 19 15:50:52 EST 2025


If you have formail available (https://manpages.ubuntu.com/manpages/resolute/en/man1/formail.1.html), use it. It handles a Subject: header folded across multiple lines, ignores anything in the body that merely looks like a Subject: line, etc.
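
A minimal sketch of what that could look like, assuming procmail's formail is on your PATH. The two function names are made up for illustration; 2025.mbox and the folder of .eml files are the examples from earlier in the thread. The exact whitespace left where -c joins a folded line may vary, so squeeze runs of spaces afterwards if that matters.

```shell
# Sketch only: the function names are hypothetical; the formail options
# used (-s, -c, -z, -x) are documented in the formail man page.

# One mbox file: -s splits it into messages and runs the command that
# follows on each one. In the inner call, -c joins folded header lines,
# -z trims whitespace around the extracted value, and -x prints the
# contents of the named header field only.
subjects_from_mbox() {
    formail -s formail -c -z -x Subject: < "$1"
}

# A folder of single-message .eml files: run formail once per file.
subjects_from_eml_dir() {
    find "$1" -name '*.eml' -exec sh -c 'formail -c -z -x Subject: < "$1"' sh {} \;
}
```

Usage would be along the lines of `subjects_from_mbox 2025.mbox | sort -u` or `subjects_from_eml_dir . | sort -u`.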

On 19 Nov 2025, at 15:41, Glenn Parker via mailmate wrote:

> On 19 Nov 2025, at 11:57, Glenn Parker via mailmate wrote:
>
>> On 18 Nov 2025, at 17:21, Bill Cole wrote:
>>
>>> It strikes me as a very trivial "mission." An exported mailbox is just text, so if you have an exported mailbox named 2025.mbox:
>>>
>>>   grep '^Subject: ' 2025.mbox | grep -o ' .*'
>>>
>>> This will give you all the subjects without the leading "Subject:" tag.
>>
>> A little bit fancier. Strips out the filename and avoids a potential shell error for large directories:
>>
>> cd [message-folder]
>> find . -name '*.eml' -exec grep -h -m 1 '^Subject: ' {} + | sed 's/Subject: //'
>>
>> Append the following to remove duplicate subjects:
>> | sort -su
>
> Ah, but this will truncate long subject lines that are continued over multiple lines.
>
> Farewell grep, enter sed:
>
>     find . -name '*.eml' -exec \
>       sed -n -e '/^Subject: /!d' \
>       -e 's/^Subject: //' \
>       -e ':x' -e 'N' -e 's/\n  */ /g' -e 'tx' -e 'P' -e 'b' {} \;
>
> If you have a recent version of sed, i.e. gnu-sed aka “gsed”, this can be written *somewhat* more legibly as:
>
>     find . -name '*.eml' -exec \
>       gsed -n -e '/^Subject: /!d
>       s/^Subject: //
>       :x; N; s/\n  */ /g; tx; P; b' {} \;
>
> I’m not going to attempt to handle quoted-printable encoding in subject lines. :-)
>
> Glenn P. Parker
> glenn.parker at comcast.net


        —Steve Bellovin, https://www.cs.columbia.edu/~smb

