[MlMt] ideas for analysing message subjects?

Glenn Parker glenn.parker at comcast.net
Wed Nov 19 15:41:45 EST 2025


On 19 Nov 2025, at 11:57, Glenn Parker via mailmate wrote:

> On 18 Nov 2025, at 17:21, Bill Cole wrote:
>
>> It strikes me as a very trivial "mission." An exported mailbox is 
>> just text, so if you have an exported mailbox named 2025.mbox:
>>
>>   grep '^Subject: ' 2025.mbox | grep -o ' .*'
>>
>> Will give you all the subjects without the leading "Subject:" tag
>
> A little bit fancier. Strips out the filename and avoids a potential 
> shell error for large directories:
>
> cd [message-folder]
> find . -name '*.eml' -exec grep -h -m 1 '^Subject: ' {} + |
>   sed 's/Subject: //'
>
> Append the following to remove duplicate subjects:
> | sort -su

Ah, but this will truncate long subject lines that are folded, i.e. 
continued over multiple lines, with each continuation line starting 
with whitespace.

Farewell grep, enter sed:

     find . -name '*.eml' -exec \
       sed -n -e '/^Subject: /!d' \
       -e 's/^Subject: //' \
       -e ':x' -e 'N' -e 's/\n  */ /g' -e 'tx' -e 'P' -e 'b' {} \;

If you have a recent version of sed, i.e. gnu-sed aka “gsed”, this 
can be written *somewhat* more legibly as:

     find . -name '*.eml' -exec \
       gsed -n -e '/^Subject: /!d
       s/^Subject: //
       :x; N; s/\n  */ /g; tx; P; b' {} \;
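
For anyone who wants to check the behaviour before running this over a 
real folder, here is a minimal sketch that compares the grep approach 
with the sed loop. The sample message, its addresses and the temporary 
directory are made up purely for illustration; only the two commands 
come from this thread.

```shell
# Hypothetical test message with a Subject header folded over two lines.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'From: someone@example.com' \
  'Subject: a very long subject line that the sending client' \
  ' folded onto a second line' \
  'Date: Wed, 19 Nov 2025 15:41:45 -0500' \
  '' \
  'Body text.' > "$tmpdir/sample.eml"

# grep only sees the first physical line of the header.
truncated=$(grep -h -m 1 '^Subject: ' "$tmpdir/sample.eml" |
  sed 's/^Subject: //')

# The sed loop appends following lines (N) and joins each one that
# starts with a space, then prints the completed subject (P).
unfolded=$(sed -n -e '/^Subject: /!d' \
  -e 's/^Subject: //' \
  -e ':x' -e 'N' -e 's/\n  */ /g' -e 'tx' -e 'P' -e 'b' \
  "$tmpdir/sample.eml")

printf 'grep: %s\nsed:  %s\n' "$truncated" "$unfolded"
rm -rf "$tmpdir"
```

The grep result stops at "sending client", while the sed result 
carries on through "folded onto a second line". Note that the pattern 
only matches continuation lines that begin with spaces; a message 
folded with a leading tab would still be cut short.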

I’m not going to attempt to handle quoted-printable encoding in 
subject lines. :-)

Glenn P. Parker
glenn.parker at comcast.net