[MlMt] ideas for analysing message subjects?
Glenn Parker
glenn.parker at comcast.net
Wed Nov 19 15:41:45 EST 2025
On 19 Nov 2025, at 11:57, Glenn Parker via mailmate wrote:
> On 18 Nov 2025, at 17:21, Bill Cole wrote:
>
>> It strikes me as a very trivial "mission." An exported mailbox is
>> just text, so if you have an exported mailbox named 2025.mbox:
>>
>> grep '^Subject: ' 2025.mbox | grep -o ' .*'
>>
>> Will give you all the subjects without the leading "Subject:" tag
>
> A little bit fancier. Strips out the filename and avoids a potential
> shell error for large directories:
>
> cd [message-folder]
> find . -name '*.eml' -exec grep -h -m 1 '^Subject: ' {} + | sed 's/Subject: //'
>
> Append the following to remove duplicate subjects:
> | sort -su
Ah, but this will truncate long subject lines that are continued over
multiple lines.
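
To make the problem concrete, here is a small sketch using a made-up message (the file name, addresses, and subject text are invented for illustration). The grep pipeline only ever sees the first physical line of the folded header:

```shell
# Hypothetical sample message whose Subject header is folded over two lines.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'From: someone@example.com' \
  'Subject: A rather long subject line that the mail client' \
  ' folded onto a second line' \
  'Date: Wed, 19 Nov 2025 15:41:45 -0500' \
  '' \
  'Body text.' > "$tmpdir/sample.eml"

# Same grep/sed pipeline as quoted above: only the first line of the header survives.
truncated=$(grep -h -m 1 '^Subject: ' "$tmpdir/sample.eml" | sed 's/Subject: //')
printf '%s\n' "$truncated"

rm -r "$tmpdir"
```

The continuation line (" folded onto a second line") never matches '^Subject: ', so it is silently dropped.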
Farewell grep, enter sed:
find . -name '*.eml' -exec \
sed -n -e '/^Subject: /!d' \
-e 's/^Subject: //' \
-e ':x' -e 'N' -e 's/\n[[:blank:]][[:blank:]]*/ /g' -e 'tx' -e 'P' -e 'b' {} \;
If you have GNU sed (gnu-sed, aka “gsed”) rather than the stock macOS
sed, this can be written *somewhat* more legibly as:
find . -name '*.eml' -exec \
gsed -n -e '/^Subject: /!d
s/^Subject: //
:x; N; s/\n[[:blank:]][[:blank:]]*/ /g; tx; P; b' {} \;
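
As a quick check, here is a sketch that runs a variant of the sed script over two made-up messages (file names and contents are invented for illustration). It assumes a continuation line starts with at least one space or tab, and it ends with 'q' rather than 'b' so that sed stops after the first Subject header in each file, mirroring grep's -m 1; that substitution is my choice, not part of the commands above.

```shell
# Two hypothetical messages with the same subject: one folded, one not.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'From: someone@example.com' \
  'Subject: A rather long subject line that the mail client' \
  ' folded onto a second line' \
  'Date: Wed, 19 Nov 2025 15:41:45 -0500' \
  '' \
  'Body text.' > "$tmpdir/folded.eml"
printf '%s\n' \
  'From: someone@example.com' \
  'Subject: A rather long subject line that the mail client folded onto a second line' \
  'Date: Wed, 19 Nov 2025 16:02:10 -0500' \
  '' \
  'Body text.' > "$tmpdir/plain.eml"

# Join continuation lines onto the Subject, print it, and quit that file.
# sort -u then collapses the two subjects, which are identical once unfolded.
unfolded=$(find "$tmpdir" -name '*.eml' -exec \
  sed -n -e '/^Subject: /!d' \
    -e 's/^Subject: //' \
    -e ':x' -e 'N' -e 's/\n[[:blank:]][[:blank:]]*/ /' -e 'tx' -e 'P' -e 'q' {} \; |
  sort -u)
printf '%s\n' "$unfolded"

rm -r "$tmpdir"
```

Without the unfolding step the two subjects would differ, and removing duplicates with sort would keep both.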
I’m not going to attempt to decode RFC 2047 encoded words
(quoted-printable or base64) in subject lines. :-)
Glenn P. Parker
glenn.parker at comcast.net