[MlMt] Experimental feature: Cleaning emails for “warnings”

Benny Kjær Nielsen mailinglist at freron.com
Mon Feb 3 08:50:35 EST 2025


Hi,

I went down a rabbit hole last week and implemented some basic support 
for removing certain strings from HTML and plain text when replying to 
(or forwarding) a message. I'm unlikely to do more with this 
feature right now (spent more time on it than I should), but I wanted to 
share it in case it's useful for anyone in its current state (other than 
the individual asking me about it). Maybe also to get some more use 
cases to better generalize it (some day -- not now).

A user had the problem that most of his replies would contain this: 
"External Message: Use Caution". When HTML was involved, it would be 
embedded in the original HTML. Its purpose, I assume, is to make a user 
aware whenever an email is received from outside an organization. I 
won't go into whether or not I think this is a good idea, but it's a 
problem that this text then goes into the reply as well. This is 
particularly true for HTML, which cannot be edited in MailMate without 
converting it to plain text.

In any case, it would be nice if it could be automatically removed, but 
this turned out to be non-trivial. Well, for plain text, it was fairly 
simple and it could be configured like this:

	defaults write com.freron.MailMate MmPlainTextStrippingRegexps \
	-array "^External Message: Use Caution"
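
As an illustration of what such a pattern does (a sketch in Python, not 
MailMate's code): it assumes each configured regexp is applied to the 
original text and every match is replaced with nothing. The name 
`strip_warnings` and the sample message are made up for the example. 
Note that in Python `^` only matches at the very start of the text 
unless `(?m)` is used; MailMate's regexp engine may behave differently.

```python
import re

# Hypothetical stand-in for the MmPlainTextStrippingRegexps setting.
PLAIN_TEXT_PATTERNS = [r"^External Message: Use Caution"]


def strip_warnings(text, patterns):
    """Remove every match of each pattern from text (illustration only)."""
    for pattern in patterns:
        text = re.sub(pattern, "", text)
    return text


# Made-up message body that starts with the warning line.
original = "External Message: Use Caution\n\nHi Benny,\nThanks for the update.\n"
cleaned = strip_warnings(original, PLAIN_TEXT_PATTERNS)
print(repr(cleaned))
```

The pattern removes only the warning text itself; the line break after 
it stays unless the pattern also includes a trailing `\n`.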

For HTML, I tried to make it simple by only requiring the user to 
provide the inner-most part of the HTML and then MailMate would extend 
it to whatever makes sense. In this particular case, I still ended up 
with this:

	defaults write com.freron.MailMate MmHTMLStrippingRegexps \
	  -array '"<span [^>]*>External Message:</span> Use Caution"' \
	         '"<td style=\"background:#ff0000;padding:5pt 2pt 5pt 2pt\"></td>"'

The second string was to get rid of a red bar. This is not very 
intuitive, and it is hard to configure.

Note that for an HTML email, both of the above settings are needed since 
text still needs to be removed from the plain text alternative (which in 
MailMate can be seen in the text editor part of the composer).

The full HTML (automatically removed by MailMate using the above) looked 
like this:

```html
<table border="0" cellspacing="0" cellpadding="0" align="left" 
width="100%">
<tbody>
<tr>
<td style="background:#ff0000;padding:5pt 2pt 5pt 2pt"></td>
<td width="100%" cellpadding="7px 6px 7px 15px" 
style="background:#ff000;padding:5pt 4pt 5pt 12pt;word-wrap:break-word">
<div style="color:#000000;"><span style="color:#000000; 
font-weight:bold;">External Message:</span> Use Caution
</div>
</td>
</tr>
</tbody>
</table>
<br>
```
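
A pattern that names the enclosing table explicitly can be tried outside 
MailMate before it is configured. The following Python sketch does that 
on an abbreviated copy of the banner above. It is not how MailMate 
extends the inner-most match; the pattern, the surrounding message, and 
the variable names are assumptions for illustration. `re.DOTALL` is set 
because `.` does not match line breaks in Python by default.

```python
import re

# Abbreviated copy of the banner shown above, wrapped in a made-up message body.
banner_html = (
    '<table border="0" cellspacing="0" cellpadding="0" align="left" width="100%">\n'
    "<tbody>\n<tr>\n"
    '<td style="background:#ff0000;padding:5pt 2pt 5pt 2pt"></td>\n'
    '<td width="100%">\n'
    '<div style="color:#000000;"><span style="color:#000000; font-weight:bold;">'
    "External Message:</span> Use Caution\n</div>\n"
    "</td>\n</tr>\n</tbody>\n</table>\n<br>\n"
)
message_html = "<html><body>\n" + banner_html + "<p>Hello from outside.</p>\n</body></html>\n"

# Illustrative pattern (an assumption): from the opening <table ...> through
# the warning text to the closing </table> and the <br> that follows it.
BANNER = re.compile(
    r"<table[^>]*>.*?External Message:</span> Use Caution.*?</table>\n<br>\n",
    re.DOTALL,
)

cleaned = BANNER.sub("", message_html)
print(cleaned)
```

One caveat with this style of pattern: if another table appears earlier 
in the message, the match starts at that earlier `<table` and removes 
everything up to the end of the banner.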

After hinting at this feature in the release notes, several users asked 
about it and I got some more examples. These show a lot of variation and 
it's tricky to come up with an approach that would work well in general. 
Some of them were a lot worse than the above. I ended up with these 
settings:

```
defaults write com.freron.MailMate MmPlainTextStrippingRegexps \
  -array '\\AThis email was sent to you by someone outside the University.\\nYou should only click on links or attachments if you are certain that the email is genuine and the content is safe.\\n' \
         '"\\AYou don.t often get email from[^<]*<https://aka.ms/LearnAboutSenderIdentification>\\nCaution: This email originated from outside of the organization. Please take care when clicking links or opening attachments. When in doubt, contact the ICT Department.\\n"' \
         '\\AExternal Message: Use Caution\\n'

defaults write com.freron.MailMate MmHTMLStrippingRegexps \
  -array '"(?m)^<div style=\"background-color:#fff2e6; border:2px dotted #ff884d\">.*?^</div>\\n"' \
         '"(?m)^<!-- Red Banner -->\\n<table.*</table>\\n<br>\\n"' \
         '"(?m)<table[^>]*>.*?(You don.t often get email from|This email originated from outside of the organization).*?</table>"' \
         '"(?m)^<!-- BaNnErBlUrFlE-BoDy-start -->\\n.*?^<!-- BaNnErBlUrFlE-BoDy-end -->\\n"'
```
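
A plain text pattern like the second one above can be checked in Python 
before it is installed. This is a sketch only: it assumes the doubled 
backslashes in the `defaults` command end up as single backslashes in 
the stored regexp, and the sample message (including the address) is 
made up. It shows two details of the pattern: `don.t` matches both a 
straight and a curly apostrophe, and `\A` restricts the match to the 
very beginning of the text.

```python
import re

# The second plain text pattern from above, with single backslashes.
WARNING = re.compile(
    r"\AYou don.t often get email from[^<]*"
    r"<https://aka.ms/LearnAboutSenderIdentification>\n"
    r"Caution: This email originated from outside of the organization. "
    r"Please take care when clicking links or opening attachments. "
    r"When in doubt, contact the ICT Department.\n"
)


def make_message(apostrophe):
    """Build a made-up message that starts with the warning text."""
    return (
        f"You don{apostrophe}t often get email from someone@example.com. "
        "Learn why this is important"
        "<https://aka.ms/LearnAboutSenderIdentification>\n"
        "Caution: This email originated from outside of the organization. "
        "Please take care when clicking links or opening attachments. "
        "When in doubt, contact the ICT Department.\n"
        "Hello,\n"
    )


straight = WARNING.sub("", make_message("'"))
curly = WARNING.sub("", make_message("\u2019"))

# \A anchors to the start of the text, so a warning further down stays.
later = "Hello,\n" + make_message("'")
untouched = WARNING.sub("", later)

print(repr(straight), repr(curly), untouched == later)
```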

Most users would, of course, only need to handle one of these variations.

This feature is highly experimental in the sense that I'm not even 
promising to fix it if it's broken ;) I'm more likely to implement it in 
a different way.

Hold down ⌥ when clicking “Check Now” in the Software Update 
settings pane to get the latest test release. The above works in r6220+.

While looking at one of the examples above, I also realized that at 
least one of these embedded-warning techniques could introduce a 
potential privacy issue... I'll have to investigate that a bit more 
though.

-- 
Benny
https://freron.com/support