[MlMt] text/plain?

Galen Menzel galen.menzel at utexas.edu
Sun Apr 7 00:41:52 EDT 2019


Ok, here’s what’s going on:

1. The message is encoded in windows-1252, and contains non-breaking 
spaces (encoded as the byte 0xA0).

2. Your terminal is using a different character encoding (probably 
UTF-8), in which a lone 0xA0 byte does not map to a character. (In UTF-8, 
0xA0 is a “continuation byte”, which is valid only as a 
non-first byte in a multi-byte sequence.) Since the byte does not decode 
to a valid UTF-8 character, the terminal gives up and displays the value 
of the unmappable byte in angle brackets. However, “<A0>” is only how 
the terminal displays the nbsp character. The .eml file itself does not 
contain the four-character string “<A0>”.
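
To make the two points above concrete, here is a minimal Python sketch (Python is my choice for illustration; nothing in the thread depends on it). It decodes the single byte 0xA0 under both encodings; the variable names are arbitrary.

```python
# A lone 0xA0 byte, as found in the windows-1252-encoded message body.
raw = b"\xa0"

# In windows-1252 the byte decodes to U+00A0, the non-breaking space.
as_cp1252 = raw.decode("windows-1252")

# In UTF-8 the same byte is a continuation byte with no lead byte before it,
# so strict decoding fails.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# The UTF-8 encoding of a non-breaking space is two bytes.
utf8_bytes = "\u00a0".encode("utf-8")

print(repr(as_cp1252), utf8_ok, utf8_bytes)
```

A viewer that expects UTF-8 hits the same decoding failure, which is why it falls back to showing the raw byte value.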

Dealing with non-breaking spaces can be confusing, since they are 
difficult to differentiate from normal spaces in many editors, and they 
have different character codes in the common 8-bit encodings and UTF-8. 
But they are usually encoded either as the single byte 0xA0 (in the 
8-bit encodings) or as the two-byte sequence 0xC2 0xA0 (in UTF-8). As 
others have pointed out, a quick call to `tr '\240' ' '` to translate 
the nbsps to normal spaces will often do the trick when the file is in 
one of the 8-bit encodings. If you happen to be using perl, 
`use feature 'unicode_strings'` will make pattern-matching behave 
properly with nbsps (yes, even with strings from 
windows-1252-encoded files!). For example, it will make `\s` match 
nbsps, which it normally doesn’t.
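
If a script is more convenient than `tr`, the same translation can be sketched in Python. This is only an illustration: `nbsp_to_space` is a hypothetical helper name, and it assumes you already know the encoding from the message's Content-Type header (windows-1252 here). Decoding first means the same function works whether the nbsp arrived as one byte or two.

```python
def nbsp_to_space(raw: bytes, encoding: str = "windows-1252") -> str:
    """Decode raw bytes and replace non-breaking spaces with normal spaces.

    Hypothetical helper for illustration. The encoding is assumed to be
    known in advance, e.g. from the charset in the Content-Type header.
    """
    text = raw.decode(encoding)
    return text.replace("\u00a0", " ")


# First line of the sample from the quoted message, as raw windows-1252 bytes.
sample = b"\xa00.\xa0\xa0 flo....: 2.31 2018.11.03"
cleaned = nbsp_to_space(sample)

# The result now contains only ASCII characters.
print(cleaned.isascii())
```

Note that this only handles nbsps; any other non-ASCII characters in the message (curly quotes, for instance) would need their own replacements before the text is pure ASCII.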

Best of luck with the scripting!

Galen

On 6 Apr 2019, at 4:27, Randy Bush wrote:

> i receive an email
>
>     User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:52.0)
>      Gecko/20100101 PostboxApp/6.1.13
>     MIME-Version: 1.0
>     Content-Type: text/plain; charset=windows-1252; format=flowed
>     Content-Transfer-Encoding: 8bit
>     Content-Language: en-US
>
> the text has funny space characters that i see if i save the text to
> disk and look at it with less
>
>     <A0>0.<A0><A0> flo....: 2.31 2018.11.03
>
>      <A0><A0><A0><A0><A0><A0><A0><A0><A0><A0> 1.<A0><A0> CLIMATE ACTION
>      <A0><A0><A0><A0><A0><A0><A0><A0><A0><A0><A0><A0><A0> * (N)ew (M)odify (D)elete..: N
>
>      <A0><A0><A0><A0><A0><A0><A0><A0><A0><A0> 2. * NAME OF CLOUD: cumulus
>
> i presume the sender is thunderbird and they have created the text with
> some sort of windows encoding on a mac?
>
> how can i save the content as vanilla ascii text?
>
> randy