[MlMt] text/plain?
Galen Menzel
galen.menzel at utexas.edu
Sun Apr 7 00:41:52 EDT 2019
Ok, here’s what’s going on:
1. The message is encoded in windows-1252 and contains non-breaking
spaces (encoded as the byte 0xA0).
2. Your terminal is using a different character encoding (probably
UTF-8), in which a lone 0xA0 byte does not map to a character. (In
UTF-8, 0xA0 is a “continuation byte”, which is only valid as a
non-first byte in a multi-byte sequence.) Since the byte isn't valid
UTF-8, less (which follows your terminal's locale) gives up and displays
the value of the unmappable byte in angle brackets. However, “<A0>” is
only how the nbsp is displayed. The .eml file itself does not contain
the four-character string “<A0>”.
Dealing with non-breaking spaces can be confusing, since they are
difficult to distinguish from normal spaces in many editors, and they
are encoded differently in the common 8-bit encodings and in UTF-8.
They are usually encoded either as the single byte 0xA0 (in the 8-bit
encodings) or as the two-byte sequence 0xC2 0xA0 (in UTF-8). As others
have pointed out, a quick call to `tr '\240' ' '` to translate the
nbsps to normal spaces will often do the trick (it targets the single
byte 0xA0, so it suits an 8-bit file like this one rather than a UTF-8
one). If you happen to be using perl, `use feature "unicode_strings";`
will make pattern matching behave properly with nbsps (yes, even with
strings from windows-1252-encoded files!). For example, it will make
`\s` match nbsps, which it normally doesn't.
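For your original question (saving the content as plain ASCII), here's
a minimal Python sketch of the same idea. It assumes you have the raw
message body as bytes and know its charset from the Content-Type header;
the function name `to_plain_ascii` is just for illustration. It decodes
the bytes, turns every nbsp (U+00A0) into a normal space, and replaces
any other non-ASCII character with “?”. Because it works on decoded text
rather than raw bytes, the same code handles both the 0xA0
(windows-1252) and 0xC2 0xA0 (UTF-8) forms.

```python
NBSP = "\u00a0"


def to_plain_ascii(raw: bytes, charset: str = "windows-1252") -> str:
    """Decode raw message bytes and reduce them to plain ASCII text."""
    text = raw.decode(charset)
    # Turn every non-breaking space into an ordinary space.
    text = text.replace(NBSP, " ")
    # Anything else outside ASCII (curly quotes, dashes, ...) becomes '?'.
    return text.encode("ascii", errors="replace").decode("ascii")


# The same character is one byte in windows-1252 and two bytes in UTF-8.
assert NBSP.encode("windows-1252") == b"\xa0"
assert NBSP.encode("utf-8") == b"\xc2\xa0"

print(to_plain_ascii(b"\xa00.\xa0\xa0 flo....: 2.31 2018.11.03"))
print(to_plain_ascii(b"\xc2\xa0 2. * NAME OF CLOUD:", charset="utf-8"))
```

For a saved file you could do something like
`Path("clean.txt").write_text(to_plain_ascii(Path("message.txt").read_bytes()), encoding="ascii")`
with `from pathlib import Path` (both filenames are placeholders).
Replacing leftovers with “?” is a choice; `errors="ignore"` would drop
them silently instead.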
Best of luck with the scripting!
Galen
On 6 Apr 2019, at 4:27, Randy Bush wrote:
> i receive an email
>
> User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:52.0)
> Gecko/20100101 PostboxApp/6.1.13
> MIME-Version: 1.0
> Content-Type: text/plain; charset=windows-1252; format=flowed
> Content-Transfer-Encoding: 8bit
> Content-Language: en-US
>
> the text has funny space characters that i see if i save the text to
> disk and look at it with less
>
> <A0>0.<A0><A0> flo....: 2.31 2018.11.03
>
> <A0><A0><A0><A0><A0><A0><A0><A0><A0><A0> 1.<A0><A0> CLIMATE
> ACTION
> <A0><A0><A0><A0><A0><A0><A0><A0><A0><A0><A0><A0><A0> * (N)ew
> (M)odify (D)elete..: N
>
> <A0><A0><A0><A0><A0><A0><A0><A0><A0><A0> 2. * NAME OF CLOUD:
> cumulus
>
> i presume the sender is thunderbird and they have created the text
> with
> some sort of windows encoding on a mac?
>
> how can i save the content as vanilla ascii text?
>
> randy
> _______________________________________________
> mailmate mailing list
> mailmate at lists.freron.com
> https://lists.freron.com/listinfo/mailmate