I’m annoyed

Why can't Thunderbird detect the character encoding?

Why can't Thunderbird detect the character encoding of a message automatically? As a test, I was sent a Gmail message containing the upper-case Greek letters. Since UTF-8 encoding was not enabled in Gmail, Gmail encoded the message in ISO-8859-7 (the Greek variant of 8859-1); the fact that it was encoded in 8859-7 was evident if the "view source" option was selected, and my iPod Touch has no trouble displaying it properly. But Thunderbird apparently doesn't bother to check this and displayed the entire message as "diamond with question mark" symbols. I tried the usual techniques of viewing as "Western", UTF-8, etc., but never thought of 8859-7. When I manually selected 8859-7, the message displayed properly, but I shouldn't have had to go through all that. If the iPod Touch can figure out the encoding (as can Outlook), why can't Thunderbird?
  • > [...] encoded in 8859-7 was evident if the "view source"
    > option was selected [...]

    Does this statement mean there was a "Content-Type:" header with "charset=iso-8859-7" specified? If so, then TB _should_ have displayed it using that charset. If there was no "Content-Type:" header, or the header contained no "charset=" specification, then TB will use the "Character Encoding" you have specified (via "View->Character Encoding").
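
    The rule described above (use the declared charset, otherwise fall back to a default) can be sketched in Python with the standard-library `email` package. This only illustrates the lookup and is not Thunderbird's actual code; the two sample messages and the `FALLBACK_CHARSET` name are made up for the example.

```python
from email import message_from_bytes

# Hypothetical default, standing in for the "View->Character Encoding" setting.
FALLBACK_CHARSET = "iso-8859-1"


def effective_charset(raw_message: bytes) -> str:
    """Return the declared charset, or the fallback if none is declared."""
    msg = message_from_bytes(raw_message)
    # get_content_charset() returns None when there is no charset= parameter.
    declared = msg.get_content_charset()
    return declared or FALLBACK_CHARSET


with_charset = (
    b"Subject: test\r\n"
    b"Content-Type: text/plain; charset=iso-8859-7\r\n"
    b"\r\n"
    b"\xc1\xc2\xc3\r\n"  # upper-case Greek letters as ISO-8859-7 bytes
)
without_charset = b"Subject: test\r\n\r\nplain body\r\n"

print(effective_charset(with_charset))
print(effective_charset(without_charset))
```

    If the first message is reported with the fallback instead of iso-8859-7, the header is probably missing or malformed in the message source.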

  • Hey!

    Sorry for advertising my thread everywhere, but this may solve your problem.

  • In addition to the "View->Character Encoding" method of selecting the default charset for an individual message, it is also possible to specify a default charset to be applied to all incoming messages that do not contain a "charset=" specification:
    • "Preferences" (Mac/Linux) / "Options" (Win)

    • "Display"

    • Under "Fonts", click the "Advanced..." button (to the right side)

    • Under "Character Encodings", use the drop-down menu adjacent to "Incoming Mail:" to select the default encoding to be applied to incoming messages without a "charset=" specification

  • I’m irritated
    Hmm, even if I follow the about:config way to set mailnews.force_charset_override to the default, I still get problems. Mails I receive are either UTF-8 or 8859-15, and if I set the default encoding to one of these, half of the mails contain ill-decoded umlauts, quote signs, etc. Why can't TB detect this automatically? And why doesn't it remember which encoding I chose manually for a specific message?

    My TB 8.0 was updated incrementally, coming from TB 1.5 or TB 2 (don't remember which) on a Win XP SP 3 installation.

  • TB can detect it automatically under these conditions:


    • The email is declared correctly (charset)

    • mailnews.force_charset_override is set to false

    • Right click menu of the related folder > properties > General Information, the bottom box is unticked

    • View > Character Encodings can be set to "off" or "Universal". The latter is only needed when the email contains more than one encoding.

  • @Bernd S. Thanks for the tip! That thing was bugging me for ever and it's finally solved!

  • Thank you for the help. mailnews.force_charset_override was causing my problem... (who knows, a bad plugin?)

  • @Bernd: Thanks for your explanation. I recently stumbled upon an email which did not have any Content-Type headers, so it is impossible to detect automatically.

    However, the thing I don't understand is why TB cannot show the Subject line correctly in the message list when I explicitly choose a character encoding that displays the message correctly elsewhere. What makes the message list view so special?

  • I think it's a problem because the subject is (or should be) encoded in quoted-printable.
    I could tell you more about it if I could read the complete message source.

  • @Bernd:

    Thanks for your response! I'm not really up on quoted-printable and all that, but here are the headers of the message I'm talking about. I've removed all identifying info, but otherwise they are complete and accurate.

    Delivered-To: janne.jokitalo@gmail.com
    Received: by xxx.xxx.xxx.xxx with SMTP id m12csp98384ane;
    Tue, 20 Mar 2012 12:54:21 -0700 (PDT)
    Received: by xxx.xxx.xxx.xxx with SMTP id e1mr509324lbn.42.1332273260605;
    Tue, 20 Mar 2012 12:54:20 -0700 (PDT)
    Return-Path:
    Received: from foobar.invalid (foobar.invalid. [xxx.xxx.xxx.xxx])
    by mx.google.com with ESMTP id si6si810376lab.57.2012.03.20.12.54.20;
    Tue, 20 Mar 2012 12:54:20 -0700 (PDT)
    Received-SPF: pass (google.com: best guess record for domain of nobody@foobar.invalid designates xxx.xxx.xxx.xxx as permitted sender) client-ip=xxx.xxx.xxx.xxx;
    Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of nobody@foobar.invalid designates xxx.xxx.xxx.xxx as permitted sender) smtp.mail=nobody@foobar.invalid
    Received: by foobar.invalid (Postfix, from userid 65534)
    id 8E11F1AA007; Tue, 20 Mar 2012 21:54:19 +0200 (EET)
    To: janne.jokitalo@gmail.com
    Subject: IT-alan resursoinnin ja kasvujohtamisen suunnann�ytt�j�t
    From: a.person@foobar.invalid
    Message-Id:
    Date: Tue, 20 Mar 2012 21:54:19 +0200 (EET)

  • When the message source itself contains such characters, there is nothing you can do on your side.
    Tell the sender to use a proper email client.
    This email seems to have been sent from a bad webmail interface or third-party software.
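
    To show where those characters come from, here is a small Python sketch. It assumes the sender wrote the subject as raw ISO-8859-1 bytes with no charset declaration (the headers above do not say which charset was used, so that part is a guess). A reader that expects UTF-8 cannot decode the byte for "ä" and substitutes the replacement character U+FFFD.

```python
# Assumption: the subject word was sent as raw ISO-8859-1 bytes, with no
# charset declaration and no encoded-word wrapper around it.
raw = "suunnannäyttäjät".encode("iso-8859-1")

# Decoding as UTF-8 fails on each 0xE4 byte, which is replaced by U+FFFD.
as_utf8 = raw.decode("utf-8", errors="replace")

# Decoding with the charset the sender actually used recovers the text.
as_latin1 = raw.decode("iso-8859-1")

print(as_utf8)
print(as_latin1)
```

    The header itself gives the reader no hint about which of the two decodings is the right one, which is why the sender has to fix it.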

  • Yes, I already understood why it wasn't properly shown automagically. As I already stated, I wondered why it isn't shown correctly in the _message list_, when I specifically choose an encoding which makes it look correct in the message pane.

  • Thank you. "mailnews.force_charset_override" was my problem, too. I set it to false and now things are working as expected *and* desired.

  • Hello, I'm having the same problem with "diamonds" in the message list. The message is:

    Delivered-To: passshok@gmail.com
    Received: by 10.58.94.6 with SMTP id cy6csp536006veb;
    Thu, 20 Sep 2012 17:14:36 -0700 (PDT)
    Received: by 10.152.104.44 with SMTP id gb12mr2847834lab.29.1348186476262;
    Thu, 20 Sep 2012 17:14:36 -0700 (PDT)
    Return-Path:
    Received: from vm1265.majordomo.ru ([78.108.90.102])
    by mx.google.com with ESMTPS id ee4si9772864lbb.45.2012.09.20.17.14.35
    (version=TLSv1/SSLv3 cipher=OTHER);
    Thu, 20 Sep 2012 17:14:36 -0700 (PDT)
    Received-SPF: softfail (google.com: domain of transitioning gruz-trans@yandex.ru does not designate 78.108.90.102 as permitted sender) client-ip=78.108.90.102;
    Authentication-Results: mx.google.com; spf=softfail (google.com: domain of transitioning gruz-trans@yandex.ru does not designate 78.108.90.102 as permitted sender) smtp.mail=gruz-trans@yandex.ru
    Received: from apache by vm1265.majordomo.ru with local (Exim 4.77)
    (envelope-from )
    id 1TEqt9-00064F-Er
    for passshok@gmail.com; Fri, 21 Sep 2012 04:14:35 +0400
    To: passshok@gmail.com
    Subject: КЛЕВЕР-ПЛЮС - Ваш заказ 9530-91/21-09-12 успешно оформлен
    MIME-Version: 1.0
    From:
    Content-Type: text/plain; charset=windows-1251
    X-Mailer: PHP/
    Message-Id:
    Date: Fri, 21 Sep 2012 04:14:35 +0400

    Доброго времени!
    --------------------------------------------------------
    Спасибо за покупку в нашем интернет-магазине 'КЛЕВЕР-ПЛЮС'
    Наши менеджеры свяжутся с вами по координатам,
    оставленным в форме заказа.
    ...

    The subject is displayed correctly in the message pane, but not in the message list. How can I fix this?


  • Content-Type: text/plain; charset=windows-1251


    The charset is declared correctly, but an important part is missing:
    the method used to encode the content. So the declaration is incomplete.
    A complete declaration would be, e.g.:

    Content-Type: text/plain; charset=windows-1251
    Content-Transfer-Encoding: 8bit (or 7bit, quoted-printable, base64, etc.)
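
    For comparison, a sketch of a message that declares both headers, built with Python's standard `email` package. The body text is taken from the message quoted earlier; the choice of quoted-printable is only an example, not something the original sender used.

```python
from email.message import EmailMessage

body = "Доброго времени!"

msg = EmailMessage()
# Declare the charset and the transfer encoding explicitly.
msg.set_content(body, charset="windows-1251", cte="quoted-printable")

print(msg["Content-Type"])
print(msg["Content-Transfer-Encoding"])

# Reading the body back uses both declarations to recover the text.
print(msg.get_content().strip())
```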

  • If the email subject contains any non-ASCII characters, they must be encoded as Base64 in the form below, so that Thunderbird can interpret them properly without any special settings:

    =?UTF-8?B?< subject in Base 64 format >?=

    If the subject is long, we should also encode it in chunks of 8 characters, starting and terminating each encoded chunk with '=?UTF-8?B?' and '?='.

    Sample code that worked for me in PL/SQL:

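    DECLARE
      v_subject VARCHAR2(200) := 'アフターセールスサービスと保証 アフターセールスサ';
      l_temp    VARCHAR2(4000);
      l_pos     PLS_INTEGER := 1;
    BEGIN
      -- Encode 8 characters at a time; assumes the database character set is UTF-8.
      WHILE l_pos <= LENGTH(v_subject) LOOP
        l_temp := l_temp || '=?UTF-8?B?' || UTL_RAW.CAST_TO_VARCHAR2(UTL_ENCODE.BASE64_ENCODE(UTL_RAW.CAST_TO_RAW(SUBSTR(v_subject, l_pos, 8)))) || '?=';
        l_pos := l_pos + 8;
      END LOOP;
    END;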
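
    The same header format (an RFC 2047 encoded-word) can also be produced without manual chunking where the mail library handles it. A sketch in Python, using the standard `email.header` module rather than the PL/SQL approach above; the subject is the one from the earlier message in this thread.

```python
from email.header import Header, decode_header, make_header

subject = "КЛЕВЕР-ПЛЮС - Ваш заказ 9530-91/21-09-12 успешно оформлен"

# Header.encode() wraps the text in =?utf-8?...?= encoded-words and folds
# long values across lines, splitting only on character boundaries.
encoded = Header(subject, charset="utf-8").encode()
print(encoded)

# Roughly what a mail client does when it reads the header back.
decoded = str(make_header(decode_header(encoded)))
print(decoded)
```

    The encoded value contains only ASCII, so it survives transport regardless of which charset the body uses.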

  • > then they must be encoded properly into base 64

    That is not correct.
    Thunderbird uses base64 only if an email contains attachments or other embedded files.
    For an email without attachments or HTML code, the declaration in the message source should be:
    Content-Type: text/plain; charset=utf-8
    Content-Transfer-Encoding: quoted-printable


    The charset must be set by the sender to UTF-8 or Unicode.
    And in Thunderbird: View > Character Encodings > Auto-Detect > Universal.