Character Encoding Problems on Message Boards (Sept. 9 2014)

  • Problem
  • Updated 4 years ago
  • Solved
Archived and Closed

This conversation is no longer open for comments or replies and is no longer visible to community members. The community moderator provided the following reason for archiving: IMDb boards

Today (Sept. 9 2014) I've noticed the following problems involving incorrect character-encoding in the message board post editor and in board signatures.


When we preview or edit a post that contains any of various extended UTF-8 characters, the post editor may replace such characters with incorrectly encoded representations in the text input box.

Also, when we edit an existing post that contains a less-than (<) or greater-than (>) character, the post editor may incorrectly replace such characters with encoded representations in the text input box.


In saved signatures that contain extended UTF-8 characters, the system incorrectly replaces such characters with encoded representations.

Also, in saved signatures that contain an ampersand (&) or an apostrophe (') or the less-than (<) or greater-than (>) character, the system incorrectly replaces such characters with encoded representations.


IMDb developers: Please investigate. Thanks.

(closed account)

  • 379 Posts
  • 431 Reply Likes

Posted 6 years ago


Dan Dassow, Champion

  • 16659 Posts
  • 18784 Reply Likes
Hi Lucus Anon,

Thank you for reporting this problem.

This has been a long-standing problem on IMDb. In general, IMDb does not handle extended UTF-8 characters well in board signatures and bios.

(closed account)

  • 379 Posts
  • 431 Reply Likes
With apologies for any redundancy, let me repeat and clarify key points regarding new problems that arose on Sept. 9 2014. Some of the new issues affect the main body text of previewed or edited posts:

When we preview or edit the main body text of a post containing extended UTF-8 characters, the post editor replaces such characters with incorrectly encoded representations in the text input box. This appears to be a new problem that arose on September 9 2014.

Also, when we edit the main body text of an existing post that contains a less-than (<) or greater-than (>) character, the post editor incorrectly replaces such characters with encoded representations in the text input box. This too appears to be part of the new problems that arose on Sept. 9 2014.

^ Note that those first portions of my bug report pertain to the main body text of posts, not signatures.

____________________

Regarding the signature related issues that I also reported above, we have further established that certain specific issues (such as I described) appear to be new (or, at least, worsened) as of Sept. 9 2014. On that date, the system seemingly began to incorrectly modify some users' existing saved signatures, replacing certain characters with incorrectly encoded representations. Newly edited signatures are likewise affected by these problems.

I acknowledge that some known problems existed before, but I believe that the issues that I'm reporting are new (or at least worsened) as of Sept. 9 2014.
(Edited)

Dan Dassow, Champion

  • 16659 Posts
  • 18784 Reply Likes
Hi Lucus Anon,

Thanks for the clarification. What you wrote above is not redundant; it is simply an indication that a long-standing problem has become worse.

DrakeStraw

  • 286 Posts
  • 132 Reply Likes
Yes, character encoding is a long-standing problem, but it's been over a year since it was as bad as I've seen it today.  Some posters are just leaving the garbage in to draw attention to the problem.

Nasro Subari

  • 12 Posts
  • 8 Reply Likes
As of today, this kind of invalidates my nice signature:

--
Grammar:
The difference between knowing your sh**
and knowing you&#x27;re sh**.
(Edited)
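
The `&#x27;` in that signature is the HTML character reference for an apostrophe. Here is a minimal Python sketch of how such a reference can end up visible as literal text. It assumes the signature was escaped once when saved and then escaped again when rendered; that is a guess for illustration, not something IMDb confirmed in this thread.

```python
import html

signature = "and knowing you're sh**."

# First escape: the normal, correct step before embedding text in an HTML page.
# html.escape turns the apostrophe into the reference &#x27;
stored = html.escape(signature)

# Second escape (the assumed bug): the "&" of the reference is itself escaped,
# so the page source now contains &amp;#x27;
rendered = html.escape(stored)

# html.unescape approximates what a browser displays for each version.
shown_once = html.unescape(stored)     # the apostrophe comes back
shown_twice = html.unescape(rendered)  # the reader sees the raw reference

print(shown_once)
print(shown_twice)
```

Escaping exactly once, at the point where the text is written into the page, avoids the visible reference.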

Murray Chapman, Employee

  • 109 Posts
  • 63 Reply Likes
Thanks for bringing this up.  We've just rolled out a fix for the editor that should better handle all characters.  As stated previously, some weirdness might remain when you edit posts because the data from those was stored incorrectly when the post was originally made -- if they are miscoded in the input box then you should be able to fix them up and have them stay fixed.

The boards bio/signature might still have some issues; we'll be looking at those next.

Thanks for your patience with this.  We are committed to getting this fixed permanently; however, we're going to be breaking things along the way.  Please let us know if there are still some issues.

Patrick Murphy

  • 1 Post
  • 0 Reply Likes
Glad to know it wasn't my computer doing it, but it does make it tough for those of us who use sig files on IMDb. Hopefully it can be fixed soon. Thanks.

Murray Chapman, Employee

  • 109 Posts
  • 63 Reply Likes
Whoa, there are some knock-on problems.  Stand by.

Murray Chapman, Employee

  • 109 Posts
  • 63 Reply Likes
Ok, the knock-on issue has been fixed.  Specifically, the preview/edit/post cycle now no longer mangles "special" characters.  However (again) existing posts may be corrupted; if you edit/fix them then they should stay fixed.

(closed account)

  • 379 Posts
  • 431 Reply Likes
Thanks for working on these issues!

I understand that work is still in progress toward fixing problems involving incorrectly re-encoded characters in signatures.

As you've noted, at least one of the mentioned problems affecting message post body text has now been fixed. As you've noted, "the preview/edit/post cycle now no longer mangles" UTF-8 extended characters. Thanks for fixing that!

However, at the time of this reply, the other mentioned issue affecting message post body text may still remain to be fixed:

"When we edit the main body text of an existing post that contains a less-than (<) or greater-than (>) character, then the post editor may incorrectly replace such characters with encoded representations in the text input box."

Shenika Chapman

  • 14 Posts
  • 0 Reply Likes
That thing was knocking; I don't have a bed. I can't get me rest to help fix it.

(closed account)

  • 379 Posts
  • 431 Reply Likes
markfilipak has posted further information about the current and recurring problems with signatures. Please see his report here:

https://getsatisfaction.com/imdb/topics/bug-in-boards-signature
(Edited)

Murray Chapman, Employee

  • 109 Posts
  • 63 Reply Likes
We've just rolled out a fix for the "editing an existing post with < or > characters" bug.

We are also testing a candidate fix for the signature issues.

Shenika Chapman

  • 14 Posts
  • 0 Reply Likes
It will be good educational knowledge, Thank you!

Murray Chapman, Employee

  • 109 Posts
  • 63 Reply Likes
Signature encoding bug should be fixed on Monday.  Will post here when it's live.

(closed account)

  • 379 Posts
  • 431 Reply Likes
Thanks. Starting today (Monday), we can re-edit our board signatures and replace the unwanted codes with the original desired characters, such as the apostrophe etc.

...

Separately, here is another new problem that I've noticed since the weekend:

If I use any extended UTF-8 characters in the subject-line of a post on any IMDb message board, the extended characters in the subject line may appear correct in the post-editor and preview -- but the extended characters in the subject-line may be incorrectly displayed after I post the message and view it on the board.

(That new problem affects new posts created today, and also affects older existing posts that already have extended UTF-8 characters in their subject-lines. For new posts and old posts alike, extended UTF-8 characters are now displayed incorrectly in subject-lines.)
(Edited)

Murray Chapman, Employee

  • 109 Posts
  • 63 Reply Likes
Thanks for the report.  I'll look into this next.

DrakeStraw

  • 286 Posts
  • 132 Reply Likes
This problem has gotten worse since you posted this.  Now diacritics in the [link=?] tag text are going bad when you edit.  There is a definite problem with previews not being WYSIWYG all the way through to the corruption in the final post.

Murray Chapman, Employee

  • 109 Posts
  • 63 Reply Likes
Thanks for the report.  Bug found; the fix will be online in a day or two.

Shenika Chapman

  • 14 Posts
  • 0 Reply Likes
I need a technical fix; that would solve the problem.

Murray Chapman, Employee

  • 109 Posts
  • 63 Reply Likes
Signature encoding is now fixed.  Will be working next on the encoding issues with post titles.

DrakeStraw

  • 286 Posts
  • 132 Reply Likes
I have been posting on an NYT blog for many years and have come to know what it will accept in the way of HTML.  The blog is far more restrictive than the IMDb message system.  No tags are allowed, not even markup tags.  What is allowed are the codes that begin with an ampersand.  IMDb could allow them without much risk to the integrity of the blog.  The only replacement I now see as necessary would be "& " replaced with "&amp; ".  Allowing this conversion should clear up the signature problem that persists.  I've seen quite a few corrupted signatures that are left that way, apparently as a form of protest.
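
The proposal above (pass existing ampersand codes through and escape only a bare "&") can be sketched in Python as follows. The helper name `escape_bare_ampersands` and the regular expression are hypothetical, written only for illustration, and are not IMDb's code. The sketch does not handle "<" or ">", which would still need their own escaping.

```python
import re

# Match "&" unless it already begins a named (&eacute;), decimal (&#233;),
# or hexadecimal (&#xE9;) character reference that ends with a semicolon.
BARE_AMP = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)


def escape_bare_ampersands(text: str) -> str:
    """Escape only ampersands that are not part of a character reference."""
    return BARE_AMP.sub("&amp;", text)


sample = "Tom & Jerry &eacute; &#233; &#xE9; AT&T"
once = escape_bare_ampersands(sample)
twice = escape_bare_ampersands(once)  # a second pass changes nothing
print(once)
```

Because "&amp;" is itself a character reference, running the helper again leaves the text unchanged, which is the property the preview/edit cycle needs.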

DrakeStraw

  • 286 Posts
  • 132 Reply Likes
This may be progress.  Now I see corruption in the preview as well, but it is a different type of corruption than was there before.  If the final product is corrupt, I don't believe you guys know what you're doing if the preview is not WYSIWYG.  Having the exact same corruption in the preview is definitely preferable to having a clean preview and a corrupted post.

Murray Chapman, Employee

  • 109 Posts
  • 63 Reply Likes
I see some inconsistent corruption in your latest post, depending on whether it's expanded or not.  However, I can't reproduce the corruption in a reply.  How did you enter the characters that got corrupted?

Murray Chapman, Employee

  • 109 Posts
  • 63 Reply Likes
Also: are you seeing corruption in the body of the post or just the subject?

DrakeStraw

  • 286 Posts
  • 132 Reply Likes
Today, I have seen corruption both in the subject and in the body.  The last time, I just saw it in the subject:

Zúñiga=SOO-nyee-gah

now displays as "Z��iga=SOO-nyee-gah" in the subject

[link=nm0959250] now displays the correct link text Rosario Zúñiga (This is not consistent however.)

before today, both displayed corruption like "ĉâ" in the final post.  Previews were not corrupt until today.

I believe you will have to test subject, text and signature together to eventually get the encoding right.  Without consistent WYSIWYG, you are definitely not there yet.   I am always leery of previews that are not WYSIWYG.  I have seen problems from that on other blogs also.

If you go back to my profile, you'll see that the link text there is now also corrupted.  It wasn't before today.

The most irritating message I got was that the corrupted characters were not supported when the �� characters showed up in the preview.  Posting from the textbox allowed them to go through corrupted.  What a mess! And there were few problems for over a year.
(Edited)
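
The "Z��iga" display above is what appears when bytes written in one encoding are read back in another. Here is a minimal Python sketch of two common mismatches. It assumes Latin-1 and UTF-8 are the encodings involved; that is one plausible cause, not a diagnosis confirmed in this thread.

```python
subject = "Zúñiga=SOO-nyee-gah"

# Case 1: text saved as Latin-1 bytes but decoded as UTF-8.
# The single bytes for "ú" and "ñ" are not valid UTF-8 sequences, so each one
# is replaced by U+FFFD, the replacement character.
latin1_bytes = subject.encode("latin-1")
as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

# Case 2: the reverse mismatch, UTF-8 bytes decoded as Latin-1.
# Each accented letter is two bytes in UTF-8 and so turns into two characters.
utf8_bytes = subject.encode("utf-8")
as_latin1 = utf8_bytes.decode("latin-1")

print(as_utf8)
print(as_latin1)
```

Case 1 loses information, because the original bytes are gone once U+FFFD has been stored. Case 2 is usually reversible.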

DrakeStraw

  • 286 Posts
  • 132 Reply Likes
Now, I see corruption in the preview, but not the final post!  Why are all the conflicting changes being made live?  Maybe the fact that it's the only way you can do it is part of the problem.

Murray Chapman, Employee

  • 109 Posts
  • 63 Reply Likes
We've just rolled another round of fixes live onto the site.  Please let me know if there continue to be issues.

DrakeStraw

  • 286 Posts
  • 132 Reply Likes
There are retro issues.  I disagree with you that the ampersand codes should not be converted.  Most of the encoding protests I've seen are with posters leaving that type of corruption, especially in signatures, and it could be fixed retroactively if those codes were convertible.  Here's a thread with three different versions of the subject.  Only mine appears to be what the OP intended.  Also the board title now shows up as "Les Misérables (2012)," but not consistently.  It seems to be randomly correct.  Here is the messed up thread, starting with my post:

http://www.imdb.com/title/tt1707386/board/nest/211806283?d=234807000#234807000

Have you tested WYSIWYG in the previews?  Without fixing that, you'll continue to have problems.  An old workaround I used with last year's similar encoding problems still works.  Open the preview in a new tab.  If it is corrupted, go back and post from the original text box.  It sometimes avoids the WYSIWYG corruption.
(Edited)

Murray Chapman, Employee

  • 109 Posts
  • 63 Reply Likes
I've just rolled out a fix for this, but seeing as this thread spans over a year there have been different encodings written into the database -- so not every message is fixed up.  The corrupted movie titles should also now be fixed.

I agree that the ampersand issues you pointed out on this page shouldn't happen; I was referring to a different problem whereby the preview would convert the user's input back and forth between "&" and "&amp;".

If you're still able to see corruption during the preview, or the final post isn't what the preview showed, then please let me know how to reproduce the problem and I'll take a look.  But right now it's all looking good.
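
The "&" versus "&amp;" conversion problem in a preview cycle can be illustrated with a short Python sketch. Both function names are hypothetical, and the "buggy" variant is an assumed failure mode chosen for illustration, not a description of the actual IMDb code.

```python
import html


def buggy_preview_cycle(text: str) -> str:
    # Assumed bug: the escaped HTML is put back into the edit box as if it
    # were the user's raw input, so every preview adds another escape layer.
    return html.escape(text, quote=False)


def safe_preview_cycle(text: str) -> str:
    # Escape only for display, then unescape before refilling the edit box,
    # so the raw text survives any number of previews.
    displayed = html.escape(text, quote=False)
    return html.unescape(displayed)


raw = "Q&A: is 1 < 2?"
buggy = raw
safe = raw
for _ in range(3):  # three preview/edit round trips
    buggy = buggy_preview_cycle(buggy)
    safe = safe_preview_cycle(safe)

print(buggy)
print(safe)
```

The usual design choice is to keep the raw text as the single stored form and escape it only at the moment it is written into a page.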

DrakeStraw

  • 286 Posts
  • 132 Reply Likes
The thread I linked to above now just shows the corruption when the subject contains "�" and the board title corruption is gone.  If I see anything else that isn't working, I'll post it here.  Thanks, Murray.

DrakeStraw

  • 286 Posts
  • 132 Reply Likes
That didn't take long.  Now the link text on my profile page is corrupted where it wasn't before.  The link appears as:

Re: Overrated Shíte

Murray Chapman, Employee

  • 109 Posts
  • 63 Reply Likes
Bug found and fixed; should be live tomorrow.  Thanks for your help tracking all these down.  We're slowly discovering and removing all the pseudo-fixes-of-fixes for encoding that have been applied over the years.  I suspect that there may be a few in the PM pipeline but I haven't been able to find any yet.  It's a complicated system and depending on what combinations of characters are on any given page, the browsers may automatically (or not) fix up issues.  Not calling "victory" just yet, but I feel like we're getting there.
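
For older rows that were stored with mixed encodings, one common repair approach can be sketched in Python as follows. The helper `repair_mojibake` is hypothetical and assumes the damage is UTF-8 text that was misread as Latin-1; the thread does not say which encodings the database actually contained.

```python
def repair_mojibake(text: str) -> str:
    """Undo UTF-8-read-as-Latin-1 damage; return text unchanged otherwise."""
    try:
        # Turn the characters back into the original bytes, then decode them
        # the way they should have been decoded in the first place.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not this kind of damage (already correct, or unrecoverable), so
        # leave the text alone rather than risk making it worse.
        return text


print(repair_mojibake("Les MisÃ©rables (2012)"))
print(repair_mojibake("Les Misérables (2012)"))
```

A helper like this cannot restore text in which characters were already replaced by "�", and it could wrongly alter text that a user deliberately typed to look like the damaged form, so it would need to be applied selectively.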

Nobody

  • 1455 Posts
  • 707 Reply Likes
The ampersand character may still appear as "&amp;" when viewing board messages that were posted from the IMDb Android app?

I have only second-hand evidence of that.  (I don't have an Android device.)  After someone else posted a message in which the "&amp;" codes appear, I asked whether the poster used a mobile app.  A reply confirmed that the post was made from an Android phone.
(Edited)

Nobody

  • 1455 Posts
  • 707 Reply Likes
The ampersand character may still appear as "&amp;" when viewing board messages that were posted from the IMDb Android app? ...
That is still an issue at this time, btw.   I often see "&amp;" in some recent messages on various boards. ... (I don't use the mobile apps, but I do know that the problem does not occur when I post from a desktop computer.  My ampersands post correctly as "&", not "&amp;",  so I assume that the problem only affects messages posted from the mobile apps.)
(Edited)
