"Advanced Title Search" produces a URL with incorrect encoding -- please fix to support UTF-8

  • Problem
  • Updated 6 years ago
Merged

This conversation has been merged. Please reference the main conversation: Advanced Search Bug

Please try this:

Go to the Advanced Title Search page:

http://www.imdb.com/search/title

Paste this example query into the Title search box: "Quién sabe?"
(without the quotation marks). Then press the Enter key.

The resulting search URL produces no results, because
the "é" character is incorrectly encoded in the URL:

http://www.imdb.com/search/title?title=Qui%C3%83%C2%A9n%20sabe%3F

That is the URL that the example query currently produces. In it, the "é" character is incorrectly encoded as %C3%83%C2%A9, which looks like the correct UTF-8 bytes (C3 A9) having been encoded a second time.
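One plausible way to arrive at that URL can be sketched in Python. This is only an illustration under an assumption: the post does not confirm what the server does internally, and the Latin-1 misreading shown here is a guess that happens to reproduce the observed result.

```python
from urllib.parse import quote

title = "Quién sabe?"

# Step 1: the correct UTF-8 bytes; "é" becomes the two bytes C3 A9.
utf8_bytes = title.encode("utf-8")

# Step 2 (the assumed bug): those bytes are misread as Latin-1 text,
# so each byte turns into its own character: "Ã" (C3) and "©" (A9).
misread = utf8_bytes.decode("latin-1")

# Step 3: the misread text is percent-encoded as UTF-8 a second time.
double_encoded = quote(misread, safe="")

print(misread)
print(double_encoded)
```

The second printed value matches the query value in the faulty URL above, which is consistent with a double-encoding bug, although it does not prove that this is the exact cause.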

Tech note: When I submit the search query, my web browser sends a POST request with the query correctly encoded. In response, the IMDb Advanced Title Search engine issues an HTTP 302 redirect to the search-results URL, and that is where the incorrect encoding appears. (Apparently, UTF-8 support is not yet fully implemented in the procedure that generates the redirect.)
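As a rough sketch of what the redirect step would need to do, the following Python decodes the submitted form body as UTF-8 and builds the redirect target from the decoded text. The function name `redirect_location`, the form field name `title` in the POST body, and the body format are assumptions made for illustration; nothing here describes IMDb's actual code.

```python
from urllib.parse import parse_qs, quote, urlencode

SEARCH_URL = "http://www.imdb.com/search/title"

def redirect_location(post_body: str) -> str:
    """Build the redirect URL from a form-encoded POST body (hypothetical helper)."""
    # Decode the percent-encoded form data as UTF-8, so "%C3%A9" becomes "é".
    fields = parse_qs(post_body, encoding="utf-8")
    title = fields["title"][0]
    # Re-encode the decoded text exactly once; quote() uses UTF-8 by default
    # and writes spaces as %20 rather than "+".
    return SEARCH_URL + "?" + urlencode({"title": title}, quote_via=quote)

print(redirect_location("title=Qui%C3%A9n+sabe%3F"))
```

The point of the sketch is that the text is decoded once and encoded once, both times as UTF-8, so no byte is ever reinterpreted in a different character set.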

Please fix the Advanced Title Search engine to produce correct encoding for all UTF-8 characters. In the aforementioned example, the correctly encoded URL would be as follows, with the "é" character encoded as %C3%A9:

http://www.imdb.com/search/title?title=Qui%C3%A9n%20sabe%3F
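The correct encoding can be checked with Python's standard library, and a double-encoded value can often be repaired after the fact. The helper names `encode_title` and `repair_double_encoded` are made up for this example, and the repair is a heuristic that assumes the damage came from a Latin-1 misreading; it could misfire on a title that legitimately contains a sequence such as "Ã©".

```python
from urllib.parse import quote, unquote

def encode_title(title: str) -> str:
    """Percent-encode a title as UTF-8, leaving no reserved characters unescaped."""
    return quote(title, safe="", encoding="utf-8")

def repair_double_encoded(value: str) -> str:
    """Undo one round of UTF-8-read-as-Latin-1 damage, if it is present."""
    text = unquote(value, encoding="utf-8")
    try:
        # Map each character back to a single byte, then decode as UTF-8.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not double-encoded in this way; leave it unchanged.
        return text

print(encode_title("Quién sabe?"))
print(repair_double_encoded("Qui%C3%83%C2%A9n%20sabe%3F"))
```

A value that is already correctly encoded passes through `repair_double_encoded` unchanged, because re-reading it as single bytes does not yield valid UTF-8.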

(The aforementioned query is just an example. A complete fix would, of course, implement correct encoding for all allowable UTF-8 characters for all languages that IMDb Advanced Title Search is intended to support.)

Thanks.
(closed account)

Posted 6 years ago

(closed account)

Excuse me for "bumping" this. Could a staff person please acknowledge the issue? It is not very urgent, but acknowledgment will assure me that this is "on the list" for eventual future investigation.

This is one more addition to the list of character-encoding issues that have been reported. I understand that some of the code involved is many years old and was written before UTF-8 extended-character support was undertaken on the site. I also understand that it will take time to work through these issues, with the long-term goal of consistently reliable UTF-8 character encoding throughout the site.

Thanks
