Is there an easier way to organize my lists?

  • Question
  • Updated 7 years ago
  • Answered
Archived and Closed

This conversation is no longer open for comments or replies and is no longer visible to community members. The community moderator provided the following reason for archiving: Old thread

It is now 12 minutes past midnight, and I have been sorting my lists since yesterday. I am not done, and I have been working at this for over 4 hours. Because the articles "A" and "The" are not ignored in a standard sort, I have to keep a piece of paper in front of me so I can get this list properly alphabetized. Please let me take my lists offline, sort them with Perl, and then upload the properly sorted list.

My watchlist is 6 pages long in Compact view.
My "Own on DVD" is 2 pages long in Compact view.

Sorting manually ignoring articles is a right pain in the rear end; sorting by date for franchises is also giving my poor wrists a lot of strain. I feel sorry for any user with arthritis.
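Lady Aleena wants to do this sort offline in Perl. As a rough sketch of the same idea (written in Python here, and assuming that only the English articles A, An and The need to be skipped), a sort key that ignores one leading article could look like this:

```python
import re

# Assumption: only the English articles "A", "An" and "The" are ignored.
# The article must be followed by whitespace, so "Aliens" or "Anastasia" are untouched.
LEADING_ARTICLE = re.compile(r"^(?:a|an|the)\s+", re.IGNORECASE)


def sort_key(title: str) -> str:
    """Return a case-insensitive sort key with one leading article removed."""
    return LEADING_ARTICLE.sub("", title, count=1).casefold()


titles = [
    "The Matrix",
    "A Fish Called Wanda",
    "Aliens",
    "An American Werewolf in London",
    "Brazil",
]

for title in sorted(titles, key=sort_key):
    print(title)
```

This prints Aliens, An American Werewolf in London, Brazil, A Fish Called Wanda and The Matrix, in that order. It only handles English; the replies below explain why other languages make this much harder.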

Lady Aleena

Posted 7 years ago

Dan Dassow, Champion

Unfortunately, IMDb currently does not sort titles in lists properly. The articles "A", "An" and "The" are included as part of the sort. Although a user can export a list, IMDb does not allow a user to import one. There is already an Idea to import lists; for details, see: Importing other sites' history to IMDb [https://getsatisfaction.com/imdb/topi...]. If you support this idea, I recommend giving it a +1.

Likewise, your question would be the basis of a good idea. When I get a chance, I will reformulate it as an idea.

Lady Aleena

Did you know IMDb used to sort titles properly, but dropped it for some reason I cannot understand? Well, I'm going back in to try to wrestle some sense into my lists.

DavidAH_Ca, Champion

They dropped it, at least in part, because of the problems with articles in other languages. For example, the French article L' attaches to the following word with no space, and that word begins with a lower-case letter, so the original title of One Deadly Summer (L'été meurtrier) would be stored as été meurtrier, L', which looks bad.
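To make the problem concrete, here is a minimal sketch of the "storage format" that moves a leading article to the end of the title. The function name is hypothetical, and it assumes the article has already been identified. Applied to a French title with L', the stored form begins with a lower-case letter:

```python
def to_storage_format(title: str, article: str) -> str:
    """Move a known leading article to the end of the title.

    Hypothetical helper: 'The Abyss' with article 'The ' becomes 'Abyss, The'.
    If the title does not start with the article, it is returned unchanged.
    """
    if not title.startswith(article):
        return title
    rest = title[len(article):].lstrip()
    return f"{rest}, {article.strip()}"


print(to_storage_format("The Accidental Tourist", "The "))  # Accidental Tourist, The
print(to_storage_format("L'été meurtrier", "L'"))          # été meurtrier, L'
```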

Even more problematic is the German article Die. If you had a title like Die, Scumbag, Die how would the system know that this should not be treated like Jungfrauen von Bumshausen, Die?

IMDb really needs to set up a system like the MARC 21 system used by libraries. This stores the title in its display form, but adds a number indicating how many leading characters must be ignored for the sort; e.g. The Accidental Tourist would have an ignore number of 4, so the sort would be on Accidental Tourist.
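A minimal sketch of the ignore-number idea, assuming each title record carries its display title plus a hypothetical `nonfiling` count (in MARC 21 this count is a single digit, 0 to 9). Only the sort key changes; the title is always displayed exactly as entered:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Title:
    display: str    # the title exactly as shown to users
    nonfiling: int  # leading characters to skip when sorting (hypothetical field)

    @property
    def sort_key(self) -> str:
        return self.display[self.nonfiling:].casefold()


titles = [
    Title("The Accidental Tourist", 4),  # skip "The "
    Title("Die, Scumbag, Die", 0),       # "Die" is an English verb here, so skip nothing
    Title("L'été meurtrier", 2),         # skip "L'"
    Title("Brazil", 0),
]

# Plain code-point order: accented letters such as "é" sort after "z".
# A locale-aware collation would be needed to interleave them properly.
for title in sorted(titles, key=lambda t: t.sort_key):
    print(title.display)
```

This prints The Accidental Tourist, Brazil, Die, Scumbag, Die and L'été meurtrier. Because the count is stored per title, a human decides once whether "Die" is an article, and the sort never has to guess.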

Dan Dassow, Champion

Sigh! Yes, at one time IMDb sorted titles and people's names properly.

Lady Aleena

If they stored the language each title is in, they could apply language-specific rules about articles. I could probably write a sort in Perl based on language rules in a few moments, if I had all of the language rules in front of me.
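A rough sketch of a language-aware sort, again in Python rather than Perl. The `ARTICLES` table is hypothetical and heavily abbreviated (DavidAH_Ca's table further down lists 225 article/language combinations), and it assumes every title record has a language code:

```python
# Hypothetical, heavily abbreviated table: language code -> leading articles.
ARTICLES = {
    "en": ("the ", "an ", "a "),
    "fr": ("les ", "le ", "la ", "l'"),
    "de": ("der ", "die ", "das "),
}


def nonfiling_count(title: str, language: str) -> int:
    """Length of the leading article for this title's language, or 0 if none."""
    for article in ARTICLES.get(language, ()):
        # Compare the original prefix, so the returned length indexes `title` correctly.
        if title[:len(article)].casefold() == article:
            return len(article)
    return 0


def language_sort_key(title: str, language: str) -> str:
    """Case-insensitive sort key with the language's leading article skipped."""
    return title[nonfiling_count(title, language):].casefold()


print(language_sort_key("Die Hard", "en"))                       # die hard
print(language_sort_key("Die Jungfrauen von Bumshausen", "de"))  # jungfrauen von bumshausen
```

Knowing the language is what separates the English "Die" from the German article; titles in languages missing from the table simply sort on their full text.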

Emperor, Champion

To be honest, you could just crowdsource it, with a sort field/attribute that people could update. Also, as mentioned above, they did have all the data for correctly sorting titles, so they could probably run a script to reintroduce it and rely on crowdsourcing to pick up any exceptions or new entries not covered. Granted, there'd always be a few entries that'd mess up the neatness, but people would always be spotting and fixing those too (personally, I could live with a little messiness sneaking in for better sorting).

DavidAH_Ca, Champion

Setting up fully automated rules might not be quite that simple. When researching the problem (both to comment on IMDb and for my own database), I found a table of language/article combinations that listed 44 languages and 138 articles, from a to yr (including ones like 'r, ang mga and na h-, and not counting the various romanized spellings of the Arabic al-), which provided a total of 225 valid article/language combinations.

I said then, and still feel now, that the way to go is to create an extra field, as is done in the MARC 21 system (although that system actually uses (half of) the first byte of the title field as the flag). This field could be automatically populated for English, French and German articles, although a flag would need to be raised for Les and Die so a Data Manager can decide whether these are actually articles. The other languages are probably infrequent enough that a weekly scan of all new/modified titles for initial strings matching the list of potential articles, for a Data Manager to review, would be enough to get the rest sorted correctly.
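The "populate automatically, flag the doubtful ones" idea could be sketched roughly as follows. The article lists and the `suggest_nonfiling` name are assumptions for illustration; unambiguous articles get a count straight away, while Les and Die get a tentative count plus a review flag for a Data Manager:

```python
from typing import NamedTuple

# Hypothetical, abbreviated lists. Strings in NEEDS_REVIEW are only sometimes
# articles ("Les" may be a first name, "Die" may be the English verb).
NEEDS_REVIEW = ("les ", "die ")
AUTOMATIC = ("the ", "an ", "a ", "le ", "la ", "l'", "der ", "das ")


class Suggestion(NamedTuple):
    nonfiling: int      # proposed number of characters to ignore
    needs_review: bool  # True if a Data Manager should confirm the proposal


def suggest_nonfiling(title: str) -> Suggestion:
    """Propose an ignore count, flagging ambiguous leading words for review."""
    for article in NEEDS_REVIEW:
        if title[:len(article)].casefold() == article:
            return Suggestion(len(article), True)
    for article in AUTOMATIC:
        if title[:len(article)].casefold() == article:
            return Suggestion(len(article), False)
    return Suggestion(0, False)


# A periodic scan of new/modified titles collects the flagged ones.
new_titles = ["The Abyss", "Die Hard", "Les Misérables", "Brazil"]
review_queue = [t for t in new_titles if suggest_nonfiling(t).needs_review]
print(review_queue)  # ['Die Hard', 'Les Misérables']
```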

This system has an additional benefit - you are not restricted to articles. Various symbols and punctuation marks can be included in the ignore, allowing shows like 'Allo 'Allo! to be sorted with the A's and also bringing ...And God Created Woman (1956) and And God Created Woman (1988) together in the sorted list, rather than having the first showing at the very top of the list.
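As a sketch of that benefit, assume (as a simplification that ignores articles) that the ignore count is just the number of leading characters that are neither letters nor digits. 'Allo 'Allo! then files under A, and the two versions of And God Created Woman end up next to each other, ordered by year:

```python
def leading_symbol_count(title: str) -> int:
    """Count leading characters that are neither letters nor digits."""
    count = 0
    for ch in title:
        if ch.isalnum():  # letters such as "Æ" count as alphanumeric and stop the scan
            break
        count += 1
    return count


def sort_key(entry: tuple[str, int]) -> tuple[str, int]:
    """Sort on the title minus leading symbols, then by year to break ties."""
    title, year = entry
    return (title[leading_symbol_count(title):].casefold(), year)


titles = [
    ("And God Created Woman", 1988),
    ("Brazil", 1985),
    ("...And God Created Woman", 1956),
    ("'Allo 'Allo!", 1982),
    ("Aliens", 1986),
]

for title, year in sorted(titles, key=sort_key):
    print(f"{title} ({year})")
```

The output order is Aliens, 'Allo 'Allo!, ...And God Created Woman (1956), And God Created Woman (1988), Brazil. In a stored-field design the same count would simply be saved with the title instead of being computed at sort time.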

Certainly the current sort is not really acceptable, and a significant change is necessary. It seems to me that if they had put even a small fraction of the effort expended in fancying up the display into solving this, we would have had a solution long ago.

Emperor, Champion

Yes, I suspect the system was abandoned because there was no way to automate it (the Die Hard franchise probably pushed them over the edge), but I don't see any reason it couldn't be crowdsourced. IMDb has the old sorting data, which could be added back in, and then they could allow people to update problematic films. I'd imagine it would also be easy enough to run a check during the submission process that forces the submitter to look over the sort attribute for the name (perhaps with a small degree of automation, using a few language-specific rules, to handle the most obvious cases) and tick a checkbox to continue. That should make sure the majority of new submissions conform to the right format (and data processors would be able to pick up any mistakes), and anything that falls through the net can be crowdsourced and sorted out.
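A rough sketch of "automation for the obvious cases, people for the exceptions", assuming a hypothetical `overrides` table of crowdsourced sort titles that takes precedence over a deliberately naive automatic rule:

```python
import re

# Naive automatic rule (an assumption for illustration): strip one leading
# English or German article. It is wrong for titles like "Die Hard".
AUTO_ARTICLE = re.compile(r"^(?:the|an|a|der|die|das)\s+", re.IGNORECASE)

# Hypothetical crowdsourced corrections: display title -> sort title.
overrides = {
    "Die Hard": "Die Hard",  # "Die" is an English verb here, not a German article
}


def effective_sort_title(title: str) -> str:
    """Prefer a human-supplied sort title; otherwise fall back to the automatic rule."""
    if title in overrides:
        return overrides[title]
    return AUTO_ARTICLE.sub("", title, count=1)


titles = ["Die Hard", "Das Boot", "The Abyss", "Jaws"]
print(sorted(titles, key=lambda t: effective_sort_title(t).casefold()))
```

This prints ['The Abyss', 'Das Boot', 'Die Hard', 'Jaws']. Without the override, Die Hard would have sorted as "Hard" and landed after Das Boot's neighbours under H.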

There are other complications as well as the obvious one. You mention Die Monster Die, for example: if you ran language-specific rules to generate a search attribute for each title, the West German title, "Das Grauen auf Schloss Witley", would be fine, but another is also listed: "Die Monster". My German isn't good, so I couldn't say whether they have just used the English title or the German article. It's probably easy enough to resolve (I assume it is the former). It also seems this issue might have been largely addressed in German translations to avoid confusion anyway: Die Hard is Stirb Langsam. So it might not be a big issue.

Dan Dassow, Champion

People who speak German as their native language do on occasion borrow words from English and other languages. It is entirely possible, but probably unlikely, that Die Monster means The Monster.

Regardless, the MARC 21 system used by libraries looks like it could be viable for IMDb.

Lady Aleena

DavidAH_Ca brought up something I had not thought of: movies with punctuation as the first character in the title. When I sort those titles, they always go on top, such as *batteries not included and 'Til There Was You. Now I'm thinking I need to rewrite my own sorting function to remove the initial punctuation, which would leave Æon Flux as the only film with a special character at the beginning of the title. (I might have to add a new field to my own database for titles' languages, which would be a minor pain in the neck for me, but for a company like IMDb it should be easy.)

I just wish I could sort my Watchlist and other lists more easily. I have pages upon pages of unsorted titles there because I have a problem getting a title from one page to another. Also, the last time I was sorting, IMDb got really buggy on me: I had titles with the same sort number, and other sort numbers were really out of whack. I had to quit for the day since refreshing didn't do me any good. I even closed my browser and reopened it, and the problem was still there. And yes, I have five films which are supposedly at position 55. :S

Dan Dassow, Champion

Lady Aleena,

If I recall correctly, Emperor reported a similar problem with some of his lists having elements with the same position number.

DavidAH_Ca mentioned in the thread How can I get the IMDB data dump with unique IDs, or in another format like XML? [https://getsatisfaction.com/imdb/topi... ]:
"For years, the Name (for persons) and the Title (for Films) were the unique keys. I believe that much of the system still uses these.

That is why every primary Name and every primary Title must be unique, which is the reason for the Roman numerals to distinguish people with the same Name (or Titles with the same name and year)."
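The Roman-numeral scheme in that quote can be illustrated with a small sketch. How the suffixes are assigned here (in order of appearance, and only when a name is shared) is an assumption, not a description of IMDb's actual process, and `make_unique` is a hypothetical name:

```python
from collections import Counter, defaultdict

ROMAN = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(n: int) -> str:
    """Convert a positive integer to Roman numerals."""
    parts = []
    for value, numeral in ROMAN:
        while n >= value:
            parts.append(numeral)
            n -= value
    return "".join(parts)


def make_unique(names: list[str]) -> list[str]:
    """Append (I), (II), ... to every name that occurs more than once."""
    totals = Counter(names)
    seen = defaultdict(int)  # how many copies of each shared name we have numbered so far
    result = []
    for name in names:
        if totals[name] == 1:
            result.append(name)
        else:
            seen[name] += 1
            result.append(f"{name} ({to_roman(seen[name])})")
    return result


print(make_unique(["John Smith", "Jane Doe", "John Smith"]))
# ['John Smith (I)', 'Jane Doe', 'John Smith (II)']
```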
If you wish to test out your code on the complete set of IMDb titles, you may wish to look at: http://www.imdb.com/interfaces

Emperor, Champion

Previously requested here (and shot down with an explanation from GC) - worth bumping and/or +1ing as it'd demonstrate the desire for it:

https://getsatisfaction.com/imdb/topi...
