Multiple problems with Ukrainian Cyrillic titles

A recent comment in another thread offered me advice on Cyrillic alternate titles that I was not sure about, so I researched and re-checked. As a result, I am kind of shocked at what I have found.

Even if it is true that the only way to write some Ukrainian titles is to use Cyrillic symbols in conjunction with Latin ones, "ї" should never have been represented as "ii", as such a symbol is already in use in the system because it exists in French. Instead, for reasons beyond my understanding, we now have dozens of alternate titles (such as this one) using "ii" to represent "ї", which is deeply erroneous and nowhere near indicative of how the language actually looks.

The case of "ї" is, of course, only the tip of the iceberg, as Ukrainian also has the unique letters "ґ" and "є". While the former is not used very often, the latter is one of the primary vowels and usually sounds quite different from "e" (roughly "ye" as opposed to "a").

Nikolay Yeriomin (Mykola Yeromin), Champion

Posted 9 months ago

Michelle, Official Rep

Hi Nikolay -

Just to clarify the issue, there are transliterated Alternate titles listed on the site incorrectly with "ii", where they should be listed with a single "i", is that correct?

If this is the case, the applicable text corrections will need to be submitted for these Alternate titles for our editors to review.

Nikolay Yeriomin (Mykola Yeromin), Champion

Hello, Michelle. 

Technically, the symbol is actually "ï", which is identical in Ukrainian and French. Judging by this example, the symbol should be eligible.

MAthePA

Hi, Nikolay.

I cannot be sure about all of them, but there are hundreds added by me with "ii". It is the only way to show the difference between Ukrainian "і" and "ї". The symbol used in French comes from another encoding table, which is why it is technically not allowed in the Cyrillic table. For now, the only way to provide all the letters of the Ukrainian language is to use KOI8-U instead of KOI8-R. I remember that this was discussed a very long time ago, but building it in was refused (for no serious reason, I think).
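A small Python sketch can illustrate the encoding point above. It uses Python's standard `koi8_r` and `latin_1` codecs and assumes (this is not confirmed anywhere in the thread) that a KOI8-R title is accepted only if every character exists in that one table. The helper name `fits` is hypothetical.

```python
def fits(text: str, codec: str) -> bool:
    """Return True if every character of `text` exists in the single-byte table `codec`."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


# Latin "i" sits in the ASCII half that KOI8-R and Latin-1 share.
print(fits("i", "koi8_r"), fits("i", "latin_1"))            # True True
# French "ï" (U+00EF) exists in Latin-1 but not in KOI8-R.
print(fits("\u00ef", "koi8_r"), fits("\u00ef", "latin_1"))  # False True
# A Cyrillic title ("Одеса") fits KOI8-R but not Latin-1 ...
print(fits("Одеса", "koi8_r"), fits("Одеса", "latin_1"))    # True False
# ... so a title mixing Cyrillic with the French "ï" fits neither table.
mixed = "Укра\u00efна"
print(fits(mixed, "koi8_r"), fits(mixed, "latin_1"))        # False False
```

This is why plain Latin "i" can be mixed into a KOI8-R title while the French "ï" cannot.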

Nikolay, do you need plenty of examples from me where a single "i" instead of "ii" is not only incorrect but hard to read?

PS: And yes, there are some rare examples where even a double "ii" cannot solve the problem completely.

PPS: And yes, I presume you tried the French "ï" in a Cyrillic submission before starting this thread.

PPPS: And yes, I'm sure you understand that the way you described this problem opens the gate to questioning the eligibility of Ukrainian in the alternate titles. Good for you.

MAthePA

It would be really great to involve IMDb's technical staff to work out the fastest correct way to solve this problem. If this happens:
---
For the attention of the technical staff:
According to the rules of Ukrainian orthography, two Cyrillic "і" letters can never follow each other in Ukrainian. That's why two Latin "i"s are the best substitute for Cyrillic "ї" [yi]: they cannot be misunderstood by end users, and they remain better suited to a possible automatic reverse substitution.

Some contributors used to substitute Cyrillic "ї" [yi] with a single Latin "i" [i] in alternate Ukrainian titles. I think their intention was good, because this way they helped IMDb attract more attention from Ukrainian users, but unfortunately it does not allow a correct reverse substitution.
___________________________________

Finally, I see no objective reason for using the narrow KOI8-R table instead of the universal KOI8-RU, which is intended for all Cyrillic languages, not only Ukrainian. This problem was raised on the old boards years ago; at that time (if I remember correctly) the only reason given for not using KOI8-U or KOI8-RU was "it's not the right time, because we are planning a very big general overhaul". "Not the right time" has been the reasoning for a decade...

MAthePA

Concerning the rarely used letter in Ukrainian alternate titles:
I've checked about 7,000 titles and made a small number of substitutions where Ukrainian "є" is expected. The Cyrillic "е" (previously used in place of "є") is now replaced with a Latin "e", which is very good for automatic reverse substitution, but still makes no visual difference for end users.
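Why a Latin "e" helps automatic reverse substitution even though it looks identical: Latin "e" (U+0065) and Cyrillic "е" (U+0435) are different code points, so a script can find Latin letters sitting inside otherwise Cyrillic words. A minimal Python sketch follows; it is not an IMDb tool, the function name `latin_in_cyrillic_words` is hypothetical, and treating every such Latin letter as a substitution marker is an assumption.

```python
import unicodedata


def latin_in_cyrillic_words(title: str) -> list[tuple[str, int, str]]:
    """Return (word, position, char) for every Latin letter found inside a
    word that also contains Cyrillic letters (assumed to be a substitution marker)."""
    hits = []
    for word in title.split():
        names = [unicodedata.name(ch, "") for ch in word]
        if not any(name.startswith("CYRILLIC") for name in names):
            continue  # purely Latin or non-letter word: not a substitution
        for pos, (ch, name) in enumerate(zip(word, names)):
            if name.startswith("LATIN"):
                hits.append((word, pos, ch))
    return hits


# The two letters look the same but are different characters.
print("e" == "\u0435")  # False

# "Моє життя" typed with a Latin "e" standing in for "є".
substituted = "Мо" + "e" + " життя"
print(latin_in_cyrillic_words(substituted))  # [('Моe', 2, 'e')]
```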

MAthePA

BTW, simple statistics show that 58% of alternate titles could not have been contributed to the Ukrainian segment without letter substitution.
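The post doesn't say how the 58% figure was computed. One plausible way to get such a number, sketched in Python under that assumption, is to count the titles that contain at least one Ukrainian letter KOI8-R cannot represent (і, ї, є, ґ and their capitals). The sample titles and the function name are hypothetical and only show the arithmetic.

```python
# Ukrainian letters with no KOI8-R equivalent: і ї є ґ І Ї Є Ґ
UKRAINIAN_SPECIFIC = set("\u0456\u0457\u0454\u0491\u0406\u0407\u0404\u0490")


def share_needing_substitution(titles: list[str]) -> float:
    """Fraction of titles that contain at least one Ukrainian-specific letter."""
    if not titles:
        return 0.0
    needing = sum(1 for title in titles if UKRAINIAN_SPECIFIC & set(title))
    return needing / len(titles)


# Hypothetical sample: "Київ", "Одеса", "Моє життя", "Дім"
sample = ["Ки\u0457в", "Одеса", "Мо\u0454 життя", "Д\u0456м"]
print(f"{share_needing_substitution(sample):.0%}")  # 75%
```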

Nikolay Yeriomin (Mykola Yeromin), Champion

MAthePA, it's a bit hard for me to read your emotions through writing (which is odd, since we should have a similar "written accent", both being native Ukrainian speakers). Take that "good for you": I can't tell whether it was meant as a sarcastic jab. If so, then I can't see how not raising any discussion helps the legitimacy of the Ukrainian language on IMDb. Ideally it would be good to finally have "ї", "є", "ґ" and "і", and I'm not sure why they cannot be added, but I won't pretend that I know all of the internal workings. If I remember correctly, Cyrillic alternate titles only appeared in the late 2000s, and for several years I loathed them, not being a fan of seeing translated titles.

MAthePA

Even if it is true that the only way to write some of the Ukrainian titles is to using Cyryllic symbols in conjunction with Latin ones, "ї" should have never been reflected with "ii", as such symbol is in use in the system due to being in French language
The lines of yours above are what is really odd in this whole story, because this topic was started a couple of hours after you got the answer about the impossibility of using "such symbol from French" and about the technical side of this problem:
https://getsatisfaction.com/imdb/topics/mykhailo-hrushevsky-unicode-characters-in-places-of-birth?to...

The technical side of this problem was correctly raised a few years ago, but never actually solved:
https://getsatisfaction.com/imdb/topics/add-please-cyrillic-koi8-u-title-or-cyrillic-koi8-ru-title-to-alternate-titles-im-often-injected-alternate-titles-ukrainian
There you could explain why using "ii" to represent "ї" is more "erroneous" than the similar substitutions that have been in use in general since IMDb started allowing Cyrillic in a restricted form.

If you prefer to see emotion in my words, it's only because of the three words "good for you" (less than 0.5% of what I wrote here). All the rest is factual and aimed at solving the technical problem.

Ed Jones(XLIX)

Nikolay, MAthePA got angry at me for no reason, simply because he has no concept of what's humorous in English. So your sense of humor and his are different. He is serious all the time. So I doubt that he has a grasp of sarcasm that he can convey in English.

MAthePA

This reply was created from a merged topic originally titled Admins, please stop the provocative spamming slanderer.

Ed Jones (XLIX) has warned me that he will never reply to me directly, and that I will be "on my own" from that moment on. It was good for me to know that each of us understands there is a line between us that should not be crossed...

Unfortunately, Ed Jones (XLIX) is trying to find other ways to make my participation here uncomfortable, by engaging third parties to discuss me or provoking such discussion: https://getsatisfaction.com/imdb/topics/multiple-problems-with-ukrainian-cyrillic-titles?topic-reply...
and by placing such spam in a thread where an important problem is being discussed.

In view of the above and the content of his message, admins, please delete that message by Ed Jones (XLIX) in that thread, or move it here if he insists on continuing to discuss me. If he continues to discuss me, he could explain each word of his sweeping condemnations, or apologize. Otherwise, if his message is simply deleted, I hope Ed Jones (XLIX) will keep up the good mood of constructive neutrality toward me, as I do toward him.

Ed Jones(XLIX)

That was for polls only

MAthePA

No, Ed, not only polls. I'm not angry (as you think), not sad, nothing else...
It's really better for both of us to keep our distance; I agree with you on this. So please just delete that message (because it's more provocative than you may think), and then I need nothing (not even an apology); this will be resolved. Or explain each word. I really don't think this is a good place for it, so you could try creating a separate thread, because the one I created has been merged here.

Nikolay Yeriomin (Mykola Yeromin), Champion

Ed Jones (XLIX), MAthePA, I really wonder how all of that got merged with this thread, of all places.

Anyhow, my five cents on the situation: communication is hard, and you can both be a bit too eager on the explaining side. I'm not the one to judge (since I am perfectly capable of blunders in conversation myself), but Ed has a habit of repeating himself and MAthePA of arguing. Both are very useful, but combined in the wrong place at the wrong time they lead to... well, things like this.

MAthePA, I can say that I really can't remember any recent altercations between you and Ed. And since I routinely go through all of the messages here, that means I somehow missed what exactly happened.

MAthePA

Nikolay, you are really a strange "peacemaker":
My only complaint is Ed's message, which you "liked" a few minutes ago. So now he has no practical way to delete it, even if he wanted to. (But he had enough time, BTW.)

My position on what has happened several times before when communicating with Ed, and not only with me, is weighed and reasoned. I'd like to prevent all possible situations with him in the future, which is why I agreed earlier to keep our distance. It's good for both of us, but one of us keeps trying to throw rocks over the wall.
Nikolay, what is the final intention of your previous message? And why did it take you 12 (twelve) hours to respond, while he could not take even a small step back in that time?

PS: 40 more minutes and no answer... See you tomorrow, guys.

Nikolay Yeriomin (Mykola Yeromin), Champion

MAthePA I think I fell asleep exactly at the time you wrote that, sorry. 

I am no peacemaker: no matter how much I want to, I can't prevent conflicts from happening. I'm so idiosyncratic that my own intentions are often more tangled than those of other people. What I can say is this: I am probably a bit more accustomed to Ed's sense of humor, so I never even noticed anything that could be offensive, while you are overthinking everything a bit. And I say that as a chronic overthinker who can have a panic attack for half an hour after sending an e-mail, unsure how people will respond.

As for the 12 hours... On GS I usually set aside an hour or two (or three if things have piled up unsorted for a week) to go through every single notification. Depending on how I feel, what else I'm doing and many other circumstances, that can be at any hour of the day. I do sometimes answer one or two outside of that, but a habit is a habit, and that's mostly how I roll here.

Will, Official Rep

Hi everyone,

Please bear with me, as I'm sure you can tell that I am no language expert. Firstly, we welcome opinions, but can we please try to remember the contributors' charter when discussing points with other contributors, particularly this passage:

In return we hope that when there is a difference of opinion, contributors also show courtesy and respect during their conversations with each other and with IMDb staff.

Am I right in thinking that the issue here is that we only support KOI8-R Cyrillic characters and not KOI8-U Cyrillic characters, which means that certain letters in Ukrainian cannot be accurately reflected on the site? In those cases, are you using Latin character replacements such as "i" in combination with the Cyrillic letters, with the KOI8-R title attribute? Are you able to submit these normally through our submission interface? The screenshot on https://getsatisfaction.com/imdb/topics/mykhailo-hrushevsky-unicode-characters-in-places-of-birth suggests that the error is being triggered by the combination of both character sets, but I'm intrigued how you are getting around this at present.

Thanks,
Will

 
Nikolay Yeriomin (Mykola Yeromin), Champion

Hello, Will

Answering the questions: 
- It seems so. KOI8-U characters are nowhere to be seen and are not supported.
- I've never typed Cyrillic titles myself, but it certainly seems that this is the case, with people using the "i" from English character sets to represent Ukrainian "і". People also replaced "ї" with "ii", which is unfortunate, as it does not reflect how it is actually written.
- I was never able to submit Cyrillic myself, but that seems to be a glitch unique to my case. The screenshot mentioned above pretty much reflects how it has always been for me, with both Russian and Ukrainian.

MAthePA

Hi, Will.

The issue here is that IMDb only supports the KOI8-R character table when submitting alternate titles in Cyrillic. As a result, the Ukrainian language (as well as some others) is affected and lacks the letters "і", "ї", "є", "ґ", "І", "Ї", "Є", "Ґ".
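The list of missing letters can be checked against Python's standard `koi8_r` and `koi8_u` codecs. This is a minimal sketch; the helper name `missing_letters` is hypothetical, and it only tests the code tables themselves, not IMDb's submission system.

```python
# Ukrainian letters absent from the Russian alphabet: і ї є ґ І Ї Є Ґ
UKRAINIAN_SPECIFIC = "\u0456\u0457\u0454\u0491\u0406\u0407\u0404\u0490"


def missing_letters(codec: str) -> str:
    """Return the Ukrainian-specific letters that `codec` cannot encode."""
    missing = ""
    for ch in UKRAINIAN_SPECIFIC:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            missing += ch
    return missing


print(missing_letters("koi8_r"))  # іїєґІЇЄҐ
print(missing_letters("koi8_u") == "")  # True: KOI8-U covers all eight
```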

To solve the problem for the Ukrainian and Bulgarian languages, we need KOI8-U to be allowed. Belarusian would be welcome too, if KOI8-RU is allowed.

The technical side of this problem was correctly raised a few years ago (as well as on the old boards), but never actually solved:
https://getsatisfaction.com/imdb/topics/add-please-cyrillic-koi8-u-title-or-cyrillic-koi8-ru-title-to-alternate-titles-im-often-injected-alternate-titles-ukrainian

Until now, the problem has had a temporary solution: substituting Latin characters (and their capitals):
  • Latin "i" for Ukrainian "і";
  • Latin "ii" for Ukrainian "ї";
  • Latin "e" for Ukrainian "є";
  • no substitution for "ґ";
  • a single Latin "i" for Ukrainian "ї" [yi] was used before, and may still be found in rare cases.
About 58% of alternate titles could not have been contributed to the Ukrainian segment without letter substitution.

According to the rules of Ukrainian orthography, two Cyrillic "і" letters can never follow each other in Ukrainian. That's why two Latin "i"s are the best substitute for Cyrillic "ї" [yi]: they cannot be misunderstood by end users, and they remain better suited to a possible automatic reverse substitution.
The Latin "e" used for substitution is very good for automatic reverse substitution, but still makes no visual difference for end users.

Will, I'd like to emphasize that the substituted variants are mostly not problematic in their content: the Cyrillic titles correspond to their actual versions, but some of them (with "ii" and "e" substitutions) do not look correct for those letters. Plus, there are some rare cases where the "ii" substitution could not solve the problem in the first place.

It will be the staff's decision, of course. But if practical advice is acceptable, I'd suggest (once a new table is implemented) doing the reverse substitution of the Latin symbols and then checking the result for spelling before publishing. Or I could help check complicated cases, or the published ones.
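The reverse substitution suggested above could look roughly like the following Python sketch. It assumes the interim scheme listed earlier in this post (Latin "ii" for "ї", "i" for "і", "e" for "є", and the same for capitals) and that every Latin letter inside a Cyrillic word is such a stand-in; purely Latin words are left alone. Replacing "ii" before "i" is safe because, as the post notes, two Cyrillic "і" never follow each other. Titles from the older single-"i"-for-"ї" convention cannot be recovered this way, which is why the spelling check is still needed. `reverse_substitute` and `unknown_words` are hypothetical names, and the lexicon would have to come from a real Ukrainian dictionary.

```python
import re

# Interim Latin stand-ins and the Ukrainian letters they replace.
# "ii"/"II" must be handled before "i"/"I".
REVERSE = [
    ("ii", "\u0457"),  # ї
    ("II", "\u0407"),  # Ї
    ("i", "\u0456"),   # і
    ("I", "\u0406"),   # І
    ("e", "\u0454"),   # є
    ("E", "\u0404"),   # Є
]
CYRILLIC = re.compile(r"[\u0400-\u04FF]")


def reverse_substitute(title: str) -> str:
    """Undo the Latin stand-ins, but only inside words that contain Cyrillic."""
    out = []
    for word in title.split(" "):
        if CYRILLIC.search(word):
            for latin, ukrainian in REVERSE:
                word = word.replace(latin, ukrainian)
        out.append(word)
    return " ".join(out)


def unknown_words(title: str, lexicon: set[str]) -> list[str]:
    """Words not found in a (hypothetical) lexicon, left for a human to check."""
    return [w for w in title.lower().split() if w not in lexicon]


restored = reverse_substitute("Укра" + "ii" + "на Kiev")
print(restored)  # Україна Kiev
print(unknown_words(restored, {"україна"}))  # ['kiev']
```

Gating on whole words is a design choice: a fully Latin word such as "Kiev" keeps its "i" and "e" untouched.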

Will, Official Rep

Ok, so at the moment you can use the Latin-1 character "i" in combination with the Cyrillic characters if submitted with the KOI8-R title attribute?

MAthePA

Yes. It's in the main part of the Cyrillic tables.

Here is an example of the Latin characters in use:


Here is the result when the extra part of the table is added:


The current KOI8 table is missing these symbols:


None of them can be substituted with Latin-1 characters (which work for e.g. French), except the "i" pair.

Will, Official Rep

OK, so I understand that Ukrainian Cyrillic KOI8-U characters are currently not supported; therefore, if entered as a Cyrillic KOI8-R title, they will fail in the submission system, which is working as planned, since those characters are not supported.

In the top example, you have used Latin characters but submitted under the Cyrillic KOI8-R title attribute, which is incorrect, as the characters aren't Cyrillic, unless I'm mistaken. Or was that just for this example? If they are Latin characters, they should be submitted as a transliterated title instead, as the Cyrillic characters cannot be supported.

Regards,
Will  

MAthePA

Will, if you follow the link provided above, you'll see that the main set of Latin characters is a constituent part of KOI8:


so those characters are supported, but not all of the ones Ukrainian needs, if IMDb really cares to know this.

I understand that editors may not have enough knowledge to see the best possible way to solve this technical problem while keeping the content. That's why the way this problem was presented to the staff was destructive from the start, and it was initiated by a person who does not actually care whether localized titles in Ukrainian appear on IMDb at all, because of "not being a fan of seeing translated titles". OK, but what should other people do, those who pay for tickets to Ukrainian theaters, where for many years these titles have been shown only in Ukrainian and cannot otherwise be understood quickly and clearly?.. That is why I asked earlier in this thread to involve IMDb's technical staff, and explained the issue in detail. But again, and this is what editors care about most, the substituted variants have correct content; I'll be glad to search for examples if you need them. Correct content, with an imperfect visual representation of some letters.

Nikolay Yeriomin (Mykola Yeromin), Champion

MAthePA, well, you just twisted my words and insulted me. Again. Great job.  

The fact that I'm not a fan of translated Cyrillic titles on IMDb (or rather, was not a fan back when they appeared, ten or so years ago) does not automatically mean that I don't care about the representation of the Ukrainian language on this website. On the contrary, I care about the issue very much. That is why I raised the topic, and that is why I don't like that "ї" appears as a pointless "ii" on the site. I wanted to address the issue, and as a result you came here distorting what I do and suggesting that I can't address the topic because, in your humble opinion, I'm not Ukrainian enough.

Honestly, should I change my GS account to my Ukrainian name so that you don't see red every time I appear in a discussion about Ukrainian? You basically just hate seeing me address anything regarding Ukraine. Well, surprise: I'm going to do it. I'm going to do it every time I encounter something like this, because I do care about my language and my country, and the fact that you think otherwise does not make my concerns about the representation of the Ukrainian language on IMDb any less legitimate.

Ed Jones(XLIX)

Николай Ерёмин

Now you see what I mean!!!

Jeorj Euler

However, let it be clear, this is IMDb's fault.

Ed Jones(XLIX)

IMDb needs to be accurate with languages too, as it claims to be!

Michelle, Official Rep

Hi all -

Just chiming in here: we are aware of the character set limits that we currently have, and we're hoping to address this in the future.

MAthePA

Michelle, thank you for clarifying this for everyone.
It had seemed like news, and like grounds for thousands of deletions.

(I hope the Unicode implementation and the related IMDb policy have not changed since they were first announced.)

Owen Rees

Today's announcement, "Deprecation of encoding attributes for alternate titles", does not explicitly address this point, but it looks as if the problem that started this thread is solved by this change. I will not submit it myself, as I do not have personal knowledge of the change, but it looks as if the change (replacing the "ii" with "ї") could now be submitted:




MAthePA

Hi, Owen Rees.

Thank you very much for caring. 
Unfortunately, the IMDb staff are not ready for the modernized platform. From December 3 up to now, I've successfully corrected more than 1,200 alternate titles in Ukrainian, and the first 800-900 were processed fine. But then some editor started a massacre, declining the changes automatically for various artificial reasons, as in this case:


When I started pointing out in each explanation box that these were not duplicates, they were declined as "unable to verify" instead (a very convenient phrase for any case). And what is most interesting, some of my submissions (not all of them) then appeared on the site, but not credited as mine; this is clear from the wrong attributes later set by a data editor.

So someone simply made it hell for me to process those corrections further. 300-400 were accepted only after a second, and sometimes a third, resubmission following earlier declines. I reported this problem 4 (four!) times via "Contact us", and every time the reply came from one and the same person, Elizabeth, saying that I was OK to proceed as before and there would be no problem, but the problem remains. They are just making me "give up!"

Owen Rees

The entry has changed since that submission, the attribute is now (imdb display title).

Today's announcement suggests that there have been things going on behind the scenes recently in this area so submissions based on things not yet officially announced (as far as I have seen) may be running into things that are not yet complete.

MAthePA

I believe "not yet officially announced" could be an excuse when such a feature is not working yet, i.e. it is a kind of alpha/beta and may cause problems in most cases. But I've tested this feature since the first days, and from the very start it proved to have no problems. Only 4 Ukrainian letters (and their capitals) are new to the system; I proved that they work flawlessly on three titles containing all of them, and later submitted hundreds of titles with them without problems. The problem is sabotage on the editors' end.