Hi IMDb team

I recently posted this question/suggestion in a closed topic, which probably explains the lack of response. The topic was: https://getsatisfaction.com/imdb/topics/fake-reviews-and-ratings. It dealt with an extraordinary accumulation of fake ratings and fake reviews for a specific movie: The Last Boy (tt4157728).

This example, i.e. The Last Boy, is sadly representative of a growing trend on IMDb: an irritating pollution from fake users, a parasitic activity with an obvious lack of relevance and/or honesty: fake ratings, fake reviews, and so on. Without wanting to appear disrespectful towards the director of this film, it probably does not deserve this avalanche of 9s and 10s. In my humble opinion, of course.

Before being critical, let me be clear on one point: I truly love the IMDb website, as well as the iPhone app. Despite its imperfections, it is clearly a worldwide reference. In order to improve its credibility, I think the IMDb team should tighten the concept of a 'regular' user, one of the main objectives being to reduce or even eliminate such activity. The IMDb audience should be restricted to moviegoers, and moviegoers only, without forgetting the padawans who wish to become one of them. My email is motivated solely by this parasitic activity, which is as annoying as it is growing. Nothing more. Nothing less.

So...

Five months ago, the rating distribution of The Last Boy was as follows:

Rating   Raw votes
  10        2054
   9        1797
   8          72
   7         112
   6         152
   5         156
   4         130
   3         122
   2          84
   1         145

Thus, the >raw< average was approximately: 8.5 out of 10.
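As a sanity check, this raw average can be recomputed directly from the table above (a minimal Python sketch; the vote counts are simply copied from the distribution as it stood five months ago):

```python
# Raw vote counts for The Last Boy (tt4157728), five months ago.
raw_votes = {10: 2054, 9: 1797, 8: 72, 7: 112, 6: 152, 5: 156,
             4: 130, 3: 122, 2: 84, 1: 145}

total = sum(raw_votes.values())                        # 4824 ballots in total
raw_average = sum(r * n for r, n in raw_votes.items()) / total
print(round(raw_average, 2))  # 8.49, i.e. roughly 8.5 out of 10
```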

If we postulate that the fake ratings are concentrated on the values 1, 9 and 10, and roughly correct those three extreme values only, we obtain this new distribution:

Rating   Votes with a rough correction
  10          15
   9          35
   8          72
   7         112
   6         152
   5         156
   4         130
   3         122
   2          84
   1          30

The >corrected< average was approximately: 5.1 out of 10.
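The same computation on the roughly corrected distribution confirms this (same sketch as before, with only the counts for 1, 9 and 10 changed):

```python
# Vote counts after the rough correction of the three extreme values.
corrected_votes = {10: 15, 9: 35, 8: 72, 7: 112, 6: 152, 5: 156,
                   4: 130, 3: 122, 2: 84, 1: 30}

total = sum(corrected_votes.values())                  # 908 ballots remain
corrected_average = sum(r * n for r, n in corrected_votes.items()) / total
print(round(corrected_average, 2))  # 5.07, i.e. roughly 5.1 out of 10
```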

The >weighted< average computed by IMDb was, however: 7.2 out of 10.

Better than 8.5, obviously.

But insufficiently weighted, undoubtedly.

My first point is: because of this parasitic activity, the credibility of IMDb is somewhat at stake.

My first question is then: what >new< measures do you intend to put in place in order to reduce this issue?

My second and last point is: since Karlito's mail five months ago, no measurable improvement has occurred for this movie, even though you are fully aware of the issue. I do realize that 1) this kind of parasitic activity is not solved film by film but globally, and 2) this is not an easy issue.

My second and last question is then: why do you seem to ignore the issue? Your answer to Karlito indeed looks minimalist. Very minimalist...

If I may, I would like to propose a draft. Each user should be rated according to a credibility index (CI), between 0 (absolute mistrust) and 1 (optimal trust). The weight used for the weighted average and the helpfulness used for the reviews should both be derived from this CI. Thus, if you are a weakly credible user, your ratings, reviews and tutti quanti should then be only slightly taken into account, or even literally ignored in the case of a null value.

In order to assess the CI of a specific user, the IMDb team may use different key indicators such as:

1) the optional submission of an electronic copy of an identity card, as I did six years ago when registering with Airbnb. Without it, the CI should be significantly lower than 0.5, no matter what. As long as the user is not irrevocably identified, (s)he remains a guest with lower privileges and therefore a weak impact on IMDb.

2) a mathematical analysis of his/her set of ratings:

2.1) the set of ratings must be sufficiently large;

2.2) the distribution curve of the ratings must follow these basic rules:

2.2.1) It should be a Gaussian-like curve.

2.2.2) If the user sees everything, or if (s)he chooses movies randomly, the Gaussian curve should be centered on 5.5. Since this is probably not the case, it will be centered on a higher value such as 6 or even 6.5. But definitely not 8 or 9! Thus, the CI should be indexed on the deviation abs(average - 5.5).

2.2.3) The ratings should not be confined to a narrow band, from 6 to 9 for instance. They must genuinely span the range from 1 to 10.

Two extreme examples

*) If the distribution curve looks like a Gaussian centered on 5.5, spread between 1 and 10 and based on a sufficiently large set, the CI will then be optimal. For instance, the distribution curve of Col Needham (ur1000000) is close to perfection, with a very large set of ratings. His CI should be 0.999 or even 1.

*) On the contrary, if the curve looks like a narrow Dirac delta function, the CI will then decrease significantly. For instance, the distribution curve of Ariana Catarina (ur57470894) is odd even though her set of ratings is large enough. Her CI should be less than 0.1.
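To make the two extremes concrete, here is a toy CI combining rules 2.2.1 to 2.2.3. This is a sketch only: the scale factors, the way the two scores are combined, and the two sample distributions are my own assumptions, not data from the accounts cited above:

```python
import math

def credibility_index(votes):
    """Toy CI in [0, 1]: penalises a mean far from 5.5 (rule 2.2.2)
    and a narrow spread (rules 2.2.1 and 2.2.3). Illustrative only."""
    total = sum(votes.values())
    mean = sum(r * n for r, n in votes.items()) / total
    variance = sum(n * (r - mean) ** 2 for r, n in votes.items()) / total
    std = math.sqrt(variance)
    mean_score = max(0.0, 1.0 - abs(mean - 5.5) / 4.5)  # 1 when centered on 5.5
    spread_score = min(1.0, std / 2.0)                  # 1 when ratings spread widely
    return mean_score * spread_score

# A broad, Gaussian-like distribution centered on 5.5 -> high CI.
gaussian_like = {1: 2, 2: 6, 3: 14, 4: 22, 5: 28, 6: 28, 7: 22, 8: 14, 9: 6, 10: 2}
# A narrow spike on 9 and 10, like a Dirac delta -> very low CI.
dirac_like = {10: 95, 9: 5}

print(round(credibility_index(gaussian_like), 2))  # 0.95
print(round(credibility_index(dirac_like), 3))     # 0.001
```

On real data the thresholds would of course need tuning, and a proper goodness-of-fit test against a Gaussian would be preferable to this crude spread score.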

I hope this helps.

Best regards,

Stéphane
