Duplicate stories on Massively.com

  • Problem
  • Updated 8 years ago
Sometimes a lot of duplicate stories appear in Massively.com's feed, http://massively.joystiq.com/rss.xml. I think this may be causing a bug where the content window ("Feed" view) sometimes appears blank after clicking through about 6-8 stories (using the "Next Unread" button to advance). This occurs when viewing the folder that contains the feed (River of News).
timb

Posted 8 years ago

Samuel Clay, Official Rep

I'm seeing a bunch of duplicates from a few days ago. It looks like the address changed from www.massively.com to massively.joystiq.com, but the dupes no longer seem to be showing up. Can you confirm that it looks OK now?

There is code that is supposed to weed out duplicates, but the content similarity needs to be above a certain threshold (at this point, more than 99% of the text needs to be identical), and it looks like these stories were just different enough to avoid being caught. I think I might change it to a sliding scale:

< 1000 characters, > 99%
< 2000 characters, > 97%
< 3000 characters, > 95%

Something like that, to perhaps have a better shot at detecting dupes that swear they are not dupes (but clearly are).
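
For illustration, here is a minimal sketch of how a sliding scale like that could work. It is not the actual site code: the names `SLIDING_SCALE`, `FALLBACK_THRESHOLD`, `required_similarity`, and `is_duplicate` are hypothetical, the similarity measure (`difflib.SequenceMatcher.ratio()` from the Python standard library) is an assumption, and so is measuring length by the longer of the two stories. The scale above gives no figure for stories of 3000 characters or more, so the fallback value is also an assumption.

```python
from difflib import SequenceMatcher

# (exclusive character limit, minimum similarity) pairs from the proposed scale.
SLIDING_SCALE = [(1000, 0.99), (2000, 0.97), (3000, 0.95)]

# Assumption: the proposal gives no figure for 3000+ characters,
# so this sketch simply reuses the last threshold.
FALLBACK_THRESHOLD = 0.95


def required_similarity(length):
    """Return the similarity a story of this length must exceed to count as a dupe."""
    for limit, threshold in SLIDING_SCALE:
        if length < limit:
            return threshold
    return FALLBACK_THRESHOLD


def is_duplicate(existing_text, new_text):
    """Compare two story bodies against the length-dependent threshold."""
    # Assumption: the longer of the two texts picks the tier.
    length = max(len(existing_text), len(new_text))
    matcher = SequenceMatcher(None, existing_text, new_text, autojunk=False)
    return matcher.ratio() > required_similarity(length)
```

With these numbers, two 1500-character stories that are 98% similar would be flagged as duplicates (the tier requires more than 97%), whereas a flat 99% threshold would let them through.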