Official Google Blog duplicates and republishing?

  • 10
  • Problem
  • Updated 2 years ago
http://googleblog.blogspot.com/



The blog seems to post everything in triplicate. I read most of the articles last night, and this morning they are "new" again. It only happens with this blog.

Darren Kay

  • 68 Posts
  • 19 Reply Likes

Posted 2 years ago


Nick Potter

  • 12 Posts
  • 2 Reply Likes
I've also been seeing this for several days now. Only happens with the Google blog.

Darren Kay

  • 68 Posts
  • 19 Reply Likes
Still getting 3 copies of every story.

Matt8

  • 27 Posts
  • 3 Reply Likes
Yeah, also having this problem. It started after Google "combined" their product blogs into this one. The feedreader (http://feeds.feedburner.com/blogspot/MKuf) doesn't seem to show the stories multiple times.

Darren Kay

  • 68 Posts
  • 19 Reply Likes
This is now becoming very annoying. I'm seeing repeats of last week's and this week's posts, plus a sprinkling of dupes.

Yet there has been no response or fix.

Darren Kay

  • 68 Posts
  • 19 Reply Likes
This is now p****** me off. Today I got ALL of this week's blog entries again.

Samuel Clay, Official Rep

  • 6514 Posts
  • 1474 Reply Likes
I responded to another thread that was similar to this one. Anyway, I'm not sure why the de-duper is not picking up on these stories being the same but having different IDs. Actually, I think it is working just fine, but there may be a merge problem. Because this is such a central blog, lots of other blogspot blogs redirect to this one, and when they get merged, their stories also get merged. And because of a quirk in how Blogspot handles permalinks, they look like different stories. 

When the stories get duped, how often does it happen? And how many stories come in duped at a time?
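
To illustrate the idea of catching stories that are the same but carry different IDs, here is a minimal sketch that keys on a hash of the title and content instead of the ID or permalink. It is not NewsBlur's actual de-duper; `story_fingerprint`, `dedupe_stories`, and the `title`/`content` dictionary fields are hypothetical names chosen for the example.

```python
import hashlib


def story_fingerprint(title, content):
    """Hash the whitespace- and case-normalized title and content.

    The story ID and permalink are deliberately ignored (assumption:
    two stories with the same text are the same story).
    """
    digest = hashlib.sha1()
    for part in (title, content):
        normalized = " ".join(part.lower().split())
        digest.update(normalized.encode("utf-8"))
        digest.update(b"\x00")  # separator so title/content boundaries matter
    return digest.hexdigest()


def dedupe_stories(stories):
    """Keep the first story seen for each fingerprint, preserving order."""
    seen = set()
    unique = []
    for story in stories:
        key = story_fingerprint(story["title"], story["content"])
        if key not in seen:
            seen.add(key)
            unique.append(story)
    return unique
```

A scheme like this trades one problem for another: a story that is edited after publication gets a new fingerprint, so a real de-duper would likely need fuzzier matching.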

Samuel Clay, Official Rep

  • 6514 Posts
  • 1474 Reply Likes
Actually, looking at the code, it seems that dupe stories from dupe feeds do get deleted, so I'm not sure what's going on here.

Nick Potter

  • 12 Posts
  • 2 Reply Likes
I'm getting at least 3 copies of every story and, like others, I occasionally get a repeat of a whole week's worth.

Samuel Clay, Official Rep

  • 6514 Posts
  • 1474 Reply Likes
Ok, I just deployed a change that will let me better watch this happen. Next time it does, please let me know on this thread. I can then take a look and see what actually happened on the backend. I would love to get to the bottom of this and fix it once and for all.

Nick Potter

  • 12 Posts
  • 2 Reply Likes
Happened again - Just checked my feed today and had a load of posts from last week + multiple duplications of those too!
TBH it happens all the time so you could probably check anytime!

Samuel Clay, Official Rep

  • 6514 Posts
  • 1474 Reply Likes
It hasn't happened since I posted, so I'll keep watching. I'm subscribed to it now and regularly checking it.

Bill B

  • 22 Posts
  • 1 Reply Like
Looks like it has happened again. Two stories from yesterday were unread this morning after I read them yesterday. When I change the view to show all entries, each post is in triplicate.

Samuel Clay, Official Rep

  • 6514 Posts
  • 1474 Reply Likes
I'm watching it. It's going to take a couple weeks but I'm checking it constantly and will get to the bottom of it soon enough.

Samuel Clay, Official Rep

  • 6514 Posts
  • 1474 Reply Likes
Ok, I've got my new raw feed data collector installed in the database. That will help me figure out why this feed is bypassing the story de-dupe checker. NewsBlur can handle incorrect story GUID changes, so I need the raw feeds to see what's going wrong in between fetches.
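
As a rough illustration of why raw data from consecutive fetches is useful, the sketch below compares two fetches and reports stories whose GUID changed in between. It is only a sketch under assumptions: `changed_guids` and the `title`/`guid` fields are hypothetical, and matching stories by exact title is a simplification, not how NewsBlur's collector works.

```python
def changed_guids(previous_fetch, current_fetch):
    """Return (title, old_guid, new_guid) for stories whose GUID differs.

    Stories are matched by exact title (an assumption made for this
    example); stories that appear in only one fetch are ignored.
    """
    old_by_title = {story["title"]: story["guid"] for story in previous_fetch}
    changes = []
    for story in current_fetch:
        old_guid = old_by_title.get(story["title"])
        if old_guid is not None and old_guid != story["guid"]:
            changes.append((story["title"], old_guid, story["guid"]))
    return changes
```

Running this over each pair of consecutive raw fetches would show whether the "new" stories are really old ones arriving under a different GUID.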

Matt8

  • 27 Posts
  • 3 Reply Likes
Just updating this - this weekend this Google blog has done it at least twice (showing articles previously marked as read)

Samuel Clay, Official Rep

  • 6514 Posts
  • 1474 Reply Likes
So I've been tracking it and found the smoking gun. But I still don't have a solution yet. They have some weird caching issue that changes the feed URL from http://google.blog to http://www.google.blog, which are two different addresses that are not de-duped. That's how the stories are getting reinserted, but NewsBlur should be eliminating those. I have yet to write the test case and fix it up, but I'll try to do that this week.
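
One way to make the two addresses compare as equal is to normalize them before de-duping. The sketch below uses Python's standard `urllib.parse`; `normalize_address` is a hypothetical helper, not the fix that went into NewsBlur, and stripping a leading `www.` assumes both hosts serve the same content, which is not true in general.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_address(url):
    """Return a canonical form of a URL for comparison purposes.

    Lowercases the host, drops a leading "www." (assumption: both hosts
    are the same site), drops any trailing slash and fragment, and
    discards userinfo. Query strings are kept as-is.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme, host, path, parts.query, ""))
```

With this, `normalize_address("http://google.blog")` and `normalize_address("http://www.google.blog")` return the same string, so stories keyed on the normalized address would no longer look like two different stories.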

Samuel Clay, Official Rep

  • 6514 Posts
  • 1474 Reply Likes
Ok, I think I finally cracked this one. I wrote up an extensive test case against the real data and managed to fix it once and for all. After this point, no story from Google or The Verge should duplicate itself and you should not see any new unreads after having read them.

It's a simple fix, so I'm hoping it'll take. If it doesn't, know that I now have test cases that mimic the old behavior and they are now passing, so I hope that will be the end of this nefarious bug.

For those interested, here's the test case: https://github.com/samuelclay/NewsBlur/blob/master/apps/rss_feeds/tests.py#L179-L225

Samuel Clay, Official Rep

  • 6514 Posts
  • 1474 Reply Likes
Just a note on this: I had to roll back one of the changes because it was hammering the system, which means it is now slightly possible that stories will be duplicated. If that happens, I'll see it, since I'm actively focused on this issue. And when it does happen, I'll have more data that I can then use in my test cases.

Chris Minett

  • 11 Posts
  • 0 Reply Likes
I just found this thread because I'm getting the same problem. However, I'm not subscribed to the URL at the top of this thread (http://googleblog.blogspot.com/, which NewsBlur says has only 96 subscribers), but to the alternative posted by Matt8: http://feeds.feedburner.com/blogspot/MKuf, which has almost 12k subscribers.

I've added http://googleblog.blogspot.com/ to my list of feeds, and can see far fewer duplicates in there, but there are still loads in http://feeds.feedburner.com/blogspot/MKuf

Perhaps it's worth making sure the fixes (not sure if still rolled back since your last update) work on that one too, as it looks like that's going to affect far more people.

Matt8

  • 27 Posts
  • 3 Reply Likes
I am still subscribed to the same one, and the duplicates seem solved for me, which I definitely appreciate!

Samuel Clay, Official Rep

  • 6514 Posts
  • 1474 Reply Likes
Chris, this is the feed that I am subscribed to, along with the 12k other subscribers:

http://www.newsblur.com/site/766/the-official-google-blog