Spurious entries in the feed for "In the Pipeline"

  • Problem
  • Updated 8 months ago
The feed at http://pipeline.corante.com/index.xml has been showing a lot of spurious entries lately. Each entry has a title taken from a recent post, but no body and no URL. The title is still a link, but its URL is just a formatted date like "2014-09-08 16:37:06.910087". A quick look at the feed shows that it is truncated but looks correct in all other respects. Obviously the truncation is going to make it difficult to parse normally, so you must have a fallback parser of some kind, which is cool. Or perhaps your parser just doesn't rely on the XML being valid in the first place. Either way, this feed is confusing it.

Incidentally, the broken links actually crash the NewsBlur Android app.
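
To illustrate what I mean by a parser that doesn't need the whole document to be valid, here is a minimal sketch using Python's standard library. It is only a guess at the general approach, not NewsBlur's actual code; the function name `complete_items` and the assumption that entries are plain RSS `item` elements are mine.

```python
import xml.etree.ElementTree as ET


def complete_items(xml_text):
    """Collect every <item> that closed before the document broke off.

    Hypothetical helper: it keeps only items whose closing tag was seen,
    so a half-written trailing item is dropped instead of half-parsed.
    """
    parser = ET.XMLPullParser(events=("end",))
    items = []
    try:
        parser.feed(xml_text)
        for _event, elem in parser.read_events():
            if elem.tag == "item":
                items.append({
                    "title": elem.findtext("title"),
                    "link": elem.findtext("link"),
                })
    except ET.ParseError:
        # Malformed input: keep whatever was fully parsed up to this point.
        pass
    return items
```

Because the parser is never asked to close the document, a feed that simply stops partway through yields the complete items and nothing else.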

db48x


Posted 8 months ago


Samuel Clay, Official Rep

It's because the publisher is changing the URLs in the feed, but because there is little to no content, there is nothing for NewsBlur to de-dupe. And while I used to de-dupe solely by title, there are more feeds where that's a poor strategy than not.
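
Roughly, the trade-off looks like the sketch below. This is a simplified illustration, not the code NewsBlur runs: the names `dedupe`, `content_key`, and `MIN_CONTENT_CHARS`, and the plain-dict entries with `link` and `content` keys, are all assumptions made for the example.

```python
import hashlib

# Assumed threshold: below this there is too little text to compare safely.
MIN_CONTENT_CHARS = 40


def content_key(entry):
    """Return a hash of the entry body, or None if the body is too short."""
    content = (entry.get("content") or "").strip()
    if len(content) < MIN_CONTENT_CHARS:
        return None
    return hashlib.sha1(content.encode("utf-8")).hexdigest()


def dedupe(entries):
    """Drop entries whose link or body was already seen. Titles are ignored."""
    seen_links = set()
    seen_content = set()
    kept = []
    for entry in entries:
        link = entry.get("link")
        key = content_key(entry)
        if link and link in seen_links:
            continue
        if key is not None and key in seen_content:
            continue
        if link:
            seen_links.add(link)
        if key is not None:
            seen_content.add(key)
        kept.append(entry)
    return kept
```

With this approach, two entries that share a title but have different links and empty bodies both survive, which is the behavior described above. Matching on title would catch them, at the cost of merging distinct posts in feeds that reuse titles.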

db48x

But you're not parsing URLs out of the feed here, you're accidentally picking up timestamps instead.
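
A check along these lines would tell the two apart before the value is ever treated as a link. This is just a sketch with Python's standard library; the function names are made up, and the timestamp format is inferred from the one example value quoted in my first post.

```python
from datetime import datetime
from urllib.parse import urlsplit


def looks_like_url(value):
    """True only for absolute http(s) URLs with a host."""
    try:
        parts = urlsplit(value.strip())
    except ValueError:
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def looks_like_timestamp(value):
    """True for strings shaped like '2014-09-08 16:37:06.910087'."""
    try:
        datetime.strptime(value.strip(), "%Y-%m-%d %H:%M:%S.%f")
    except ValueError:
        return False
    return True
```

An entry whose link fails `looks_like_url` could be shown without a link at all, which would presumably also keep the Android app from trying to open it.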