Thread started by Dave Winer on Sunday, October 28, 2012.

NYT River repair work

A few weeks ago the NY Times firehose feed broke.

I emailed with a friend at the Times, and we were able to get it working again. But the new version of the firehose is a mere trickle compared to the former raging torrent.

This put me in a bad place because I depend on a gush of NYT headlines in my river. I could subscribe to all the feeds I could find, but that means that I'd get duplicate stories because the Times, like other pubs, runs many stories in multiple feeds.

I've long thought about using a heuristic to fix this. I'd keep track of the titles that had already appeared in a river and skip any story whose title was already there. Last night during the Giants game I gave it a shot, and it worked.
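Here's roughly what the idea looks like, as a minimal sketch in Python and not the actual code running in the river. The names (River, add, normalize_title) are made up for illustration, and folding case and whitespace before comparing is my assumption; the heuristic as described only says to remember titles and skip repeats.

```python
def normalize_title(title):
    """Fold case and collapse whitespace (an assumption, not part of the original heuristic)."""
    return " ".join(title.lower().split())


class River:
    """A hypothetical river that skips stories whose title it has already shown."""

    def __init__(self):
        self.items = []           # stories in the order they were accepted
        self.seen_titles = set()  # normalized titles already in the river

    def add(self, item):
        """Add a story unless its title is already in the river. Returns True if added."""
        key = normalize_title(item.get("title", ""))
        if key and key in self.seen_titles:
            return False  # same headline arrived through another feed
        if key:
            self.seen_titles.add(key)
        # Untitled items can't be compared by title, so they are always kept.
        self.items.append(item)
        return True


river = River()
river.add({"title": "Storm Nears the Coast", "feed": "U.S."})
river.add({"title": "Storm Nears the Coast", "feed": "N.Y. / Region"})  # skipped
```

A set keeps the lookup cheap no matter how many feeds are in the reading list. The tradeoff is that two different stories with the same title would be treated as one.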

I wrote the change up in this worknote.

I added a huge number of feeds to the NYT river. And it's starting to feel good again. I wanted to share this as a possible best practice for other aggregator developers.

Updates

After running for a few hours -- success. The NYT river is back to its rich flow, at a time when there's lots going on -- the presidential election and a hurricane. And there aren't any duplicates. All is good. :-)

It's been a while since I really looked at the NYT river. They write such good descriptions. You have a pretty good idea what the article is about even without clicking. Much more useful than getting full text, because I get a breadth of the news, and the experience is created by editors who know what they're doing.

Pointers

NYT river.

NYT feeds page.

The OPML reading list for the NYT river.
