As of 2015, Facebook had overtaken Google as the main source of traffic to news sites[1]. The average Facebook user spends 40 minutes a day on the site[2]. If we fail to archive news feeds, we will lose important sociological information.
The Internet Archive does a great job of archiving the web, among other things like VHS tapes. They're located here in the Richmond, and you can visit them on Fridays during lunch[3]. I'm concerned that while they're doing a great job of capturing the data, they're missing really crucial metadata: the feed presentation, i.e., which items a given user actually saw, in what order, and why.
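To make "feed presentation" concrete, here's a minimal sketch of what one archived feed item might record on top of the page the Wayback Machine already has. Every field name is an illustrative assumption, not any real schema:

```python
# Hypothetical presentation record for a single feed item; all field
# names are illustrative assumptions, not a real Facebook schema.
feed_item_record = {
    "viewer": "volunteer-0042",             # pseudonymous opt-in ID
    "captured_at": "2017-01-17T12:00:00Z",  # when the feed was rendered
    "position": 3,                          # rank within this viewer's feed
    "url": "https://www.nytimes.com/...",   # the page the Archive already has
    "surface": "shared_by_friend",          # how it surfaced: share, like, ad
}
```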
The Wayback Machine is necessary but not sufficient for a full representation of the state of the web as it is experienced by real people. If you really want to understand the zeitgeist of today, January 17, 2017, you wouldn't want to see only the front page of the NYT; you'd want to experience the web as mediated through individuals' news feeds. I imagine a good number of researchers would like to sample feeds from the years leading up to the 2016 presidential election.
I see there is a Chrome extension that can be used to archive a feed, but I haven't seen anyone scraping feeds at scale over time. Imagine a sort of opt-in system where volunteers give archivists read access to their social media accounts. With a few hundred opt-ins you could get a decent sample of all the different cultural and subcultural bubbles.
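Mechanically, the archivist's side could be quite simple. Here's a minimal polling sketch, assuming a hypothetical read-only feed endpoint (FEED_URL) and per-volunteer access tokens; Facebook doesn't actually expose feeds this way today, so the endpoint and all names here are placeholders:

```python
import json
import time
from datetime import datetime, timezone
from pathlib import Path

import requests

# Placeholder endpoint standing in for whatever read-only access
# volunteers grant; this is NOT a real Facebook API.
FEED_URL = "https://example.org/api/feed"
ARCHIVE_DIR = Path("feed_archive")


def snapshot_feed(volunteer_id: str, token: str) -> Path:
    """Fetch one volunteer's current feed and save it as timestamped JSON."""
    resp = requests.get(
        FEED_URL, headers={"Authorization": f"Bearer {token}"}, timeout=30
    )
    resp.raise_for_status()
    ts = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    out = ARCHIVE_DIR / volunteer_id / f"{ts}.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(resp.json(), indent=2))
    return out


def run(volunteers: dict[str, str], interval_s: int = 6 * 3600) -> None:
    """Poll each opted-in volunteer's feed a few times a day, indefinitely."""
    while True:
        for vid, token in volunteers.items():
            try:
                snapshot_feed(vid, token)
            except requests.RequestException:
                pass  # a real archiver would log and retry
        time.sleep(interval_s)
```

A few snapshots a day per volunteer, times a few hundred volunteers, is a trivial volume of data by Internet Archive standards.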
The obvious candidate to solve this problem is Facebook itself. They might already have a feed time machine internally! Contributing a few hundred feeds (and appropriate developer time) to the Internet Archive would be a great move and a PR win.
[1] http://www.adweek.com/socialtimes/facebook-is-now-the-top-referral-source-for-digital-publishers/625300
[2] http://time.com/3950525/facebook-news-feed-algorithm/
[3] If you live in San Francisco you should go do this. You will have fun and meet some interesting people.