Reddit locks out Wayback Machine to stop AI from scraping old posts

Reddit has restricted the Internet Archive’s Wayback Machine from extensively capturing its content because of concerns over unauthorized AI data scraping. The platform will now allow only its homepage to be archived, with the aim of protecting user privacy.

Reddit has announced that it will restrict the Internet Archive’s Wayback Machine to archiving only its homepage, blocking the tool from saving most of the site’s content. The change is a direct response to growing concerns that AI companies are scraping Reddit data through the Wayback Machine, potentially undermining Reddit’s content policies and violating user privacy.

Why Reddit Is Restricting Access

According to Reddit spokesperson Tim Rathschmidt, the company has seen cases in which artificial intelligence firms accessed Reddit’s content via the Wayback Machine without adhering to Reddit’s terms of service. This has included scraping posts, comments, and even content that had been deleted or removed. Such unauthorized activity challenges Reddit’s ability to manage and protect its content.

Rathschmidt emphasized that the restriction will remain in place until the Internet Archive can guarantee compliance with Reddit’s policies, in order to safeguard users’ privacy and ensure that removed content stays removed.

Impact on the Wayback Machine’s Archiving

The Wayback Machine is a widely used tool operated by the Internet Archive, designed to preserve snapshots of websites over time. This archival service enables users to view historical versions of web pages, which is useful for research, fact-checking, and maintaining internet history.

With Reddit’s new limitation, the Wayback Machine will archive only the homepage and will no longer capture individual Reddit pages such as posts or user profiles. This significantly reduces the breadth and depth of Reddit content preserved by the archive and restricts public access, through this service, to old discussions and deleted content.

Reddit’s Data Control Measures

This restriction is part of Reddit’s broader effort to control how its data is accessed and used, especially by AI companies. Reddit has recently taken several steps to protect its content, including changing its application programming interfaces (APIs) to limit data scraping, negotiating paid data-licensing deals with firms such as Google and OpenAI, and pursuing legal action against companies such as Anthropic over unauthorized data collection.

Reddit’s goal is to balance user privacy, platform safety, and its business interests by carefully controlling which third parties can access its vast body of content.

Current and Future Outlook

Mark Graham, director of the Wayback Machine, confirmed that the Internet Archive is in ongoing discussions with Reddit about the issue, but no formal announcement has been made. The Internet Archive community and users who rely on its archiving service are awaiting further updates to understand the long-term implications for internet preservation.

Reddit’s move highlights the difficulty of protecting user privacy while also preserving internet content, especially as AI technologies rely on large datasets gathered from the web.

FAQs:

Q1. What is Reddit?
A1. Reddit is an online community where users share posts, comments, and discussions on various topics.

Q2. What is the Wayback Machine?
A2. The Wayback Machine is a tool that archives and lets people view past versions of websites.