Searched for

AI CONTENT SCRAPING

- Personal data isn't collected: MHA on security agencies using open-source intelligence from public sources
  Security agencies are utilizing open-source intelligence from public sources like social media for information gathering, assuring no priva...
  31 Mar, 2026, 06.07 PM IST
- How media industry wants to regulate GenAI training
  Amid rising concerns, India's media industry is highlighting the risks of unregulated AI training exploiting copyrighted works. In response...
  13 Dec, 2025, 06.28 AM IST
- New York Times sues Perplexity AI for 'illegal' copying of content
  The New York Times has filed a lawsuit against AI startup Perplexity. The Times claims Perplexity copied and used its articles without perm...
  06 Dec, 2025, 12.43 AM IST
- AI bots traffic has surged 300%, is disrupting online business: Akamai report
  AI bots have surged 300% in a year, disrupting online operations, Akamai’s 2025 report shows. These bots, driven by content scraping, now d...
  05 Nov, 2025, 04.13 PM IST
- Reddit accuses 'data scraper' companies of stealing its information
  Reddit is taking a firm stand against four data scraping companies, including SerpApi, Oxylabs, and AWMProxy, by initiating legal proceedin...
  23 Oct, 2025, 12.21 AM IST
- Reddit locks out Wayback machine to stop AI from scraping old posts
  Reddit has restricted the Internet Archive’s Wayback Machine from extensively capturing its content due to concerns over unauthorized AI da...
  12 Aug, 2025, 01.57 AM IST
- Cloudflare launches tool to help website owners monetise AI bot crawler access
  The tool allows website owners to choose whether artificial intelligence crawlers can access their material and set a price for access thro...
  01 Jul, 2025, 03.41 PM IST
- Reddit sues AI startup Anthropic for allegedly using data without permission
  According to the complaint, Anthropic has resisted entering a licensing agreement even as it trained its Claude chatbot on Reddit content, ...
  05 Jun, 2025, 08.41 AM IST
- Wikimedia Just Dropped a Massive Wikipedia Dataset on Kaggle — A Bold Move to Stop AI Bots From Scraping
  The beta dataset is being hosted on Google-owned Kaggle. The dataset features 'structured Wikipedia content in English and French', the Wik...
  17 Apr, 2025, 08.20 PM IST
- Companies alert as along come AI web spiders
  AI crawlers are computer programs that collect data from websites to train large language models. Enterprises are increasingly blocking AI ...
  15 Dec, 2024, 06.00 AM IST
- Canadian news media are suing OpenAI for copyright infringement, but will they win?
  The lawsuits claim that OpenAI "scraped" large amounts of content from media sites without permission. They have also claimed that the AI c...
  02 Dec, 2024, 01.39 PM IST
- NYT sends AI startup Perplexity 'cease and desist' notice over content use
  Since the introduction of ChatGPT, publishers have been raising the alarm on chatbots which can comb the internet to find information and c...
  16 Oct, 2024, 08.47 AM IST
- ET Explainer: Cloudflare's new tool aims to block AI bots from scraping website content
  Cloudflare has introduced a new tool to block AI bots from scraping website content. The tool aims to protect content publishers from unaut...
  09 Jul, 2024, 05.40 PM IST
- AI bots taking over the Internet? Here's how companies are stopping this intrusion
  Artificial intelligence's rise has created some nasty problems for text-based websites, some of which are complaining that the performance of...
  08 Jul, 2024, 10.36 PM IST
- Amazon is reviewing whether Perplexity AI improperly scraped online content
  Perplexity AI, an AI startup, is under Amazon's review for scraping content. The company faces allegations of plagiarism and generating fak...
  29 Jun, 2024, 10.15 AM IST
- Reddit to update web standard to block automated website scraping
  AI startups face scrutiny for bypassing Reddit's updated scraping rules. Plagiarism accusations against firms like Perplexity highlight the...
  26 Jun, 2024, 09.17 AM IST
- Multiple AI companies bypassing web standard to scrape publisher sites, licensing firm says
  Perplexity likely bypassed web crawler blocks set via the Robots Exclusion Protocol, as reported by Wired, using analytics to track AI traffic.
  21 Jun, 2024, 08.47 PM IST
- It’s an important case; we should stay tuned: MoS IT on NYT lawsuit against OpenAI, Microsoft
  Last Wednesday, New York Times sued OpenAI and Microsoft over copyright infringement, alleging that millions of its articles were used with...
  01 Jan, 2024, 08.04 PM IST
- 'Not for machines to harvest': data revolts break out against AI
  At the heart of the rebellions is a newfound understanding that online information - stories, artwork, news articles, message board posts a...
  16 Jul, 2023, 10.41 PM IST
- Google training Bard with scraped web data? Here’s everything you may want to know
  Google has acknowledged training its AI systems using publicly available web data, prompting concerns over privacy, copyright infringement ...
  06 Jul, 2023, 02.42 AM IST