Consider https://arstechnica.com/robots.txt or https://www.nytimes.com/robots.txt and how they disallow all the stupid AI crawlers so the models can't scrape their content for free.

  • henfredemars@infosec.pub · 7 months ago

    Such a measure merely punishes entities that respect the rules. If the content can be accessed, it will be scraped and used to train AI.
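
    To make the point concrete, here is a minimal sketch of what "respecting the rules" looks like on the crawler side, using Python's standard `urllib.robotparser`. The robots.txt text, the `may_fetch` helper, and the example.com URL are hypothetical and only illustrate the mechanism; they are not the actual contents of either site's file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, modeled on the kind of rules a news site might publish.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse in memory, no network access
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/article"  # hypothetical URL
    # A compliant crawler runs this check and backs off when it returns False.
    print("GPTBot allowed:", may_fetch("GPTBot", url))
    print("SomeBrowser allowed:", may_fetch("SomeBrowser", url))
    # Nothing on the server enforces the answer: a scraper that never
    # calls may_fetch() can still request the page.
```

    The check runs entirely in the client, so it only binds crawlers that choose to run it.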