Consider https://arstechnica.com/robots.txt or https://www.nytimes.com/robots.txt and how they disallow all the AI companies' crawlers from scraping their content for free.
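As an illustration, such a file contains per-bot rules like the following (a hedged sketch, not a verbatim copy of either site's file; GPTBot and CCBot are real AI-crawler user agents, but the exact entries vary by site):

```
# Refuse OpenAI's crawler everywhere
User-agent: GPTBot
Disallow: /

# Refuse Common Crawl's crawler everywhere
User-agent: CCBot
Disallow: /

# Everyone else: allowed, except a private area
User-agent: *
Disallow: /private/
```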

  • CarbonScored [any]
    ·
    edit-2
    7 months ago

It's not about relying on it, it's about changing the behaviour of the web crawlers that respect it, which, as someone who has adminned a couple of scarily popular sites over the years, I can tell you is a surprisingly high percentage of them.

    If someone wants to get around it, they obviously can, but this is true of basically all protective measures ever. Doesn't make them pointless.
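    For what it's worth, "respecting it" usually just means the crawler checks the rules before each fetch. A minimal sketch with Python's standard-library parser (the robots.txt text and bot names here are hypothetical examples, not taken from either site above):

    ```python
    from urllib.robotparser import RobotFileParser

    # Example robots.txt: block one AI crawler entirely,
    # block a private area for everyone else.
    ROBOTS_TXT = """\
    User-agent: GPTBot
    Disallow: /

    User-agent: *
    Disallow: /private/
    """

    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())

    # A well-behaved crawler calls can_fetch() before requesting a URL.
    print(parser.can_fetch("GPTBot", "https://example.com/article"))     # False
    print(parser.can_fetch("SomeBot", "https://example.com/article"))    # True
    print(parser.can_fetch("SomeBot", "https://example.com/private/x"))  # False
    ```

    A crawler that skips that check can obviously still fetch whatever it wants, which is exactly the "if someone wants to get around it" case.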