In the Butlerian Jihad (from Dune, but popularized by many smolnet posters like Alex Schroeder) we rightly hate bots and scrapers. But I’m in a bit of a glass house there, since I’ve made a few scrapers for my own personal use, as a way to get RSS/Atom feeds out of sites that don’t have feeds. I love scraping and mashing.♥︎ The JS-laden SPA era was a nightmare for me. I hate browsers and server-side styling. I love getting texts from URLs.
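For the curious, here is a rough sketch of what that kind of feed-making scraper can look like. It is not one of my actual scrapers, just a minimal illustration that assumes a page whose links are the "posts". The names `LinkCollector` and `html_to_atom` are made up for this example, and it only uses the Python standard library.

```python
from datetime import datetime, timezone
from html.parser import HTMLParser
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> with visible text."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only keep text while we are inside a link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            if title:  # skip links with no readable text
                self.links.append((self._href, title))
            self._href = None


def html_to_atom(html, page_url, feed_title, updated=None):
    """Turn the links on one HTML page into a bare-bones Atom feed string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    # Assumption: the page gives us no dates, so every entry shares one timestamp.
    updated = updated or datetime.now(timezone.utc).isoformat()

    ET.register_namespace("", ATOM_NS)
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    ET.SubElement(feed, f"{{{ATOM_NS}}}title").text = feed_title
    ET.SubElement(feed, f"{{{ATOM_NS}}}id").text = page_url
    ET.SubElement(feed, f"{{{ATOM_NS}}}updated").text = updated
    for href, title in parser.links:
        url = urljoin(page_url, href)  # resolve relative links against the page
        entry = ET.SubElement(feed, f"{{{ATOM_NS}}}entry")
        ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
        ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = url
        ET.SubElement(entry, f"{{{ATOM_NS}}}link", href=url)
        ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = updated
    return ET.tostring(feed, encoding="unicode")
```

Fetching the page (for example with `urllib.request.urlopen`) is left out on purpose. A real site would also need narrower link selection than "every anchor on the page".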
An Inhabitant in Carcosa responds:
Bad in intent: it is intended to do something unethical, whether that be LLM training, denial of service, privatizing the commons, or immanentizing the eschaton. This is pretty subjective in an “I know it when I see it” kind of way. Scraping for a search index, scraping for a full-text RSS feed, and scraping for LLM training are all the same act as far as the server can tell, but only the last one is /evil/.
Using a full-text RSS feed to avoid dealing with ads or paywalls goes against the intent of the server owners, even when my reasons for not being able to handle ads and paywalls any other way are 100% a11y issues.
And I’m not so sure LLMs are evil.
It may ignore robots.txt, it may lie about being another user-agent
I’ve done both of those too!
Either bad intent or bad implementation is enough; a bot doesn’t need both to be bad.
That’s not exactly my philosophy.
I love the open readable simple web where each document has one URL and you can read it on your own terms. I can’t deal with the junk web.