Idiomdrottning’s homepage

My Butlerian hypocrisy

In the Butlerian Jihad (from Dune, but popularized by many smolnet posters like Alex Schroeder) we rightly hate bots and scrapers. But I’m in a bit of a glass house there, since I’ve made a few scrapers for my own personal use to get RSS/Atom feeds out of sites that don’t have feeds. I love scraping and mashing.♥︎ The JS-laden SPA era was a nightmare for me. I hate browsers and server-side styling. I love getting texts from URLs.

Follow-ups

An Inhabitant in Carcosa responds:

“Bad in intent: it is intended to do something unethical, whether that be LLM training, denial of service, privatizing the commons, or immanentizing the eschaton. This is pretty subjective in an ‘I know it when I see it’ kind of way. Scraping for a search index, scraping for a full-text RSS feed, and scraping for LLM training are all the same act as far as the server can tell, but only the last one is /evil/.”

Using a full-text RSS feed to avoid dealing with ads or paywalls goes against the intent of the server owners, even when the reasons for not being able to handle ads and paywalls any other way are 100% a11y issues.

And I’m not so sure LLMs are evil.

“It may ignore robots.txt, it may lie about being another user-agent”

I’ve done both of those too!

“Either bad intent or bad implementation is enough; a bot doesn’t need both to be bad.”

That’s not exactly my philosophy.

I love the open readable simple web where each document has one URL and you can read it on your own terms. I can’t deal with the junk web.