comic-snarfer --start-page=[URL] --image-path=[XPATH] --next-path=[XPATH] --for-real
So I’m trying to release more of the stuff I write even if it’s somewhat, uh, “bespoke” stuff. (“Bespoke” is the best backhanded compliment!)
This is a snarfer that trawls through a series of web pages. It saves images from them and then finds the link to the next page and recurses from there. It rips web comics, pretty much. It could also snarf other media (including just normal html pages) because it uses xpath to dispatch, not file endings.
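To give an idea of what those XPath expressions can look like, here are two invented ones; the right expressions depend entirely on how the comic’s pages are marked up, so check yours against the page source and the dry-run output:

--image-path='//div[@id="comic"]//img'
--next-path='//a[@rel="next"]'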
It assumes you’re making an implicit “dry run” until you supply the argument --for-real. My advice is to hold off on that until the output for the first page looks right to you.
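Concretely, a first pass might look something like this (the URL and XPath values are made up for the example):

comic-snarfer --start-page=https://example.com/comic/1 --image-path='//div[@id="comic"]//img' --next-path='//a[@rel="next"]'

and once the dry-run output for that first page looks sane, the same line again with --for-real tacked on:

comic-snarfer --start-page=https://example.com/comic/1 --image-path='//div[@id="comic"]//img' --next-path='//a[@rel="next"]' --for-real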
It’ll download directly to your current working directory, so make sure you are in a good clean empty place that you can fill with images.
I usually snarf to a directory, back it up, clean the names up with perl’s rename script, do zip ../some-name.cbz *, and remove the image directory and its backup. The backup step is only because I have messed up the perl expression too many times…
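Spelled out as a sketch (the directory names and the rename expression here are just examples, not anything the tool dictates):

mkdir pages && cd pages
# run comic-snarfer here with your three options plus --for-real
cd ..
cp -r pages pages-backup
cd pages
rename 's/[?].*$//' *   # example cleanup: strip query-string junk from the filenames
zip ../some-name.cbz *
cd ..
rm -r pages pages-backup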
mcomix is the reader I like. With it, the renaming and zipping is optional; it can handle directories.
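So with mcomix installed, either of these should do (pages being the raw image directory from the sketch above):

mcomix some-name.cbz
mcomix pages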
There are three required options.
--start-page=URL
--image-path=XPATH
--next-path=XPATH
There are two non-required options.
--start-issue=NUMBER
--start-issue takes a number, but defaults to zero, which is what you want most of the time. The point of this option is in case the snarfer crashed or was terminated and you want to resume with the same numbering (sketched below).
--for-real
This is the flag described above; without it, the snarfer only does the dry run.
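For instance, if a run died partway through, the resume might look something like this; the number 138 is made up, and pointing --start-page at the page where the last run stopped is my reading of how you’d use it:

comic-snarfer --start-page=[URL-where-it-stopped] --image-path=[XPATH] --next-path=[XPATH] --start-issue=138 --for-real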
git clone https://idiomdrottning.org/comic-snarfer