If you missed Awoo's thread, Imgur is deleting all anonymously uploaded images on May 15th. This most likely includes the majority of image content posted to leftist subreddits.

If you want to save any imgur content from a subreddit, now is your last chance.

This is the script I am using with contributions from u/captcha

Using it requires downloading a pushshift archive of the subreddit you want to extract images from. For content up until 2022, this is easiest to obtain from redarcs.

Install the script's dependencies with

pip install httpx zstandard

(argparse, asyncio, and json are part of the Python standard library and don't need to be installed.)

Then run the script with the first argument being the path to the .zst file you downloaded. Optionally, you can limit the script to downloading from a single domain, which is useful given the limited time frame to grab from imgur specifically. To do this, add the argument --domain <domain>
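The domain filter amounts to comparing each submission's link host against the requested domain. A minimal sketch of that check (matches_domain is a hypothetical helper, not the script's actual function):

```python
from urllib.parse import urlparse

def matches_domain(submission, domain=None):
    """Return the submission's URL if it is a link to the given host.

    Pushshift submission objects carry the outbound link in the "url"
    field; self posts and deleted links are skipped. With domain=None,
    every HTTP(S) link passes, mirroring running without --domain.
    """
    url = submission.get("url", "")
    if not url.startswith("http"):
        return None
    host = urlparse(url).netloc.lower()
    if domain is None or host == domain.lower():
        return url
    return None
```

Note that i.imgur.com and imgur.com are different hosts, so passing --domain i.imgur.com only catches direct image links, not album pages.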

Some other options

--subreddit <subreddit name> If you get a .zst file from the pushshift torrent archives, you can use this to filter to only the subreddits you want

--concurrency <integer> Max number of requests to run concurrently

-enable_retries After attempting to download every image from the subreddit, reattempt any downloads that failed

--max_retries <integer> Maximum number of times to reattempt downloading files before giving up

--retry_hesitate <integer> When reattempting downloads, wait this many seconds first; use this if you are being rate limited

--retry_cooldown <integer> How long, in minutes, to wait between each set of redownload attempts. Use this to try again considerably later if hosts are unreliable. Good for running during off-peak hours.

-enable_proxy I added this to toggle WireGuard proxies to work around potential IP blocks. Don't use this unless you know what you're doing.
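The way the retry flags interact can be sketched as a loop like the one below (run_with_retries and its download callback are illustrative stand-ins, not the script's actual code; only the flag names are taken from above):

```python
import time

def run_with_retries(download, urls, max_retries=3,
                     retry_hesitate=0, retry_cooldown=0):
    """Attempt every URL once, then loop over the failures.

    download(url) is assumed to return True on success. retry_hesitate
    is a pause in seconds before each retry pass; retry_cooldown is a
    pause in minutes after a pass that still left failures. Returns the
    URLs that never succeeded.
    """
    failed = [u for u in urls if not download(u)]
    for _ in range(max_retries):
        if not failed:
            break
        if retry_hesitate:
            time.sleep(retry_hesitate)
        failed = [u for u in failed if not download(u)]
        if failed and retry_cooldown:
            time.sleep(retry_cooldown * 60)
    return failed
```

Deferring retries like this is gentler on imgur than hammering each failed URL immediately, which matters when the failures are rate-limit responses rather than dead links.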

Example:

python script.py Chapotraphouse_submissions.zst --domain i.imgur.com -enable_retries

  • Abraxiel · 2 years ago

    And yet image hosting is essential. Seems like a natural public utility.