If you missed Awoo's thread, imgur is deleting all anonymously uploaded images on May 15th. This most likely includes the majority of image content posted to leftist subreddits.
If you want to save any imgur content from a subreddit, now is your last chance.
This is the script I am using, with contributions from u/captcha.
Using it requires you to download a pushshift archive of the subreddit you want to extract images from. For content up until 2022, this is easiest to obtain from redarcs.
Install the script's dependencies with
pip install httpx zstandard
(argparse, asyncio, and json are part of the Python standard library and don't need installing.)
Then run the script with the first argument being a path to the .zst file that you have downloaded.
Optionally, you can limit the script to downloading from a single domain, which is useful given the limited time frame to grab from imgur specifically. To do this, add the argument --domain <domain>
Some other options:
--subreddit <subreddit name>
If you get a .zst file from the pushshift torrent archives, you can use this to filter to only the subreddits you want
--concurrency <integer>
Max number of requests to run concurrently
-enable_retries
After attempting to download every image in the subreddit, retry any downloads that failed
--max_retries <integer>
Maximum number of times to reattempt downloading files before giving up
--retry_hesitate <integer>
When reattempting downloads, wait this many seconds first; use this if you are being rate limited
--retry_cooldown <integer>
How long, in minutes, to wait between each set of redownload attempts. Use this to try again considerably later if hosts are unreliable; good to run during off-peak hours.
-enable_proxy
I added this to toggle WireGuard proxies to work around potential IP blocks. Don't use this unless you know what you're doing.
Example:
python script.py Chapotraphouse_submissions.zst --domain i.imgur.com -enable_retries
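For reference, capping simultaneous requests (what --concurrency controls) is typically done with an asyncio.Semaphore. A minimal sketch, with a stub standing in for the real HTTP download:

```python
import asyncio


async def fetch(url):
    # Stand-in for the real HTTP download (e.g. an httpx request).
    await asyncio.sleep(0.01)
    return url


async def download_all(urls, concurrency=4):
    sem = asyncio.Semaphore(concurrency)

    async def bounded(url):
        async with sem:  # at most `concurrency` fetches run at once
            return await fetch(url)

    # gather preserves input order in its results
    return await asyncio.gather(*(bounded(u) for u in urls))


results = asyncio.run(download_all([f"img{i}" for i in range(10)], concurrency=3))
```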
I'm going to miss all those DIY, what-is-this-thing, and how-to posts in the future
```python
async def defer():
    await get_image(http_client, post)
    con_lock.release()

await con_lock.acquire()
create_task(defer())
if not os.path.isfile(post["file_name"]):
    undownloaded_images.append(post)
    break
```
I think that last `if not` block should be in the `defer` function. `create_task` forks the `defer()` coroutine to the background and continues on, so you now have a race condition between when the `get_image` coroutine downloads the file and when the `main` coroutine checks if the file exists. The `main` coroutine should always win that race, so your script is probably marking every file as not downloaded and never exits. By putting the check in the `defer` coroutine you ensure the download has finished before you check whether the file exists.

Also, don't just await the created task. That would work, but it means you're waiting for the first request to finish before starting the next.
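A self-contained sketch of the fixed pattern, with the existence check moved inside `defer`; `get_image` is a stub here, and a semaphore stands in for the script's lock (the real names come from the script above):

```python
import asyncio
import os

undownloaded_images = []


async def get_image(post):
    # Stub: the real function would download post["url"] to post["file_name"].
    await asyncio.sleep(0)


async def main(posts, concurrency=4):
    sem = asyncio.Semaphore(concurrency)

    async def defer(post):
        async with sem:
            await get_image(post)
        # The check now runs only after the download attempt has finished,
        # so there is no race with the main coroutine.
        if not os.path.isfile(post["file_name"]):
            undownloaded_images.append(post)

    tasks = [asyncio.create_task(defer(p)) for p in posts]
    await asyncio.gather(*tasks)


posts = [{"url": "https://i.imgur.com/x.jpg",
          "file_name": "nonexistent_test_file_xyz.jpg"}]
asyncio.run(main(posts))
# The stub never writes the file, so the post lands in undownloaded_images.
```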
Brief summary of how asyncio works:

`async def` means the function returns a `Coroutine`, which is code that can be executed; if you look into its type, it's based off `Generator`.

`await` runs a coroutine and suspends the current code until the coroutine is finished.

`create_task` runs a coroutine but does not block until it finishes. It does return a `Task`, which you can use to check on its progress, cancel it, etc.
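The difference between awaiting directly and using `create_task` shows up in total runtime; a small demonstration (timings are illustrative):

```python
import asyncio
import time


async def work(n):
    await asyncio.sleep(0.1)
    return n


async def sequential():
    # Awaiting each coroutine directly runs them one after another.
    return [await work(i) for i in range(3)]


async def concurrent():
    # create_task starts all of them immediately; gather waits for all.
    tasks = [asyncio.create_task(work(i)) for i in range(3)]
    return await asyncio.gather(*tasks)


start = time.monotonic()
asyncio.run(sequential())   # roughly 0.3 s: three sleeps back to back
seq_time = time.monotonic() - start

start = time.monotonic()
asyncio.run(concurrent())   # roughly 0.1 s: the three sleeps overlap
con_time = time.monotonic() - start
```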