I'm working on a project that involves gathering lots of data from a website for later analysis. It's a crank conspiracy site, most likely populated by boomers, but I'm still wary about getting caught and having my IP blocked as it's not possible for me to use a VPN. Ideally I want to visit the site every 10 minutes or so. Is there anything I can do, other than varying the re-visit time in the script, to avoid detection? Also, how likely is detection in the first place? Is that something site admins are likely to check?

  • borisjohnson [he/him]
    hexagon
    ·
    4 years ago

    HTTP requests in python scheduled via cron on a remote server. I’ll check out the links, looks like they might be useful