I'm working on a project that involves gathering lots of data from a website for later analysis. It's a crank conspiracy site, most likely populated by boomers, but I'm still wary about getting caught and having my IP blocked as it's not possible for me to use a VPN. Ideally I want to visit the site every 10 minutes or so. Is there anything I can do, other than varying the re-visit time in the script, to avoid detection? Also, how likely is detection in the first place? Is that something site admins are likely to check?
This is a good idea. I hadn't considered user-agent spoofing. Thanks for this.
HTTP requests in Python, scheduled via cron on a remote server. I’ll check out the links; they look like they might be useful.
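For the "varying the re-visit time" part with a cron setup like this, one rough way to do it is to let cron fire on a fixed schedule and have the script sleep a random amount before it makes the request. This is only a sketch: `jittered_delay`, `run_once`, and the 120-second window are made-up names and numbers, and `fetch` stands in for whatever function actually does the HTTP request.

```python
import random
import time


def jittered_delay(base_seconds=0.0, max_jitter_seconds=120.0):
    """Return a random delay between base and base + max_jitter seconds."""
    return base_seconds + random.uniform(0.0, max_jitter_seconds)


def run_once(fetch, sleep=time.sleep, max_jitter_seconds=120.0):
    """Sleep for a random interval, then call fetch() and return its result.

    fetch is a placeholder for your own request function (an assumption,
    not something from this thread).
    """
    delay = jittered_delay(max_jitter_seconds=max_jitter_seconds)
    sleep(delay)  # cron starts the script on schedule; this blurs the timing
    return fetch()
```

With cron set to every 10 minutes, calling `run_once(my_fetch)` means the requests land anywhere from 0 to 2 minutes after each tick instead of exactly on it.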
a crank conspiracy site, most likely populated by boomers
Seems you are crawling chapo dot chat.
Just put in the normal user agent and cookies from your browser. That helps against most shitty scraper detectors.
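Something like this would do it with just the standard library. Treat it as a sketch: the user-agent string and the `session_id` cookie name are placeholders, so copy the real values from your browser's dev tools (Network tab, request headers). `build_request` is a made-up helper name.

```python
import urllib.request

# Placeholder values (assumptions): replace with what your browser actually sends.
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0"
)
BROWSER_COOKIES = {"session_id": "PASTE_VALUE_HERE"}  # hypothetical cookie name


def build_request(url, user_agent=BROWSER_USER_AGENT, cookies=None):
    """Build a Request carrying a browser-like User-Agent and Cookie header."""
    if cookies is None:
        cookies = BROWSER_COOKIES
    headers = {"User-Agent": user_agent}
    # Cookies go out as a single "name=value; name2=value2" header.
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    if cookie_header:
        headers["Cookie"] = cookie_header
    return urllib.request.Request(url, headers=headers)
```

To actually send it: `urllib.request.urlopen(build_request("https://example.com/"), timeout=30)`, with the URL swapped for the page you want.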