Hi folx
Not much has changed since we last brought this up half a year ago, which is probably a mistake as link trackers have become more ubiquitous, and the corporations that know our names and addresses have built up shadow profiles on us, but better late than never.
Anyway, cutting to the chase. This bot will warn you in DMs when you share a tracking link. That's it. Post over.
Read on if you want to see my unhinged tracking link rants.
What are link trackers?
When you share a YouTube link you may notice an ?si=(random gibberish) at the end. You may notice the same with Instagram, except here it's ?igshid. On Twitter, it's ?t. On TikTok and Reddit you have URLs that end in gibberish like vm.tiktok․com/blahblah or reddit․com/r/blahblah/s/blahblah.
These URLs are artisanal. They are made only for you.
URLs on other sites can also be "high entropy". In one case, for example, the URL contains the time down to the millisecond.
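To make the patterns above concrete, here is a rough Python sketch of how a link could be checked for them. This is not the bot's actual code. The lookup tables and the `find_trackers` name are made up for illustration and only cover the examples from this post.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Hypothetical table: tracking parameters per site, taken only from the
# examples in this post. A real list would be much longer.
TRACKING_PARAMS = {
    "youtube.com": {"si"},
    "youtu.be": {"si"},
    "instagram.com": {"igshid"},
    "twitter.com": {"t"},
    "x.com": {"t"},
}

# Share links where the path itself is the tracker.
SHARE_PATHS = [
    ("vm.tiktok.com", re.compile(r"^/\w+")),
    ("reddit.com", re.compile(r"^/r/[^/]+/s/\w+")),
]


def base_host(hostname):
    """Lowercase the host and drop a leading 'www.'."""
    host = hostname.lower()
    return host[4:] if host.startswith("www.") else host


def find_trackers(url):
    """Return a list of human-readable reasons the URL looks like a tracking link."""
    parts = urlsplit(url)
    host = base_host(parts.hostname or "")
    params = set(parse_qs(parts.query, keep_blank_values=True))
    reasons = []
    for name in sorted(TRACKING_PARAMS.get(host, set()) & params):
        reasons.append(f"query parameter '{name}'")
    for share_host, pattern in SHARE_PATHS:
        if host == share_host and pattern.match(parts.path):
            reasons.append(f"share link on {share_host}")
    return reasons
```

The parameters are keyed per site on purpose: `t` is a share token on Twitter, but on YouTube it is just a timestamp, so a global blocklist would flag harmless links.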
When you share these URLs to the world wide web, you broadcast to this service (to YouTube, to Google, to TikTok, to Reddit etc.) that "Hey! This previously-anonymous account is actually me!". When you share this link to your friend halfway across the world who only talks to you on Discord, and they click it, you broadcast to this service that actually you two are buddies. Same here on Hexbear. This sharing helps these sites build a social graph on us.
The threat is two-fold. Google has a powerful search crawler, and also runs a massive ad network. They could sift through the pages they indexed on Hexbear and link the exact Hexbear account to your real name. People who have clicked on your shared link will also be exposed as having been on that exact page to which you shared the link. This kind of metadata leak can be dangerous, as law enforcement has previously asked Google to reveal people who watched so-and-so YouTube video at so-and-so time.
This bot also handles TikTok, Yandex, Snapchat, Meta/Facebook trackers that all have this same ad-related threat.
What can mods on Hexbear do?
If you're a mod and you think this is important, you can @ mention this bot on a community you moderate. The bot should reply to you with some cringe, and then you can appoint it as a mod. When given mod powers, it will remove any comment/post that contains tracking links if the user has not fixed it after a day.
I will probably add functionality to sift through old comments that have dangerous trackers (like TikTok, which exposes your name and picture to anyone who clicks it) and remove/report them soon.
How to protect yourself on other sites and on your phone
Install the ClearURLs extension on desktop (if you're on Chrome... please switch, that is another privacy issue entirely). ClearURLs will cut down on most of your worries.
Be on the lookout for the high-entropy parameters when you share things on your phone as well. Parameters in the url that look like ?si=blahblah, ?igshid, which look like they'd stand for "share ID" or "Instagram share ID", as well as obfuscated TikTok links like vm.tiktok․com/blahblah will all track you and your social circle.
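If you'd rather not eyeball every link, the cleanup can be scripted. A minimal sketch, assuming only the two parameter names mentioned above (`SHARE_ID_PARAMS` and `strip_share_ids` are names I made up, and a real blocklist would be longer):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: only the share-ID parameters named in this post.
SHARE_ID_PARAMS = {"si", "igshid"}


def strip_share_ids(url):
    """Drop share-ID query parameters and keep everything else in the URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SHARE_ID_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Note that `urlencode` re-encodes the parameters it keeps, so unusual characters in the remaining query may come out escaped differently than they went in. This does nothing for obfuscated short links like the vm.tiktok ones, since there the tracker is the path itself.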
How to protect your identity from leakage if you accidentally click on a tracking URL
If you're browsing a sensitive website, like Hexbear, and you happen to click a tracking URL that goes to YouTube, Google/YouTube can correlate your click with the appearance of this URL on Hexbear, associating your identity with this site.
To avoid this, you may use Firefox Multi-Account Containers, and make Hexbear use its own container that you keep separate from everything else. Although this solution is not perfect, it will prevent one facet of your identity leaking and make it harder for other sites to correlate your digital footprint.
What other threats exist hidden in URLs
The biggest threat is TikTok, which basically doxxes you when you share a link with someone.
When someone clicks your TikTok link, a big banner on top of their screen shows your profile picture and your name. If you used your real name and picture... well. Uh-oh.
Other "light doxxing hazards" exist on other sites. After looking through Hexbear comments using the search function, you can find comments that link to *****, comments that link to ****, etc. that may include the user's general location down to the city, their preferred language, their screen width and height (in the URL!!! for some reason???), and some very high-entropy parameters that look like a long string of gibberish.
If I sat down today looking to dox someone by going through their profile, and they shared links willy-nilly, I'd have some pretty good leads.
What can the maintainer of HexReplyBot do?
HexReplyBot does not handle YouTube tracking parameters properly. The maintainer can check this RegExr post I made with the modified regex. I bodged it real quick, but it should remove the ?si at least. It will still keep the ?pp parameter, but I got lazy and it's not as common. Please consider changing the regex out, thank you.
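For anyone who wants to see the idea without opening RegExr, here is a rough Python sketch of the same approach. To be clear, this is not the regex from my RegExr post and not HexReplyBot's code. The pattern and function names are placeholders, and the host list is an assumption.

```python
import re

# Assumption: these hosts are enough to recognise a YouTube link in a comment.
YOUTUBE_URL = re.compile(r"https?://(?:www\.|m\.)?(?:youtube\.com|youtu\.be)/\S*")

# Matches an 'si' parameter together with the separator in front of it,
# plus the '&' after it if another parameter follows.
SI_PARAM = re.compile(r"([?&])si=[^&#\s]*(&?)")


def _drop_si(match):
    # If another parameter follows, keep the leading '?' or '&' so the
    # query string stays valid. Otherwise remove the whole match.
    return match.group(1) if match.group(2) else ""


def clean_youtube_links(text):
    """Remove the 'si' parameter from every YouTube link found in the text."""
    return YOUTUBE_URL.sub(lambda m: SI_PARAM.sub(_drop_si, m.group(0)), text)
```

Like the bodged regex, this leaves `pp` and every other parameter alone. It also uses no lookaheads or lookbehinds.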
Some links
https://archive.ph/8c80m - law enforcement using metadata provided by YouTube to find the real name of a suspect
https://hexbear.net/comment/4439859 - someone mentioning that they keep getting a Hexbear user recommended to them on TikTok because they clicked that user's TikTok link months ago
https://archive.is/WD7ke - "We kill people based on metadata"
Can't be bothered to find it, but Ross Ulbricht got busted on some metadata links between his email and Stack Overflow. Now imagine if they had tracking links back then to triangulate his Stack Overflow identity (Stack Overflow now has tracking links) with some other offsite identity.
Share any feedback or thoughts, I'll take it into consideration.
Regex is scary. I view it like the dark side of the force. I did a bunch of work taking "the length of the phrase after the comma until the next ' ' in the string" instead of trying to decipher regex.
I've never seen it spelled out like that. It might change everything for me on a rainy day. Thank you!
Tracking TikTok links should be completely banned, huge security risk. Admin(s?) can you please consider it?
if they can be detected using a simple regex (no lookaheads/behinds iirc) I think the slur filter can remove them.
Good bot, even if it's a little annoying. It'd be cooler if Lemmy had an integrated auto-replace tool for this.
There is an issue open for this exact thing: https://github.com/LemmyNet/lemmy/issues/4905
ClearURLs provides a repo of rules: https://github.com/ClearURLs/Rules that can be used for this task.
I'm working my way through the Rust book to learn more about how Rust works so I can add this into the server backend of Lemmy once I feel confident.
Will be following that issue closely! Thanks for sharing
Damn, this was on my list of things to do, lol. I have to ask, is there a Hexbear Coding Collective, maybe a publicly hosted Gitea server or even just a Github Organization? I would love to contribute to these projects, and help give back to the Hexbear tech infrastructure in some practical way.
Also, I just want to point out that there is a repo of tracking rules, separate from the ClearURLs extension itself, that can be used for this task.
https://github.com/ClearURLs/Rules
That way, you do not have to come up with these regexes yourself.
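A rough Python sketch of what using those rules could look like. Big caveat: I'm assuming the rules file is JSON with a top-level `providers` object, where each provider has a `urlPattern` regex and a `rules` list of parameter-name regexes. That's from memory, so check the repo's docs before relying on it. `SAMPLE_RULES` below is a hand-written stand-in, not data copied from the repo, and `apply_rules` is a made-up name.

```python
import re
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hand-written stand-in shaped like the assumed rules format.
SAMPLE_RULES = {
    "providers": {
        "youtube": {
            "urlPattern": r"^https?://(?:[a-z0-9-]+\.)*?(?:youtube\.com|youtu\.be)",
            "rules": ["si"],
        },
        "instagram": {
            "urlPattern": r"^https?://(?:[a-z0-9-]+\.)*?instagram\.com",
            "rules": ["igshid"],
        },
    }
}


def apply_rules(url, rules):
    """Remove every query parameter whose name matches a rule of a matching provider."""
    for provider in rules["providers"].values():
        if not re.search(provider["urlPattern"], url, re.IGNORECASE):
            continue
        name_patterns = [re.compile(rule, re.IGNORECASE) for rule in provider.get("rules", [])]
        parts = urlsplit(url)
        kept = [
            (key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if not any(pattern.fullmatch(key) for pattern in name_patterns)
        ]
        url = urlunsplit(parts._replace(query=urlencode(kept)))
    return url
```

This only handles plain query-parameter rules. Any other rule types in the real file (exceptions, redirections and so on) are ignored here.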
Curious to hear input from people who have mod powers on this. What makes the most sense? To mod this bot, or to have the bot point out posts and comments, maybe have the bot reply to tracking links publicly... or something else entirely?
Superb post, ty. Does InsidiousTrackers not provide a comment removal reason though? Just checked the modlog and I think it should show "reason: tracking" or smth.
Just thought it was weird seeing a buncha "removed comment" without a "reason" field =)
Can hexbear be made to automatically strip tracking from links?
So if I give someone a tiktok link even if I delete the ? and everything past it, do I still get doxxed? I just have this feeling like I do.
You just want to avoid sharing the URLs that are formatted www.tiktok.com/t/abcd12rfd, as well as the ones that have a ? in them. See my post: On Sharing TikTok Videos.
I will check this out. That makes sense and is crystal clear. Thank you!
if the link is just www.tiktok.com/@username/video/numbers without a ? the url doesn’t have tracking. if it’s one of the vm.tiktok ones you can either open it in a browser to get the www. version or use a site like urlex.org to get the www. link, then remove the ? and everything after. (also works on the tracking reddit links)
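That rule of thumb is simple enough to sketch in Python. `clean_tiktok_link` is a made-up name, and the path pattern is an assumption based on the www.tiktok.com/@username/video/numbers format described above. It expects a full URL with https:// in front, and it does not expand short links for you.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumption: a safe link looks like /@username/video/<digits>.
CANONICAL_PATH = re.compile(r"^/@[^/]+/video/\d+/?$")


def clean_tiktok_link(url):
    """Return the link without its query string, or None if it is not in
    the plain www.tiktok.com/@username/video/numbers format (for example
    a vm.tiktok short link that has to be expanded first)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in {"tiktok.com", "www.tiktok.com"}:
        return None
    if not CANONICAL_PATH.match(parts.path):
        return None
    # Drop the query and fragment, i.e. the ? and everything after it.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
```

A `None` result means "don't share this as-is": expand it in a browser or with a URL expander first, then run the expanded link through the function again.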
I found a YouTube link in your comment. Here are links to the same video on alternative frontends that protect your privacy:
I encourage everyone to check this android app that helps cleaning links
https://f-droid.org/packages/com.trianguloy.urlchecker/
Some people here are terrible with link hygiene, but people irl are so much worse
Please don't have a bot DM me. That's very annoying.
I get that there's some technical challenges that would need to be solved for those tracking links to be stripped out by Lemmy itself, but I'm just annoyed by a bot saying uhhhh excuse me you used a youtube link instead of some dodgy YouTube frontend that will disappear in a year, or how dare you use a twitter link instead of some twitter proxy that will shut down in 3 months.
Like why not spend the coding time actually fixing the issue instead of just annoying people
It's not about a frontend, we have that already in HexReplyBot. It's about removing a parameter tacked on at the end in your YouTube/Instagram/TikTok links.
It's the same technique. A bot that just replies with 'oh sweaty you should have done something different'
One complains about YouTube or X links and the other complains about tracking info in URLs.
It's annoying and instead there should be code in Lemmy that does the URL sanitizing
I agree with you. Seems like there's an active issue in Lemmy for this, so it'll get implemented eventually. Then it'll take a while for Hexbear to update.
Sharing the wrong TikTok link can and has doxed users before, and some users have unsafe living situations if they got doxed.
"uhhhh excuse me you used a youtube link instead of some dodgy YouTube frontend that will disappear in a year"
You fundamentally do not understand what this post is talking about.
Comrade, operational security is what this post is about, not anticapitalist frontends for media services. TikTok will dox you if you are not careful.