It can be used to spy on any decent scientist who sends a friend papers that their own institution has access to but the friend's doesn't.
By "spy" I mean things like: know how many times I've read the PDF, when I've opened it, which parts of it I've read most, what program I used to open the PDF, how many copies of the PDF I've made, how many people I've emailed it to, etc. etc. etc.
This technique can do none of that. The only thing it can do is this: if someone uploads the PDF to a mass sharing network, and an employee of the publisher downloads it from there and compares its metadata with the internal database, they can see which of their users originally downloaded the PDF and when. It tells them nothing about how it got there. Maybe the original user shared it with 20 of their colleagues (a legitimate use of a downloaded PDF), and one of those colleagues uploaded the file to the mass sharing site without telling the original downloader. The metadata doesn't prove it one way or the other. It's an extremely small amount of information that's only useful for catching systematic uploaders, e.g. a single user who has uploaded hundreds or thousands of PDFs that they downloaded from the publisher using the same account.
And a savvy user can always strip that metadata out.
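To make the lookup described above concrete, here is a rough sketch of what the publisher's side might look like. Everything in it is an assumption for illustration: the `DOWNLOADS` table, the ID format, the account names, and the `trace` and `systematic_uploaders` helpers are hypothetical, not anything a real publisher is known to run.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class DownloadRecord:
    account: str             # who originally downloaded the file
    downloaded_at: datetime  # when they downloaded it


# Hypothetical internal database: embedded ID -> original download.
DOWNLOADS: Dict[str, DownloadRecord] = {
    "id-001": DownloadRecord("user-17", datetime(2022, 1, 5, 14, 30)),
    "id-002": DownloadRecord("user-17", datetime(2022, 1, 6, 9, 0)),
    "id-003": DownloadRecord("user-42", datetime(2022, 2, 1, 11, 15)),
}


def trace(embedded_id: str) -> Optional[DownloadRecord]:
    """Return the original download for an ID found in a leaked copy.

    This is all the metadata reveals: an account and a timestamp.
    It says nothing about who uploaded the copy or how it got there.
    """
    return DOWNLOADS.get(embedded_id)


def systematic_uploaders(found_ids: Iterable[str], threshold: int) -> Dict[str, int]:
    """Count leaked copies per account, keeping accounts at or above threshold."""
    counts: Counter = Counter()
    for embedded_id in found_ids:
        record = trace(embedded_id)
        if record is not None:  # stripped or unknown IDs yield nothing
            counts[record.account] += 1
    return {account: n for account, n in counts.items() if n >= threshold}
```

A single hit from `trace` only names the original downloader. It is the per-account count in `systematic_uploaders` that would point at someone uploading in bulk, and a copy whose ID has been stripped never shows up in either.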
As a reminder, ...
All true, and fucked up, but it's not related to what I was talking about. I was talking about the general use of this technique.