Using AI to detect child porn; Why is this even a moral debate?

LeylaLove [she/her, love/loves] · edit-2 1 year ago

Using AI to detect child porn; Why is this even a moral debate?

LanyrdSkynrd [comrade/them, any] · 1 year ago

Google already has an ML model that detects this stuff, and they use it to scan everyone's private Google photos.

https://www.eff.org/deeplinks/2022/08/googles-scans-private-photos-led-false-accusations-child-abuse

They must have classified and used a bunch of child porn to train the model, and I have no problem with that; it's not generating new CP or abusing anyone. I'm more uncomfortable with them running all our photos through an AI model and sending the results to the US government without telling the public.

WayeeCool [comrade/them] · edit-2 1 year ago

They just run it on photos stored on their servers. Microsoft, Apple, Amazon, and Dropbox also do the same. There are also employees in their security departments with the fkd up job of having to verify anything flagged then alert law enforcement.

Everyone always forgets that "cloud storage" means files are stored on someone else's machine. I don't think anyone, even soulless companies like Google or Microsoft, wants to be hosting CSAM. So it is understandable that they scan the contents of Google Photos or Microsoft OneDrive; even if they didn't have a legal obligation, there is a moral one.

chickentendrils [any, comrade/them] · 1 year ago

Seems pretty cut and dry to me. As a tool for moderators to verify, rather than an unwilling witness having to report it.

LeylaLove [she/her, love/loves] · 1 year ago

Exactly. Like why is this an "ethical question"?

has_com [he/him] · 1 year ago

Actually kenyan workers HAVE TO be traumatized for CSA to end

HornyOnMain · 1 year ago

196 and anti anti CP takes, name a more iconic duo

kristina [she/her] · edit-2 1 year ago

when they talk about this, there are identifiers that detect it and remove it automatically, you arent actually storing it in any way. this is standard operation for any major website.

yall really need to stop reading 'AI' and having your brains shut off in general, not really referring to this case just in general

LeylaLove [she/her, love/loves] · 1 year ago

Hash lists exist yeah. But American law actually requires website hosts to keep the CP for evidence instead of deleting it. It's why DivideBy0's tool isn't supposed to be used for American Lemmy instances. Like if you upload a flagged image to Google drive, Google is supposed to flag it, save it, and call the cops.

kristina [she/her] · edit-2 1 year ago

i get that its supposed to be for evidence but its really fucked up to have to put small time server owners through that shit, terrible law. got to be some other way to handle that

LeylaLove [she/her, love/loves] · 1 year ago

I agree. The DivideBy0 tool should be standard on here. Instantly deleting it when it's uploaded and saving the poster's IP is the best solution. I'm more just explaining that anybody in the position to make a tool like that wouldn't have to go out of their way to get source material because, legally speaking, they should already have some. There are site hosts that ignore this law and just delete and ban instantly (as they should), but I think it's important to explain why these tech companies just happen to have large repositories of CP to train AI on.

kristina [she/her] · edit-2 1 year ago

hexbear doesnt log ip at all afaik, security risk

InfiniteGlitch@lemmy.dbzer0.com · 1 year ago

Isn’t that a good thing? Quicker to find it, remove it and hopefully find the one who’s spreading it and send them to prison.

LeylaLove [she/her, love/loves] · 1 year ago

It is a great thing, hence why I'm posting this. Why the fuck is there anybody thinking about the moral implications of using AI to handle CP? What moral implications? What's wrong with it?

drhead [he/him] · 1 year ago

Bottom comment is technically correct, you can bypass any dataset-related ethical concerns. You could very easily make a classifier model that just tries to find age and combine it with one of many existing NSFW classifiers, flagging any image that scores high for both.
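A minimal sketch of that "score high for both" rule, assuming two independent classifiers that each return a probability between 0 and 1. The function name, parameter names, and the 0.9 thresholds are hypothetical and only illustrate the combination logic, not any real moderation tool:

```python
def flag_for_review(minor_score: float, nsfw_score: float,
                    minor_threshold: float = 0.9,
                    nsfw_threshold: float = 0.9) -> bool:
    """Flag an image for human review only if BOTH classifiers score high.

    minor_score: hypothetical output of an age-estimation classifier
    nsfw_score:  hypothetical output of an existing NSFW classifier
    The thresholds are illustrative assumptions, not tuned values.
    """
    return minor_score >= minor_threshold and nsfw_score >= nsfw_threshold


# Only the first case is flagged, because both scores clear the threshold.
print(flag_for_review(0.95, 0.97))
print(flag_for_review(0.95, 0.20))
print(flag_for_review(0.10, 0.99))
```

Requiring both scores to be high trades some recall for fewer false positives, and anything flagged would still go to a human reviewer.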

But there are already organizations with large databases of CSAM that are suitable for this purpose. Using that data to train a model would not create any additional demand, it would not result in any more harm, and it would likely result in a better model. Keep in mind that these same organizations already use these images in the form of a perceptual hash database that social media and adult content sites can check uploaded images against to see whether they are known images, without the images themselves being shared. This is just a different use of the data for a similar purpose.
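For anyone unfamiliar with how a perceptual hash lookup works, here is a rough sketch. It assumes 64-bit hashes stored as integers and a Hamming-distance cutoff of 4 bits; both are illustrative assumptions. Real systems use their own hash functions and thresholds, and the function names here are made up:

```python
from typing import Iterable


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def matches_known_hash(candidate: int, known_hashes: Iterable[int],
                       max_distance: int = 4) -> bool:
    """True if the candidate hash is within max_distance bits of any known hash.

    max_distance is a hypothetical tolerance for re-encoded or resized copies.
    """
    return any(hamming_distance(candidate, k) <= max_distance
               for k in known_hashes)


# The site only ever receives the hash list, never the images themselves.
known = [0xF0F0F0F0F0F0F0F0]
print(matches_known_hash(0xF0F0F0F0F0F0F0F1, known))  # differs by 1 bit
print(matches_known_hash(0x0F0F0F0F0F0F0F0F, known))  # differs by 64 bits
```

The point is that the comparison only needs the hashes, which is why the database can be shared with sites without sharing any images.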

The only actual problem I could think of would be if people trust its outputs blindly instead of manually reviewing images and start reporting everything that scores high directly to the police, but that is more of a problem with inappropriate use of the model than it is with the training or existence of the model. It's very safe to respond like this to images flagged by NCMEC's phash database because those are known images and if any false positives happen they have the originals to compare to so they can be cleared up, but even if you get a 99% accurate classifier model, you will still have something that is orders of magnitude more prone to false positives than the phash database, and it can be very difficult to find out why it generates false positives and correct the problem, because... well, it involves extensive auditing of the dataset. I don't think this is enough reason to not make such a model, but it is a reason to tread carefully when looking at how to apply it.
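To put rough numbers on the false positive point, here is a back-of-the-envelope sketch. It reads "99% accurate" as 99% sensitivity and 99% specificity and assumes that 1 in 10,000 uploads is actually illegal; that prevalence figure is invented purely for illustration:

```python
def expected_flags(n_images: int, prevalence: float,
                   sensitivity: float, specificity: float):
    """Expected true positives, false positives, and precision of the flags."""
    positives = n_images * prevalence
    negatives = n_images - positives
    true_pos = positives * sensitivity
    false_pos = negatives * (1 - specificity)
    flagged = true_pos + false_pos
    precision = true_pos / flagged if flagged else 0.0
    return true_pos, false_pos, precision


# Assumed numbers: 1,000,000 uploads, 0.01% prevalence, 99% / 99% classifier.
tp, fp, precision = expected_flags(1_000_000, 0.0001, 0.99, 0.99)
print(f"true positives ~ {tp:.0f}, false positives ~ {fp:.0f}, "
      f"precision ~ {precision:.2%}")
```

Under those assumptions you get roughly 99 correct flags against roughly 10,000 false ones, so under 1% of flagged images would be real hits. That is why manual review matters far more for classifier flags than for hash matches.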

Frank [he/him, he/him] · 1 year ago

Usually when people do this it's with, like, Mengele or Unit 731's "research", except their research was almost entirely insane sadism with at most a veneer of science. Whereas this is really cut and dry - ML models can be trained to recognize patterns, you've got ready access to a training dataset (or the FBI does, or whatever), you can train your model to flag and remove CSAM.

LeylaLove [she/her, love/loves] · 1 year ago

Yeah. I'm not justifying people defending Mengele or Unit 731, but there's enough there that you can at least understand why people came to this conclusion. Plus, westerners pardoned Unit 731 and similar German scientists for the exact reason of "let's get the data". While they found out the data wasn't useful, the west also had to stick by their decisions to not seem absolutely insane. There has been a propaganda push, along with the west's poorly developed utilitarian view, that makes defending Unit 731 understandable. Defending Unit 731 really isn't that much of a jump from defending capitalists putting workers through many of the same horrific deaths through workplace austerity. If it's okay to cook people to death because you wanted to make a few bucks, what's the issue with freezing someone to death? Not okay mind you, but I can clearly see the route the brainworms took.

This though? Like what do people expect? There is no good reason to not want AI on CP enforcement. Just because "there's a database of CP"? What do people want LE to do with evidence of abuse? I WISH the videos of my abuse were in some Fed's database. Instead my abuser walked off and now I just get the occasional creepy message talking about how hot my abuse was. Tracking pedophiles is pretty much the only consistently good thing the feds do. Why are people criticizing THIS of all things? There are no grounds to make this a real moral debate.

WhatDoYouMeanPodcast [comrade/them] · 1 year ago

Only moral quandary that comes to mind is that if you give an AI a bunch of CSAM and then it leaks, then 1) someone has it and 2) AI gets better at generating it. Who's training the AI? Who verifies it? What are the security protocols?

drhead [he/him] · 1 year ago

This would be a classifier model, incapable of making images. Most classifier models only output a dictionary with a single floating-point value for each class they are trained on, representing the chance the image belongs to that class. It'd probably be trained by an organization like NCMEC, possibly working with a very well trusted AI firm. For verifying it, usually you reserve a modest representative sample of the database and don't train on it, then use that to determine how accurate the model is and to decide on what score threshold is appropriate for flagging.
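A rough sketch of that verification step, using plain Python and made-up scores. The function names, the 20% holdout fraction, and the false positive budget are all assumptions for illustration; a real evaluation would be run by the organization holding the data and would look at more than one metric:

```python
import random


def split_holdout(items, holdout_fraction=0.2, seed=0):
    """Shuffle and reserve a fraction of the data that is never trained on."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]  # (train, holdout)


def pick_threshold(scores, labels, max_false_positive_rate=0.01):
    """Lowest score threshold whose false positive rate on the holdout set
    stays within budget. Returns None if no threshold qualifies."""
    negatives = sum(1 for y in labels if not y)
    best = None
    # Lowering the threshold can only add false positives, so stop at the
    # first threshold that exceeds the budget.
    for t in sorted(set(scores), reverse=True):
        false_pos = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        fpr = false_pos / negatives if negatives else 0.0
        if fpr > max_false_positive_rate:
            break
        best = t
    return best


# Hypothetical holdout scores (per-class probability) and true labels.
holdout_scores = [0.95, 0.90, 0.80, 0.40, 0.30, 0.10]
holdout_labels = [1, 1, 0, 1, 0, 0]
print(pick_threshold(holdout_scores, holdout_labels, max_false_positive_rate=0.0))
```

Because the holdout images were never seen during training, the false positive rate measured at the chosen threshold is a fairer estimate of how the model behaves on new uploads.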