Do you think AI "things" like Midjourney or ChatGPT will have or are already having some kind of "piracy" around them?

incognito08@lemmy.dbzer0.com · 19 hours ago

Aceticon@lemmy.dbzer0.com · edited 14 hours ago

I'm pretty sure those things are trained on content which was obtained without paying royalties to the creators, hence by definition pirated content - so that would count as "piracy around them".

On the opposite side, as far as I know the things created with Generative AI so far can't be copyrighted, hence by definition can't be pirated as they've always belonged to the Public Domain.

As for the engines themselves, there are good fully open source options out there which can be installed locally (if you have enough memory in your graphics card), and there seem to be thriving communities around them (at least it looks that way from the little I've dipped into that stuff so far). I'm not sure it's even possible to pirate the closed source engines, since I expect those are designed to be deployed on very specific server farm architectures.

JohnBrownsBussy2 [she/her, they/them] · 19 hours ago

Weight leaks for semi-open models have been fairly common in the past. Meta's LLaMA 1 model was originally closed source, but the weights were leaked and spread pretty rapidly (effectively laundered through finetunes and merges), leading Meta to embrace quasi-open source after the fact. Similarly, most of the anime-style Stable Diffusion 1.5 models were based on NovelAI's custom finetune, whose leaked weights were laundered the same way and became ubiquitous.

Those incidents were in late 2022 (NovelAI) and early 2023 (LLaMA). Aside from some of the biggest players (OpenAI, Google, Anthropic, and I guess Apple kinda), open weight releases (usually not open source) have become the norm, even for frontier models like DeepSeek-V3, Qwen 2.5 and Llama 3.1, so piracy in that case is moot (although it's easy to assume that use violating the licenses is also ubiquitous). A leak of a currently closed frontier model would be interesting from an academic and journalistic perspective, since it would let people dig into the architecture and assess things like safety and regurgitation outside of the online service shell, but those frontier models would require so much compute that they'd be unusable by individual actors.