#off-topic
2023-01-14
respatialized16:01:07

Matthew Butterick is quite busy these days! Class-action litigation has now been launched against AI image generators on behalf of artists whose work and style are being pirated by generative AI, both open source and proprietary. https://stablediffusionlitigation.com/

moe16:01:56

That's pathetic

respatialized16:01:35

I would strongly recommend you read through the complaint. It is extremely well-researched and grounded in the relevant scientific literature about how these models actually work.

moe16:01:33

I get the licensing complaint, but I don't understand how out-of-sample images can be infringing

respatialized16:01:30

By "out of sample," do you mean "new" images not present in the training set?

moe16:01:23

Just because I post it every chance I get, here is the most uncanny and unrelated-to-the-prompt output I got from stable diffusion

respatialized16:01:43

Section VIII of https://stablediffusionlitigation.com/pdf/00201/1-1-stable-diffusion-complaint.pdf goes into detail about how they're framing the argument on a technical level:
> Because a trained diffusion model can produce a copy of any of its Training Images—which could number in the billions—the diffusion model can be considered an alternative way of storing a copy of those images. In essence, it’s similar to having a directory on your computer of billions of JPEG image files. But the diffusion model uses statistical and mathematical methods to store these images in an even more efficient and compressed manner.
It thus concludes:
> The resulting image is necessarily a derivative work, because it is generated exclusively from a combination of the conditioning data and the latent images, all of which are copies of copyrighted images. It is, in short, a 21st-century collage tool.
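For concreteness, the process being described corresponds roughly to the sampling loop below (a sketch using Hugging Face's diffusers library, not anything from the complaint; the checkpoint name, prompt, and step count are illustrative, and classifier-free guidance is omitted for brevity). The only things the output depends on are random latent noise, the encoded text prompt, and the fixed weights learned from the training images.

```python
# Sketch only: a minimal text-to-image sampling loop with diffusers.
import torch
from transformers import CLIPTextModel, CLIPTokenizer
from diffusers import AutoencoderKL, UNet2DConditionModel, PNDMScheduler

model_id = "runwayml/stable-diffusion-v1-5"  # illustrative checkpoint
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae")
tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder")
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")
scheduler = PNDMScheduler.from_pretrained(model_id, subfolder="scheduler")

# The conditioning data: a text prompt embedded by the CLIP text encoder.
prompt = ["a lighthouse at dusk, oil painting"]
tokens = tokenizer(prompt, padding="max_length",
                   max_length=tokenizer.model_max_length,
                   truncation=True, return_tensors="pt")
with torch.no_grad():
    text_embeddings = text_encoder(tokens.input_ids)[0]

# Random latent noise, progressively denoised by the trained U-Net.
latents = torch.randn((1, unet.config.in_channels, 64, 64))
scheduler.set_timesteps(50)
latents = latents * scheduler.init_noise_sigma

for t in scheduler.timesteps:
    latent_input = scheduler.scale_model_input(latents, t)
    with torch.no_grad():
        noise_pred = unet(latent_input, t,
                          encoder_hidden_states=text_embeddings).sample
    latents = scheduler.step(noise_pred, t, latents).prev_sample

# Decode the final latents into pixels with the trained VAE decoder.
with torch.no_grad():
    image = vae.decode(latents / 0.18215).sample  # 0.18215: SD v1 latent scale
```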

respatialized16:01:44

whether a human observer would subjectively consider them "novel" is irrelevant to the question of the relationship between the training data and the output. I think it's a compelling argument.

moe16:01:48

Thanks for excerpting that for me, that clears things up a bit

moe16:01:04

I do hope that if I were a visual artist I wouldn't have this reaction to the technology, though

👍 2
moe16:01:32

And not knowing very much about ML, I wonder if you could determine whether a model incorporated a given image by looking at the model itself

moe16:01:15

It seems intuitive that you couldn't do this with certainty
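That intuition matches how such checks work in practice: they are statistical rather than certain. One approach is to sample repeatedly with the image's original caption and see whether any generation lands suspiciously close to the candidate image. A minimal sketch, assuming a diffusers Stable Diffusion pipeline; the filename, caption, threshold, and the use of raw pixel distance (rather than a perceptual metric) are illustrative assumptions:

```python
# Sketch only: a crude memorization probe for a text-to-image model.
# A near-duplicate generation is evidence of memorization; its absence
# proves nothing, which is why the answer is never "with certainty".
import numpy as np
from PIL import Image
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

def distance(a: Image.Image, b: Image.Image) -> float:
    """Root-mean-square pixel difference after resizing to a common size."""
    x = np.asarray(a.convert("RGB").resize((256, 256)), dtype=np.float32) / 255.0
    y = np.asarray(b.convert("RGB").resize((256, 256)), dtype=np.float32) / 255.0
    return float(np.sqrt(((x - y) ** 2).mean()))

def nearest_generation(candidate: Image.Image, caption: str, n: int = 8) -> float:
    """Smallest distance between the candidate and n fresh generations."""
    return min(distance(pipe(caption).images[0], candidate) for _ in range(n))

# Hypothetical usage: a very small distance suggests (but does not prove)
# memorization; a large one tells you very little either way.
candidate = Image.open("candidate.png")
print(nearest_generation(candidate, "original caption of the candidate image"))
```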

respatialized16:01:16

the complaint also addresses this. the source of the images is, in fact, known, so it is harder to claim "stochastic ignorance" of where an image came from.
> The LAION-Aesthetics dataset is heavily reliant on scraping and copying images from commercial image-hosting services: according to one study, 47% of the images in the dataset were scraped from only 100 web domains.
> ...
> When asked whether he sought consent from the creators of the Training Images, Holz said “No. There isn’t really a way to get a hundred million images and know where they’re coming from. . . . There’s no way to find a picture on the internet, and then automatically trace it to an owner and then have any way of doing anything to authenticate it.”
> Holz’s statement is false. LAION and other open datasets are simply lists of URLs on the public web. Many of those URLs are derived from a small handful of websites that maintain records of image ownership. Thus, many images could be traced to their owner. Holz and LAION possess information sufficient to perform such tracing.
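To make "simply lists of URLs" concrete: LAION publishes its datasets as metadata tables of (URL, caption, score) rows, so provenance questions largely reduce to ordinary table lookups. A minimal sketch with pandas; the shard filename and example URL are hypothetical, and the uppercase URL column name is an assumption about LAION's published schema:

```python
# Sketch only: treating a LAION metadata shard as what it is, a table of
# (URL, caption, ...) rows, and asking provenance questions against it.
from urllib.parse import urlparse

import pandas as pd

# Hypothetical local copy of one metadata shard.
df = pd.read_parquet("laion2B-en-metadata-part-00000.parquet")

# Was a specific image URL included? (Target URL is made up.)
target = "https://example-stock-site.com/photos/12345.jpg"
print((df["URL"] == target).any())

# How concentrated are the source domains? This is the kind of tally behind
# the "47% of images from only 100 web domains" figure cited above.
domains = df["URL"].map(lambda u: urlparse(u).netloc)
top_100 = domains.value_counts().head(100)
print(top_100.sum() / len(df))  # fraction of this shard covered by 100 domains
```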

moe16:01:17

ah yeah, I mean if you went the plausible deniability route post-facto

respatialized16:01:36

as an amateur artist soon on my way to a drawing class, and an ML professional in my day job, I am wondering what you think an appropriate reaction to this technology is.

moe16:01:15

I'm also an amateur artist, but immature enough that I didn't think it worth mentioning earlier, and I have used stablediffusion to help with abstract composition ideas. all of my software is released into the public domain, I have difficulty with the idea of intellectual property, and I hope that wouldn't change if I were digitising my graphite drawings

respatialized16:01:17

I think it's important to note that not everyone has the luxury of doing that with the craft they spent years developing and earn a livelihood from. I don't think any of these litigants would have a problem with a tool like Stable Diffusion trained on work in the public domain or available under a CC license. It's the "work in the style of [insert commercial artist here]" ripoffs that truly threaten their livelihoods. I am not thrilled about copyright as a general concept either, but when a company comes along that threatens to wipe out what little commercial work is left for actual artists by blatantly violating their rights, I am not going to hold it against them for trying to protect their interests.

I think it's important on a precedent-setting basis as well. Along with the similar litigation against GitHub and OpenAI over Copilot and its underlying Codex model, it is one of the first major challenges to the "fair use" assertions of AI/ML companies, which have, to my knowledge, never been tested in court. As such, it is an important test of whether the courts can serve as one avenue for regulating and shaping the direction of AI (as opposed to letting Silicon Valley do whatever the fuck they think is best).

💜 10
moe16:01:56

thanks for the measured response. i think there's enough fetishization of human creativity in the art market that i find it hard to imagine livelihoods being threatened by stablediffusion as it stands now, but i guess there's more than one "art market", and if people say their livelihoods are being threatened, then i'm in no position to dispute that. you have clearly thought about this more deeply than i have, so i don't want to drag you into a convoluted conversation, but depending on precedent I see this becoming something like performance-enhancing drugs, with artists using black market copyright-infringing models to gain a competitive advantage

respatialized17:01:51

I'm not under any illusions that the current copyright regime has been super great for artists either (e.g. private equity portfolios buying up and relentlessly pushing a few musicians' catalogues while new bands struggle to get paid decently for recording and releasing albums). But I view that as a problem with concentration and market power, and I see business models like MidJourney's as very much accelerating that trend towards concentration. If these companies that talk a good game about "democratizing the benefits of AI" want to put their money where their mouth is, paying the people responsible for creating their models' value would be a great place to start.

4
moe17:01:26

can't really argue with that

mauricio.szabo18:01:30

I think the problem lies in what "fair use" is. Anyone can join ChatGPT and ask questions, but for Copilot you have to pay. Is it "fair use" to scrape public code that sometimes has an incompatible license and use it in a closed, paid system? The same goes for art. Sure, generating something from these models and using it to illustrate a public, free book is one thing. But generating an image "in the same style as this paid artist" and then using it in paid material, is that "fair use"?

mauricio.szabo18:01:58

I think the problem is that, morally, I don't feel this is fair. At the same time, I don't know the answer to that. Especially because some people can produce amazing pieces of art with these models (not me, it always gives me back nightmarish pictures)

quoll21:01:09

I thought the problem was neatly illustrated by the widely shared results of asking Midjourney for an “afghan girl” https://i.mj.run/744ce4a2-5b65-4f20-a848-e50906125a3b/grid_0.webp

6
Martynas Maciulevičius22:01:46

If it's scrambled in our database it has to be different, right? Right? Right.....? 🦗 🦗 🦗

adi03:01:11

excerpt of excerpt of litigation https://clojurians.slack.com/archives/C03RZGPG3/p1673714236341359?thread_ts=1673712367.388589&cid=C03RZGPG3:
> There’s no way to find a picture on the internet, and then automatically trace it to an owner and then have any way of doing anything to authenticate it.
Ted Nelson is peering at us and he isn't smiling...

✔️ 2