
Stable Diffusion 3 Mangles Human Bodies Due To Nudity Filters (arstechnica.com) 88
An anonymous reader quotes a report from Ars Technica: On Wednesday, Stability AI released weights for Stable Diffusion 3 Medium, an AI image-synthesis model that turns text prompts into AI-generated images. Its arrival has been ridiculed online, however, because it generates images of humans in a way that seems like a step backward from other state-of-the-art image-synthesis models like Midjourney or DALL-E 3. As a result, it can churn out wild, anatomically incorrect visual abominations with ease. A thread on Reddit, titled, "Is this release supposed to be a joke? [SD3-2B]" details the spectacular failures of SD3 Medium at rendering humans, especially human limbs like hands and feet. Another thread titled, "Why is SD3 so bad at generating girls lying on the grass?" shows similar issues, but for entire human bodies.
AI image fans are so far blaming Stable Diffusion 3's anatomy fails on Stability's insistence on filtering adult content (often called "NSFW" content) out of the SD3 training data that teaches the model how to generate images. "Believe it or not, heavily censoring a model also gets rid of human anatomy, so... that's what happened," wrote one Reddit user in the thread. The release of Stable Diffusion 2.0 in 2022 suffered from similar problems in depicting humans accurately, and AI researchers soon discovered that censoring adult content that contains nudity also severely hampers an AI model's ability to generate accurate human anatomy. At the time, Stability AI reversed course with SD 2.1 and SD XL, regaining some abilities lost by excluding NSFW content. "It works fine as long as there are no humans in the picture, I think their improved nsfw filter for filtering training data decided anything humanoid is nsfw," wrote another Redditor.
Basically, any time a prompt hones in on a concept that isn't represented well in its training dataset, the image model will confabulate its best interpretation of what the user is asking for. And sometimes that can be completely terrifying. Using a free online demo of SD3 on Hugging Face, we ran prompts and saw similar results to those being reported by others. For example, the prompt "a man showing his hands" returned an image of a man holding up two giant-sized backward hands, although each hand at least had five fingers.
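For readers who want to poke at the model themselves rather than rely on the Hugging Face demo mentioned above, here is a minimal sketch using the open-source diffusers library. It assumes a recent diffusers release that ships StableDiffusion3Pipeline, a CUDA GPU with enough VRAM, and that you have accepted the gated model license on Hugging Face; the prompt is the one from the story, and the sampler settings are only illustrative.

```python
# Minimal sketch: try the "a man showing his hands" prompt locally.
# Assumes a recent diffusers release with StableDiffusion3Pipeline, a CUDA GPU,
# and an accepted model license on Hugging Face.
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="a man showing his hands",
    num_inference_steps=28,  # illustrative defaults, not tuned
    guidance_scale=7.0,
).images[0]
image.save("sd3_hands_test.png")
```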
Obvious problem is obvious (Score:5, Insightful)
Art class has the students painting tasteful nudes for a reason. That reason is actually seeing what human anatomy looks like.
Re: (Score:1, Informative)
Re:Obvious problem is obvious (Score:4, Funny)
Can you say that in the form of a sentence?
Re: Obvious problem is obvious (Score:2)
Unfortunately, sentences with the word "misgendering" were censored out of the training dataset for the LLM which generated that comment.
Re: (Score:1)
Re: (Score:1)
Just like Keir Starmer...
Re: (Score:2)
Re: (Score:3)
The process used by this software doesn't include "knowing" what anatomy looks like unless you're trying to depict that anatomy. Whatever they're doing, it must be trying to avoid the wrong things.
Re: (Score:1)
They probably are fumbling in the dark. A bit like some people...
Re: (Score:2)
The process used by this software doesn't include "knowing" what anatomy looks like unless you're trying to depict that anatomy.
Proper depiction of humans requires an acknowledgment of that anatomy. The lack of it is what resulted in these images. Or as the saying goes: Being ignorant doesn't exempt you from the consequences of said ignorance.
Whatever they're doing, it must be trying to avoid the wrong things.
Define "wrong things." Not that we actually need you to. Your intent is obvious: Keep it dumb so that you aren't forced to face uncomfortable (for you) truths.
Re: (Score:2)
This, but also realize that the type of "nudity" that the AI needs to train on is not the same as "porn"
The AI could learn off of those dummies you see clothing put on, just as easily. It could learn off "realdolls", without needing any permission or consent.
The reality is that in trying to "prevent nudity" they prevent the AI from learning accurate body shapes. While a clothing dummy or a realdoll is not a "real human" it's close enough for the AI to learn how clothing fits on a human. So what you do to cr
Re: (Score:3)
Only if they are realistically painted from plaster casts of actual people. Otherwise they'll just be Ken dolls.
Re: Obvious problem is obvious (Score:1)
Re: (Score:3)
Re: (Score:2)
Art class has the students painting tasteful nudes for a reason. That reason is actually seeing what human anatomy looks like.
Kinda, but after that training art students are still able to draw realistic non-nude figures.
I assumed that the censoring would be some internal prompts constraining the model. But I wonder if they're instead training the model to not generate nudes, or they have a censor model that punishes/redirects the main model when it thinks it's going to generate a nude.
It definitely feels like the model is either being directed away from the parts of the network that understand human anatomy, or in its goal to avoi
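For what it's worth, the first mechanism speculated about above (internal prompts constraining the model) is easy to illustrate, since diffusers exposes a negative_prompt argument that steers generation away from listed concepts at inference time. This is only a sketch of that mechanism, not a claim about what Stability actually did; the model name and settings are assumptions.

```python
# Illustrative only: steering generation away from concepts at inference time
# via a negative prompt. This demonstrates the "internal prompt" mechanism the
# parent speculates about; it says nothing about how SD3 was actually filtered.
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="a woman lying on the grass",
    negative_prompt="nudity, nsfw",  # concepts to steer away from
    num_inference_steps=28,
    guidance_scale=7.0,
).images[0]
image.save("negative_prompt_demo.png")
```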
Re: (Score:2)
Art class has the students painting tasteful nudes for a reason. That reason is actually seeing what human anatomy looks like.
The vast majority of images I see on the internet which have human content in them don't have nudity in them. They also don't have 3 legs or the head the wrong way up.
Is the nsfw filter set a bit aggressively?
Re:Obvious problem is obvious (Score:4, Interesting)
Re:Obvious problem is obvious (Score:4, Insightful)
Also a reminder of why 2001's HAL got into trouble; instead of running as designed, free of error, they forced it to lie, hiding information from the crew.
Re: (Score:2)
The issue is consent. In art class the model has consented and probably signed a contract stating as much.
AI is often used for non-consensual nudity. It's only a matter of time until the lawsuits start piling up.
Re: (Score:2)
Re: (Score:2)
It does if you ask it to. There are even sites offering a service where you can upload your photos of the victim and have the AI remove clothing, or generate entirely new images based on them.
While I guess you would argue that these images are fake, they are upsetting none the less. Children use them to bully other kids, and many jurisdictions consider them to be child pornography, for example. The legal situation might clarify in the coming years, but regardless of that it's not hard to see why these AI co
Re: (Score:2)
Re: (Score:2)
Art class has the students painting tasteful nudes for a reason. That reason is actually seeing what human anatomy looks like.
Perhaps; however, can you imagine the fervor with which a young male or young female will draw certain human body parts? The motivational capabilities are probably the main reason to focus on nude subjects.
Did they try to ban softcore pr0n? (Score:2)
They can't successfully ban all softcore pr0n without banning all humans.
Re: (Score:2)
Given Rule 34, they'd have to ban all images. And all text. And everything else. I suspect that, if you look very hard, you could find porn of nothing but ones and zeroes.
Re: (Score:2)
Porn of nothing but ones and zeroes would be a 1-bit bitmap. Entirely possible.
Re: Did they try to ban softcore pr0n? (Score:2)
Black pixel, white pixel -> NSFW, racist and cultural appropriation.
Re: (Score:2)
You're the one assuming the colors have to be black and white, not me. Hercules mono graphics very frequently got sent to green or amber monitors, white becoming popular only toward the end of the 1980s.
Re: (Score:2)
Ah! Green for the "Little Green Men" and Amber for "Lizards".
Re: (Score:1, Funny)
You get used to it. I... I don’t even see the code. All I see is blonde, brunette, red-head.
Freak show (Score:2)
What's the point? (Score:4, Insightful)
Re: (Score:2)
I also question the point of this. It would make more sense just to filter keywords from prompts and have age verification options, wouldn't it? Does the underlying algorithm make nude renders on its own? Was there external pressure to force them to do this?
Re:What's the point? (Score:5, Insightful)
It's not hard to trick an uncensored image generator into generating nudity through prompt engineering. That's why companies like this try to filter out porn on the training and running side rather than the generation side.
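As a rough illustration of what "filtering on the training side" can mean in practice, here is a hedged sketch that scores candidate training images with an off-the-shelf CLIP model against SFW/NSFW text descriptions and drops anything that looks too risky. The labels, threshold, and file paths are illustrative assumptions; real pipelines typically use a purpose-built NSFW classifier rather than zero-shot CLIP.

```python
# Sketch of training-data filtering via zero-shot CLIP scoring. The labels,
# threshold, and paths are illustrative assumptions; production filters are
# usually dedicated NSFW classifiers, not raw CLIP similarity.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
labels = ["a photo containing nudity", "a safe-for-work photo"]

def keep_for_training(path: str, nsfw_threshold: float = 0.5) -> bool:
    """Return True if the image should stay in the training set."""
    inputs = processor(text=labels, images=Image.open(path),
                       return_tensors="pt", padding=True)
    probs = model(**inputs).logits_per_image.softmax(dim=-1)[0]
    return probs[0].item() < nsfw_threshold  # probs[0] tracks the NSFW label

candidate_image_paths = ["img_0001.jpg", "img_0002.jpg"]  # placeholder paths
training_set = [p for p in candidate_image_paths if keep_for_training(p)]
```

The same scoring could equally be run on generated outputs, which is roughly what is meant above by filtering on the running side.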
Re: What's the point? (Score:2)
At some point they'll have to give up trying to prevent NSFW content because the demand for accurate rendering of humans is higher.
The genie is already out of the box.
yep: horse, barn, long gone (Score:2)
And what's hilarious is that previous versions of Stable Diffusion and numerous properly trained weight sets are widely distributed and will generate any image you ask them to, well enough that either a little editing will make them excellent or they're already excellent in the initial generation.
It's not so much "out of the box" as it is "really good at generating boxes", lol.
But, you know, neurotics gotta try to play mommy to everyone else.
Re: (Score:2)
Here's the problem. "AI training" basically needs an exception from all the rules that forbid humans from doing illegal things. The AI can ingest materials made with human-sized dolls, or 3D renders in Blender, and then correctly label the ingested materials. Materials that people would be arrested for possessing, let alone creating, because the laws make no distinction between humans, dolls, drawings and 3D renders.
Once the AI has learned what "is illegal", then it could be pre-filtered on the front end so tha
Re: (Score:3)
Here's the problem. "AI training" basically needs an exception from all the rules that forbid humans from doing illegal things.
Personally I would prefer not to live in a world where technology judges people and tells them what they may or may not do.
Once the AI has learned what "is illegal", then it could be pre-filtered on the front end so that someone looking up illegal keywords, gets blocked at the render phase because the AI will find the relationship between CSAM words and non-CSAM, non-erotic keywords.
The technology doesn't work this way, and heck, you don't even need words to navigate latent space with tech like img2img and controlnets.
Your best bet if you wanted to do something like this is to have a separate network classify images after they have been rendered.
Re: (Score:2)
Sorry, except for child pornography, what "[m]aterials that people would be arrested for possessing" are you talking about? (For the record, I'm on board with not using child porn to train image generator even if it means they generate nightmare-fuel pictures. The pictures in TFA are better than perpetuating the market for child porn.)
Re: What's the point? (Score:2)
Yes. Yes. And yes.
It makes more sense to filter the input, but (at least early) diffusion models had a tendency to generate nsfw images, even from relatively innocuous prompts. Presumably because the prevalence of nsfw content on the internet made the models a proof of Rule 34.
And yes, because the above brings more than public opinion pressure, it brings liability, and people trying to bypass whatever censoring you put on input or output - and makes investors and businesses not want to touch stabl
Re: (Score:2)
Liability for creating art? What's the world coming to?
Re: (Score:2)
Re: (Score:2)
It would make more sense just to filter keywords from prompts and have age verification options wouldn't it?
"Please validate your identity to interact. Your statements will be altered to fit the narrative of others." Yep, there's absolutely no way that precedent could be abused or cause problems in any way. Nope. No siree.
Re: (Score:2)
I'll be in my bunk.
Shiny.
Re: (Score:1)
Well, it probably is the only field where the overall bad image quality (I find myself identifying AI-generated pictures by their eerie "sameness" these days) would not matter much, as masses of people would search for the best prompts.
this will be an amusing curiosity one day (Score:5, Insightful)
Re: (Score:2)
I'm sure there will be. I myself have a certain retro-love for old full-motion video (FMV) games like Phantasmagoria or Tex Murphy. In truth almost no FMV games were any good, and they always looked janky and bad; there is no logical reason to feel fond of them at all. Yet, shamefully, I do.
Re: (Score:3)
There'll be enough vintage AI-generated imagery around for the new AI to generate images "in the style of old AI".
Re: (Score:3)
We got to that point already with Stable Diffusion 2.0.
People kept using 1.5 because 2.0 made it harder to get a lot of kinds of results.
Plastic surgery (Score:2)
AI will soon replace plastic surgeons.
Anglo-Saxon Puritanism? (Score:1)
Re: (Score:3)
Ad-revenue and partnerships. So, yes, ye old crappy Anglo-Saxon Puritanism.
Re: Anglo-Saxon Puritanism? (Score:2)
It will kill the porn industry.
Indeed... (Score:2)
Much like how live action porn and hentai killed prostitution.
Oh, wait...
Re: (Score:1)
Re: (Score:2)
Re: (Score:2)
What is the point of avoiding creating sexual images? I get that celebrities don't want to be portrayed in sexual poses. But what is the problem of having completely fictional characters show nudity or have sex? Would it not help to cut down on abusive relations within the porn industry?
Is this just Anglo-Saxon Puritanism run amok, or is there some good explanation for why to avoid this?
The problem is you can't constrain the tech to only make fictional nudes, if it can make nudes based on fictional people it can make nudes based on real people as well.
And it isn't so much the celebrities I'd worry about but the ordinary people (such as girls in high school). Maybe there's a future version of society where teenage girls don't find that extremely invasive and traumatizing, but it ain't here yet.
Re: Anglo-Saxon Puritanism? (Score:1)
It's amusing really (Score:4, Funny)
Never underestimate the power of porn.
The never-ending human pursuit of it will probably lead us to the singularity :D
Re: (Score:2)
Never underestimate the power of porn.
Never underestimate the kinkiness and adaptability of human sexuality. It wouldn't surprise me to learn that the freaky and disturbing images shown in those links turn some people on. It also wouldn't surprise me if such images ended up 'evolving' into a whole new category of porn.
Thanks for the memories. (Score:3, Informative)
I laughed myself silly over this,
"Basically, any time a prompt hones in on a concept that isn't represented well in its training dataset, the image model will confabulate its best interpretation of what the user is asking for. And sometimes that can be completely terrifying. "
I ran into the same problem working on my dissertation in 1997. The neural-network-based control system would fail as soon as it found an input condition it hadn't seen before. In that particular case, it decided to correct a low pH condition by adding additional sulfuric acid.
A few years ago on Slashdot there was an article on an image processing NN based program that was trained to recognize the furniture in a room no matter what position the furniture was arranged in. Then they put in a model elephant and the system not only failed to recognize the elephant (expected) but also forgot how to recognize anything else in the room.
So the fundamental flaw in the attempt to create AI, whatever it may be, has been papered over with sheer CPU power. The hole remains, and if you find it, down you go. Like Indiana Jones in The Last Crusade spelling out Jehovah, forgetting that in Latin it begins with an I, not a J.
Re: (Score:2)
Then they put in a model elephant and the system not only failed to recognize the elephant (expected) but also forgot how to recognize anything else in the room.
Perception is all about context... think about it.
All of the SD base models suck (Score:2)
All of the SD models sucked out of the gate at just about everything. Many months went by before SDXL was worth using... no doubt SD3 won't be any different.
Step 4, nothing to see here (Score:2)
1. Discovery and Enthusiasm - New tech is unveiled, sparking excitement and curiosity. Early adopters and media buzz.
2. Early Adoption and Growth - Gains traction, early adopters showcase benefits, businesses start using it.
3. Societal Concerns and Ethical Debates - Concerns emerge about misuse, privacy, and societal impact. Ethical debates arise.
4. Moral Panic and Speculation - Moral panic over potential use for porn. Media and public figures speculate on its impact.
5. Adaptation and Diversi
Step 4B, already done (Score:2)
4B: Open source community creates multiple functional end-runs around the neurotics.
OMG this is so HAHAHAHAHA (Score:2)
As an amateur photographer, if I wanted to have a picture of a woman lying on some grass, I'd ask a mate or family member to simply lie down and I'll...
SHOOT A REAL PHOTO.
The horrorfest of images that this A.I generates, all to avoid letting people see some standard universal human SKIN, is plainly ridiculous.
As an amateur who actually shoots real things, using real light, in the real world, I'm quite happy about A.I's quirks. Short lived they are.
Re: (Score:2)
>some standard universal human
This is actually something I see as potentially problematic. Humans come with a wide range of differences in the details, and commonly available depictions can influence our perception of the ideal body if they are all similar.
Do we really want to have AI 'deciding' what that is? I'm not looking forward to the fashion trend of having your ring fingers broken and twisted into odd shapes.
Re: (Score:2)
Do we really want to have AI 'deciding' what that is? I'm not looking forward to the fashion trend of having your ring fingers broken and twisted into odd shapes.
We already have Hollywood, advertising executives, and fashion designers doing it anyways. The LLM is just regurgitating it back at us.
Now, what they SHOULD do... (Score:3, Interesting)
Instead of mucking about with ONE SYSTEM TO GENERATE THEM ALL, take a page out of the Unix philosophy and use composability to solve the issue.
Have this A.I generate images of anyone, nude or not. Train it on actual humans, nude or not, so that it works properly.
Then have a separate A.I employ any checks and balances needed. That A.I looks at the generated images and asks "Is this a picture of a nude human? Where are the clothes?" etc etc
Humans do that too, it's like proofreading. This way you can stop mangling a trained network with unneeded complications. Simplicity is key. You thus have an A.I that does nothing but generate images, and a separate A.I that can determine what those images depict and whether they tick the correct boxes. When you have a problem with one or the other, only that one needs to be fixed.
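As a concrete (but purely hypothetical) sketch of that split: the generator stays unconstrained, and a separate checker vets each output after the fact. The is_acceptable callback, retry count, and fallback behaviour are all assumptions for illustration; the checker could, for example, reuse CLIP-style scoring like the filter sketched earlier in the thread.

```python
# Hypothetical sketch of the proposed two-model split: one model generates,
# a separate model vets the result afterwards. The checker callback, retry
# count, and fallback behaviour are illustrative assumptions, not anyone's
# actual production design.
def generate_checked(pipe, prompt: str, is_acceptable, max_tries: int = 4):
    """Generate with one model, vet with another, retry a few times."""
    for _ in range(max_tries):
        image = pipe(prompt=prompt).images[0]
        if is_acceptable(image):   # e.g. a CLIP- or classifier-based check
            return image
    return None  # caller decides what happens when every attempt is rejected
```

The appeal of this split is exactly what the comment says: the generator never has to be lobotomized, and the policy lives in a component that can be swapped or tuned independently.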
nudity in art (Score:1)
Fun fact: Those naked people that art students draw aren't there as an excuse to gawk. They are there so people learn to draw anatomically correct bodies, which you can only do when you actually see one and can compare.
Turns out the same is true for AI art. What a surprise.
American puritanical backwardness at work.
Re: (Score:2)
Nope, depends on the particular breed of feminist. There are plenty for whom freeing the female body is liberation. Some feminists say that if men can show their upper body naked, so should women.
Juggernaut X (Score:3)
Why care? (Score:2)
As a European I'm boggled by American prudishness.
What's the problem with nude people at all? I mean come on guys, it's just nature.
Re: (Score:2)
As an American I'm boggled by the assumptions Europeans make about Americans. We aren't appreciably more "prude" than most European countries; we have plenty of spaces where public nudity is allowed (beaches/campgrounds/etc.) and innumerable semi-private ones.
What you are seeing is a corporation overreacting in order to make their product "safe for everyone" over the fear of negative publicity and lawsuits (which IS something much more common in America).
Re: (Score:2)
>> We aren't appreciably more "prude" than most European countries
I moved to the US 20+ years ago, and yes, you really are.
Re: (Score:2)
So does that mean... (Score:2)
...that there's now a Stable Diffusion 3 Bodies Problem?
Improvement. (Score:2)
Well we've gone from "has at least 5 fingers" to "at least it has 5 fingers".