Submitted by im-so-stupid-lol t3_10pa18k in singularity

Right now, text2image models are generating impressive output but I see two major issues with their use at the moment.

The first issue is the required dataset size. Show a human 4 or 5 images of an animal they've never seen before and they'll generally be able to draw it quite well, but Stable Diffusion needs far more images to train on.

But the second and more prevalent issue is "prompt engineering". Every damn portrait shot has some weird garbage about "4k ultra high definition high fidelity great lighting award winning beautiful stunning photo" or some bullshit. This suggests to me that we are still not communicating very well with the model. Hell, negative prompts show this as well: some people have super long negative prompts full of things like "6 toes" and "deformed face". So the AI still needs to be told explicitly "don't draw a person with a deformed face".
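To make the point concrete, here's roughly what that keyword stacking looks like when you assemble it programmatically. The tag lists below are just illustrative examples of what people paste into prompts, not a real recipe:

```python
# Illustrative only: the kind of keyword stacking people bolt onto prompts today.
QUALITY_TAGS = [
    "4k", "ultra high definition", "high fidelity",
    "great lighting", "award winning", "stunning photo",
]
NEGATIVE_TAGS = ["6 toes", "deformed face", "extra limbs", "blurry"]

def build_prompt(subject: str) -> tuple[str, str]:
    """Return (prompt, negative_prompt) padded with 'magic' keywords."""
    prompt = ", ".join([subject] + QUALITY_TAGS)
    negative = ", ".join(NEGATIVE_TAGS)
    return prompt, negative

prompt, negative = build_prompt("portrait of a woman")
```

The subject ends up buried under boilerplate that has nothing to do with what you actually want drawn, which is exactly the communication failure described above.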

What do you see as the solution to these issues going forward? Will it be a combination of models like ChatGPT and models like Stable Diffusion, so you can talk the text2image model through what you want in a more natural way?

4

Comments


starstruckmon t1_j6l1k5l wrote

>take a human and show them 4 or 5 images of an animal they've never seen before they'll generally be able to draw it quite well

4-5 is actually enough to fine-tune a pretrained SD model, which is the correct comparison, since we're already pretrained. Even if you ignore all the data up to that point in your life, even newborn brains are pretrained by evolution; they aren't initialised from random weights. It's easier to notice this in other animals that can start walking right after birth.
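For illustration, a DreamBooth-style fine-tune works by pairing a handful of example images with a rare placeholder token that the pretrained model learns to bind to the new concept. A minimal sketch of the data side (the token and file names here are made up, and the actual training loop is omitted):

```python
# Sketch of the data side of a DreamBooth-style few-shot fine-tune.
# The placeholder token and file names are made up for illustration.
PLACEHOLDER_TOKEN = "sks"  # rare token the model will bind to the new concept
INSTANCE_PROMPT = f"a photo of {PLACEHOLDER_TOKEN} animal"

# Just 4-5 examples of the never-before-seen animal.
instance_images = [f"animal_{i}.jpg" for i in range(5)]

# Each training step pairs one image with the instance prompt;
# the pretrained weights do the rest, which is the point above.
training_pairs = [(img, INSTANCE_PROMPT) for img in instance_images]
```

The heavy lifting is all in the pretrained weights; the few-shot set only has to teach the model which existing concepts the new token combines.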

7

Roggann t1_j6jp2yi wrote

In an interview, Sam Altman (CEO of OpenAI) said the input would probably evolve from prompt engineering to more natural language.

Think of it as an iterative conversation with the system, giving it constant feedback and using few (or none) of those magic keywords that are necessary for the model to work right now.

5

Baturinsky t1_j6mma6w wrote

You have to use all that "weird garbage" in the prompt for the default Stable Diffusion checkpoints, because they have all kinds of junk jammed into the training set, so you have to sort it out.

Many third party checkpoints, such as NovelAI, AnythingV3 or PFG work well with small prompts.

Example: https://www.reddit.com/r/StableDiffusion/comments/zzdxug/pfg_model_syd_mead/ with prompt being just "drama movie by Syd Mead, female royal guard, detailed face"

2

im-so-stupid-lol OP t1_j6nfeff wrote

Can I try out that model online on that website if I make an account?

1

Baturinsky t1_j6nnq4k wrote

I dunno. But if you have at least a GTX 1060 you can install and run Stable Diffusion locally.
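As a rough illustration of why the GTX 1060 (6 GB VRAM) is often cited as the entry point: community guides generally put the practical floor around 4 GB with memory-saving options enabled. The thresholds below are assumptions from such guides, not official requirements:

```python
# Rough rule-of-thumb check; thresholds are assumptions from community
# guides, not official requirements.
MIN_VRAM_GB = 4.0    # practical floor with memory-saving options enabled
COMFY_VRAM_GB = 8.0  # comfortable for larger resolutions / batches

def can_run_sd(vram_gb: float) -> str:
    """Classify whether a GPU with the given VRAM can likely run SD locally."""
    if vram_gb < MIN_VRAM_GB:
        return "probably not"
    if vram_gb < COMFY_VRAM_GB:
        return "yes, with low-VRAM settings"
    return "yes"

print(can_run_sd(6.0))  # GTX 1060 -> "yes, with low-VRAM settings"
```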

1

fluffy_assassins t1_j6jh00j wrote

People will have to develop the skill of composing the proper prompts to get what they really want.

0

im-so-stupid-lol OP t1_j6jhizv wrote

That seems to be the least likely outcome given the intense competition in the space; it's a problem that's having billions of dollars thrown at it.

6

fluffy_assassins t1_j6jhnxa wrote

Well, that's the solution until there's a better solution.

0

im-so-stupid-lol OP t1_j6jhw3t wrote

Right, but my OP was about what it will be like in the future:

> what do you see as the solution to these issues going forward?

3

fluffy_assassins t1_j6jlhen wrote

I think the specifics and information have to be provided somehow. I saw a utility someone made that uses checkboxes; that could help. Neuralink might eventually allow parameters to be provided even more directly.

1

im-so-stupid-lol OP t1_j6jto1q wrote

Yeah, certainly the data has to be transferred somehow (the machine cannot read your mind), but I think being able to have an effective "conversation" with the algorithm will go a long way. For example:

"portrait of woman, age 25, beautiful face"

"okay, face is good but make the photo more realistic and less artsy, and have her wearing a black turtle neck"

"okay, now let's work on the background, make it lighter and blurred slightly"

et cetera
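A hypothetical sketch of how that conversation could be represented under the hood: each turn is a small edit applied to a running prompt state, with no real model involved (all the field names and turn contents here are made up to mirror the example above):

```python
# Hypothetical sketch: treat the "conversation" as a sequence of edits
# applied to a running prompt state (no real model involved).
def apply_turn(state: dict, turn: dict) -> dict:
    """Merge one conversational turn into the accumulated prompt state."""
    new_state = dict(state)
    new_state.update(turn)
    return new_state

turns = [
    {"subject": "portrait of woman, age 25, beautiful face"},
    {"style": "realistic photo, not artsy", "clothing": "black turtleneck"},
    {"background": "lighter, slightly blurred"},
]

state: dict = {}
for turn in turns:
    state = apply_turn(state, turn)

# Later turns refine earlier ones instead of restarting from scratch.
final_prompt = ", ".join(state.values())
```

The point is that each turn only has to describe the *change*, which is much closer to how you'd direct a human artist than re-engineering one giant keyword-stuffed prompt.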

1