Submitted by im-so-stupid-lol t3_10pa18k in singularity

Right now, text2image models are generating impressive output but I see two major issues with their use at the moment.

The first issue is the required dataset size. Show a human 4 or 5 images of an animal they've never seen before and they'll generally be able to draw it quite well, but Stable Diffusion needs far more images to train on.

But the second and more prevalent issue is "prompt engineering". Every damn portrait shot has some weird garbage about "4k ultra high definition high fidelity great lighting award winning beautiful stunning photo" or some bullshit. This suggests to me that we are still not communicating very well with the model. Hell, negative prompts show this as well: some people have super long negative prompts full of things like "6 toes" and "deformed face". So the AI still needs to be told explicitly "don't draw a person with a deformed face".
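To make the point concrete, here's roughly what that keyword stacking looks like when you assemble it programmatically. The tag lists below are just illustrative examples of what people paste into prompts, not a real recipe:

```python
# Illustrative only: the kind of keyword stacking people bolt onto prompts today.
QUALITY_TAGS = [
    "4k", "ultra high definition", "high fidelity",
    "great lighting", "award winning", "stunning photo",
]
NEGATIVE_TAGS = ["6 toes", "deformed face", "extra limbs", "blurry"]

def build_prompt(subject: str) -> tuple[str, str]:
    """Return (prompt, negative_prompt) padded with 'magic' keywords."""
    prompt = ", ".join([subject] + QUALITY_TAGS)
    negative = ", ".join(NEGATIVE_TAGS)
    return prompt, negative

prompt, negative = build_prompt("portrait of a woman")
```

The subject ends up buried under boilerplate that has nothing to do with what you actually want drawn, which is exactly the communication failure described above.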

What do you see as the solution to these issues going forward? Will it be a combination of models like ChatGPT and models like Stable Diffusion, so you can talk the text2image model through what you want in a more natural way?

4

Comments


starstruckmon t1_j6l1k5l wrote

>take a human and show them 4 or 5 images of an animal they've never seen before they'll generally be able to draw it quite well

4-5 is actually enough to fine-tune a pretrained SD model, which is the correct comparison, since we're already pretrained. Even if you ignore all the data up to that point in your life, even newborn brains are pretrained by evolution; they aren't initialised from random weights. It's easier to notice this in other animals that can start walking right after birth.
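For illustration, a DreamBooth-style fine-tune works by pairing a handful of example images with a rare placeholder token that the pretrained model learns to bind to the new concept. A minimal sketch of the data side (the token and file names here are made up, and the actual training loop is omitted):

```python
# Sketch of the data side of a DreamBooth-style few-shot fine-tune.
# The placeholder token and file names are made up for illustration.
PLACEHOLDER_TOKEN = "sks"  # rare token the model will bind to the new concept
INSTANCE_PROMPT = f"a photo of {PLACEHOLDER_TOKEN} animal"

# Just 4-5 examples of the never-before-seen animal.
instance_images = [f"animal_{i}.jpg" for i in range(5)]

# Each training step pairs one image with the instance prompt;
# the pretrained weights do the rest, which is the point above.
training_pairs = [(img, INSTANCE_PROMPT) for img in instance_images]
```

The heavy lifting is all in the pretrained weights; the few-shot set only has to teach the model which existing concepts the new token combines.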

7

Roggann t1_j6jp2yi wrote

In an interview, Sam Altman (CEO of OpenAI) said the input would probably evolve from prompt engineering to more natural language.

Think of it as an iterative conversation with the system, giving it constant feedback and using few (or none) of those magic keywords that are necessary for the model to work right now.

5

Baturinsky t1_j6mma6w wrote

You have to use all that "weird garbage" in the prompt for the default Stable Diffusion checkpoints, because they have all kinds of junk jammed into the training set, so you have to sort it out.

Many third party checkpoints, such as NovelAI, AnythingV3 or PFG work well with small prompts.

Example: https://www.reddit.com/r/StableDiffusion/comments/zzdxug/pfg_model_syd_mead/ with prompt being just "drama movie by Syd Mead, female royal guard, detailed face"

2

im-so-stupid-lol OP t1_j6nfeff wrote

Can I try out that model online on that website if I make an account?

1

Baturinsky t1_j6nnq4k wrote

I dunno. But if you have at least a GTX 1060 you can install and run Stable Diffusion locally.
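As a rough illustration of why the GTX 1060 (6 GB VRAM) is often cited as the entry point: community guides generally put the practical floor around 4 GB with memory-saving options enabled. The thresholds below are assumptions from such guides, not official requirements:

```python
# Rough rule-of-thumb check; thresholds are assumptions from community
# guides, not official requirements.
MIN_VRAM_GB = 4.0    # practical floor with memory-saving options enabled
COMFY_VRAM_GB = 8.0  # comfortable for larger resolutions / batches

def can_run_sd(vram_gb: float) -> str:
    """Classify whether a GPU with the given VRAM can likely run SD locally."""
    if vram_gb < MIN_VRAM_GB:
        return "probably not"
    if vram_gb < COMFY_VRAM_GB:
        return "yes, with low-VRAM settings"
    return "yes"

print(can_run_sd(6.0))  # GTX 1060 -> "yes, with low-VRAM settings"
```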

1

fluffy_assassins t1_j6jh00j wrote

People will have to develop the skill of composing the proper prompts to get what they really want.

0

im-so-stupid-lol OP t1_j6jhizv wrote

That seems to be the least likely outcome given the intense competition in the space; it's a problem that's having billions of dollars thrown at it.

6

fluffy_assassins t1_j6jhnxa wrote

Well, that's the solution until there's a better solution.

0

im-so-stupid-lol OP t1_j6jhw3t wrote

Right, but my OP was about what it will be like in the future:

> what do you see as the solution to these issues going forward?

3

fluffy_assassins t1_j6jlhen wrote

I think the specifics and information have to be provided somehow. I saw a utility someone made that uses checkboxes; that could help. Neuralink might eventually allow parameters to be provided even more directly.

1

im-so-stupid-lol OP t1_j6jto1q wrote

Yeah, certainly the data has to be transferred somehow (the machine cannot read your mind), but I think being able to have an effective "conversation" with the algorithm will go a long way. For example:

"portrait of woman, age 25, beautiful face"

"okay, face is good but make the photo more realistic and less artsy, and have her wearing a black turtle neck"

"okay, now let's work on the background, make it lighter and blurred slightly"

et cetera
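A hypothetical sketch of how that conversation could be represented under the hood: each turn is a small edit applied to a running prompt state, with no real model involved (all the field names and turn contents here are made up to mirror the example above):

```python
# Hypothetical sketch: treat the "conversation" as a sequence of edits
# applied to a running prompt state (no real model involved).
def apply_turn(state: dict, turn: dict) -> dict:
    """Merge one conversational turn into the accumulated prompt state."""
    new_state = dict(state)
    new_state.update(turn)
    return new_state

turns = [
    {"subject": "portrait of woman, age 25, beautiful face"},
    {"style": "realistic photo, not artsy", "clothing": "black turtleneck"},
    {"background": "lighter, slightly blurred"},
]

state: dict = {}
for turn in turns:
    state = apply_turn(state, turn)

# Later turns refine earlier ones instead of restarting from scratch.
final_prompt = ", ".join(state.values())
```

The point is that each turn only has to describe the *change*, which is much closer to how you'd direct a human artist than re-engineering one giant keyword-stuffed prompt.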

1