MysteryInc152 t1_jd3v3kp wrote

There are foundation models that do these kinds of things. You can connect them to a language model to get the kind of effect you're thinking about.

Visual ChatGPT - https://www.reddit.com/r/MachineLearning/comments/11mlwty/r_visual_chatgpt_talking_drawing_and_editing_with/
