ReadSeparate t1_itgs97l wrote
Reply to comment by Zermelane in Given the exponential rate of improvement to prompt based image/video generation, in how many years do you think we'll see entire movies generated from a prompt? by yea_okay_dude
There is one big assumption in this, and that's that we won't get ALL of those things out of scale alone. It's entirely possible someone builds a multi-modal model trained on text, video, and audio, and a text-to-movie generator is simply a secondary feature of such a model.
If this does happen, we could see it as soon as 2-5 years from now, in my opinion.
The one major breakthrough I DO think we need to see before text-to-movie is something to replace Transformers, as they aren't really capable of long-term memory without hacks, and the hacks don't seem very good. You need long-term memory to have a coherent movie.
I think it's pretty likely that everything else will be accomplished through scale and multi-modality.
red75prime t1_itk6c0n wrote
I'm sure that any practical AI system able to generate movies will not do it all by itself. It will use external tools so that it doesn't waste its memory and computational resources on mundane tasks like tracking the exact 3D positions of objects and remembering all the intricacies of their textures and surface properties.