ReadSeparate t1_itgs97l wrote

There is one big assumption in this, and that's that we won't get ALL of those things out of scale alone. It's entirely possible someone builds a multi-modal model trained on text, video, and audio, and a text-to-movie generator is simply a secondary feature of such a model.

If this does happen, we could see it as soon as 2-5 years from now, in my opinion.

The one major breakthrough I DO think we need to see before text-to-movie is something to replace Transformers, as they aren't really capable of long-term memory without hacks, and the hacks don't seem very good. You need long-term memory to have a coherent movie.

I think it's pretty likely that everything else will be accomplished through scale and multi-modality.

16

red75prime t1_itk6c0n wrote

I'm sure that any practical AI system able to generate movies will not do it all by itself. It will use external tools so it doesn't waste its memory and computational resources on mundane tasks like tracking the exact 3D positions of objects and remembering all the intricacies of their textures and surface properties.

2