
TFenrir t1_j3gzyhm wrote

It's just the current state of the video-generating models that exist. First, the best of the best are at Google, and we've seen what they can currently do. Even if Stability has spent the last few months replicating the research coming out of Google, I can't imagine them producing a model that can output more than a minute of somewhat coherent video. The big challenge right now is the inefficiency of these models: the longer the context, the MUCH larger the memory and processing power required.
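To make that concrete, here's a rough back-of-envelope sketch of why length is so punishing, assuming naive full self-attention over all frame tokens. The constants (frames per second, tokens per frame) are my own illustrative guesses, not numbers from any actual model:

```python
# Back-of-envelope: why video length blows up transformer memory.
# Assumes naive full self-attention over every frame token; the
# constants below are illustrative assumptions, not real model specs.

def attention_memory_gb(seconds, fps=8, tokens_per_frame=256,
                        bytes_per_score=2):
    """Memory for the seq_len x seq_len attention score matrix alone."""
    seq_len = seconds * fps * tokens_per_frame
    return seq_len ** 2 * bytes_per_score / 1e9

for seconds in (1, 5, 10, 60):
    print(f"{seconds:>3}s of video -> ~{attention_memory_gb(seconds):,.2f} GB "
          "per attention head")
```

The score matrix grows with the square of the token count, so a minute of video is not 60x harder than a second, it's thousands of times harder, even before you count activations and parameters.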

These are problems I would be very surprised to see solved first anywhere other than Google.

What I imagine is more likely is a sort of StyleGAN-like system that can be applied to a whole video, with some level of coherence.
