Submitted by Erosis t3_xws0p1 in MachineLearning
ThePerson654321 t1_ir98sqw wrote
Reply to comment by master3243 in [R] Google announces Imagen Video, a model that generates videos from text by Erosis
I find it difficult to believe we will achieve the same fidelity in video generation as in image generation.
master3243 t1_ir9a5wt wrote
Image generation is by definition an easier task, so video generation will never fully catch up to it.
But do you not think that at some point in the future, video generation in the year 20XX will be better than image generation in 2022?
Even in the year 2050 or 2100?
ThePerson654321 t1_ir9adad wrote
Perhaps a few seconds but never a full movie.
tdgros t1_ir9hdy2 wrote
Phenaki already shows the generation of 2-minute videos (using lots of prompts): https://phenaki.video/#interactive. It's not that far-fetched to imagine that working on longer prompts and videos...
master3243 t1_ir9bp3h wrote
What about a coherent 30-second silent clip, generated from a short description, that is as difficult to distinguish from real footage as current SOTA image generation is from real images?
cleverestx t1_irbcdi0 wrote
Why not? I admit it IS more challenging, but video is only a series of images...
ThePerson654321 t1_irbck16 wrote
They said the same thing about nuclear fusion reactors.
cleverestx t1_irbcqd5 wrote
Those reactors are not a series of images.
wtf-hair-do t1_iraoudr wrote
they'll just never figure it out and give up