Submitted by Dr_Singularity t3_y0rij3 in singularity
-ZeroRelevance- t1_irvla8f wrote
Reply to comment by kasiotuo in Generation of high fidelity videos from text using Imagen Video by Dr_Singularity
That probably comes from the temporal upscaling. As they said, the initial video is only 3 fps, so they’re basically synthesising 7 frames for each actual frame given. No wonder it morphs. If it began with a higher temporal resolution (a higher initial fps), it would likely be much more coherent.
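For anyone who wants to check the arithmetic, here is a rough sketch. The 24 fps output rate is my assumption, inferred from the 7-to-1 ratio above (3 fps × 8 = 24 fps); the function names are made up for illustration and have nothing to do with the actual Imagen Video code.

```python
def synthesized_per_real_frame(base_fps: int, target_fps: int) -> int:
    """Number of interpolated frames inserted for each frame the base model produced."""
    if base_fps <= 0 or target_fps % base_fps != 0:
        raise ValueError("target_fps must be a positive whole multiple of base_fps")
    # Each base frame becomes (target_fps / base_fps) output frames; one of them is the original.
    return target_fps // base_fps - 1


def synthesized_fraction(base_fps: int, target_fps: int) -> float:
    """Share of the final frames that come from temporal upscaling rather than the base model."""
    return 1 - base_fps / target_fps


# Assumed rates: 3 fps base (from the comment), 24 fps output (inferred, not stated above).
print(synthesized_per_real_frame(3, 24))
print(synthesized_fraction(3, 24))
```

Under those assumed rates, roughly seven out of every eight frames in the final video are interpolated, which is why a higher base frame rate should cut down on the morphing.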