Yuli-Ban t1_ir6yhfq wrote
Reply to comment by MrWilsonLor in Excited to announce Imagen Video, our new text-conditioned video diffusion model that generates 1280x768 24fps HD videos by Dr_Singularity
I say not enough. I want to end the year with even this seeming obsolete and weak.
Lone-Pine t1_ir78nqt wrote
When do you think we'll see long-term coherence?
Yuli-Ban t1_ir7950h wrote
We've already seen it.
https://plai.cs.ubc.ca/2022/05/20/flexible-diffusion-modeling-of-long-videos/
> Dr. Wood says “This is simply the most impressive AI result I have personally seen in my career. Long range coherence is a challenge even for modern language models with massive parameter counts. Will, Saeid, Vaden, and Christian have taken a huge step forward by being able to stably generate coherent, photo-realistic 1hour+ long videos; 70x’s longer than their longest training video, and more than 2000x’s longer than the maximum of 20 frames they ever look at at once during training. There is something very special in the training procedure they have developed and the architecture they employ. Never have we been closer to being able to formulate AI agents that plan visually in domains with life-like complexity.”
Wassux t1_ir7ry2s wrote
Did they just invent short term memory?
Lone-Pine t1_ir7c03e wrote
Wow. What do you think will be next then?
420BigDawg_ t1_ir89w81 wrote
Text-to-sound is next. They are already working on it.
thislifeiffullofcare t1_ir8c50r wrote
It's already happened, hasn't it? There are AI music generators.