Yuli-Ban t1_ir7950h wrote on October 5, 2022 at 9:03 PM

Reply to comment by Lone-Pine in Excited to announce Imagen Video, our new text-conditioned video diffusion model that generates 1280x768 24fps HD videos by Dr_Singularity

We've already seen it.

https://plai.cs.ubc.ca/2022/05/20/flexible-diffusion-modeling-of-long-videos/

> Dr. Wood says “This is simply the most impressive AI result I have personally seen in my career. Long range coherence is a challenge even for modern language models with massive parameter counts. Will, Saeid, Vaden, and Christian have taken a huge step forward by being able to stably generate coherent, photo-realistic 1hour+ long videos; 70x’s longer than their longest training video, and more than 2000x’s longer than the maximum of 20 frames they ever look at at once during training. There is something very special in the training procedure they have developed and the architecture they employ. Never have we been closer to being able to formulate AI agents that plan visually in domains with life-like complexity.”

Wassux t1_ir7ry2s wrote on October 5, 2022 at 11:19 PM

Did they just invent short term memory?

Lone-Pine t1_ir7c03e wrote on October 5, 2022 at 9:23 PM

Wow. What do you think will be next then?

420BigDawg_ t1_ir89w81 wrote on October 6, 2022 at 1:46 AM

Text-to-sound is next. They are already working on it.

thislifeiffullofcare t1_ir8c50r wrote on October 6, 2022 at 2:05 AM

It's already happened, hasn't it? There are AI music generators.