ChronoPsyche t1_itftlss wrote
Reply to comment by No_Skin1273 in Given the exponential rate of improvement to prompt based image/video generation, in how many years do you think we'll see entire movies generated from a prompt? by yea_okay_dude
I know. I said it isn't out, as in it's not publicly available yet. And it's very unsophisticated, like I said.
No_Skin1273 t1_itftu89 wrote
You can already make a movie with this, even if it's not Netflix quality. And if you call that unsophisticated, then text2image isn't sophisticated either.
ChronoPsyche t1_itfu7oo wrote
No, you can't, because these models can only generate videos that are minutes long at best. A movie is by definition 90 minutes or longer. And we are clearly talking about coherent film productions, not just something that spans the length of a movie.
If we are changing the definition to something that spans 90 minutes and is a motion picture, but could include incoherent drivel, then sure, that will happen soon. In fact, you can already do that with batch processing. Nobody would call that a movie, though.
No_Skin1273 t1_itfve2h wrote
Where is your proof that it can't make a video longer than 2 minutes... Just because they didn't generate one doesn't mean they can't do it... Even if it's compute intensive, you could make a film with it.
ChronoPsyche t1_itg18gz wrote
>Where is your proof that it can't make a video longer than 2 minutes
....I read the actual research papers... that's how I know. Only one of them can do minutes; the other two can only do seconds at the moment.
For Imagen Video:
>Imagen Video scales from prior work of 64-frame 128×128 videos at 24 frames per second to 128 frame 1280×768 high-definition video at 24 frames per second.
128 frames at 24 frames per second is about a 5-second video.
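If it helps, the arithmetic is just duration = frames ÷ frame rate. A quick sketch, using the figures quoted from the paper:

```python
# Clip duration follows directly from frame count and frame rate.
def clip_seconds(frames: int, fps: float) -> float:
    """Duration in seconds of a clip of `frames` frames played at `fps`."""
    return frames / fps

print(clip_seconds(128, 24))  # Imagen Video: 128 frames at 24 fps -> ~5.3 s
```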
For Meta's Make-A-Video:
>Given input text x translated by the prior P into an image embedding, and a desired frame rate fps, the decoder D^t generates 16 64 × 64 frames, which are then interpolated to a higher frame rate by ↑F, and increased in resolution to 256 × 256 by SR^t_l and 768 × 768 by SR_h, resulting in a high-spatiotemporal-resolution generated video ŷ.
16 frames, which they interpolate between to create a few seconds of video.
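And note that interpolation raises the frame rate, not the amount of generated content, so the clip gets smoother rather than longer. A minimal sketch (the native frame rate here is illustrative, standing in for the paper's "desired frame rate fps" parameter):

```python
# 16 keyframes at a native rate of 4 fps cover 4 seconds of content.
# Interpolation multiplies the frame count and the playback fps together,
# so the duration is unchanged.
native_frames, native_fps = 16, 4  # native_fps is an illustrative assumption
for factor in (1, 2, 4):           # interpolation factors, also illustrative
    frames = native_frames * factor
    fps = native_fps * factor
    print(frames, fps, frames / fps)  # frames, fps, duration: always 4.0 s
```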
And then there's Phenaki, which can generate the longest videos, at a few minutes.
>Generate temporally coherent and diverse videos conditioned on open domain prompts even when the prompt is a new composition of concepts (Fig. 3). The videos can be long (minutes) even though the model is trained on 1.4 seconds videos (at 8 fps).
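For scale, a quick back-of-the-envelope using the 90-minute definition above and the 24 fps figure from the Imagen quote:

```python
# Frames in a feature-length film versus a single generated clip.
film_frames = 90 * 60 * 24    # 90 minutes at 24 fps
print(film_frames)            # 129,600 frames
print(film_frames / 128)      # ~1,012x the 128 frames Imagen Video produces
```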
>Even if it's compute intensive, you could make a film with it.
...You clearly have no clue what you are talking about. I would suggest doing some reading on the current state of the tech, and also read the actual research papers.
No_Skin1273 t1_itftww0 wrote
The type of silent movie that almost nobody would watch today.
No_Skin1273 t1_itfuo4p wrote
Also, the research paper IS OUT, so... you know. But it's not open source; that's the difference. For open source I will probably look at Stability AI, but I think it will be more compute intensive, so this will probably end up as something you're going to need a subscription for.