ChronoPsyche t1_itftlss wrote
Reply to comment by No_Skin1273 in Given the exponential rate of improvement to prompt based image/video generation, in how many years do you think we'll see entire movies generated from a prompt? by yea_okay_dude
I know. I said it isn't out, as in it's not publicly available yet. And it's very unsophisticated, like I said.
No_Skin1273 t1_itftu89 wrote
You can already make a movie with this, even if it's not Netflix quality. And if you call that unsophisticated, then text2image isn't sophisticated either.
ChronoPsyche t1_itfu7oo wrote
No, you can't, because these models can only generate videos that are minutes long at best. A movie is by definition 90 minutes or longer. And we are clearly talking about coherent film productions, not just something that spans the length of a movie.
If we are changing the definition to something that spans 90 minutes and is a motion picture, but could include incoherent drivel, then sure, that will happen soon. In fact, you can already do that with batch processing. Nobody would call that a movie, though.
No_Skin1273 t1_itfve2h wrote
Where is your proof that it can't make a video longer than 2 minutes... Just because they didn't generate one doesn't mean they can't do it... Even if it's compute intensive, you could make a film with it.
ChronoPsyche t1_itg18gz wrote
>Where is your proof that it can't make a video longer than 2 minutes
....I read the actual research papers... that's how I know. Only one of them can do minutes; the other two can only do seconds at the moment.
For Imagen Video:
>Imagen Video scales from prior work of 64-frame 128×128 videos at 24 frames per second to 128 frame 1280×768 high-definition video at 24 frames per second.
128 frames at 24 frames per second is about a 5-second video.
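If it helps, the arithmetic is just duration = frames ÷ frame rate. A quick sketch, using the figures quoted from the paper:

```python
# Clip duration follows directly from frame count and frame rate.
def clip_seconds(frames: int, fps: float) -> float:
    """Duration in seconds of a clip of `frames` frames played at `fps`."""
    return frames / fps

print(clip_seconds(128, 24))  # Imagen Video: 128 frames at 24 fps -> ~5.3 s
```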
For Meta's Make-A-Video:
>Given input text x translated by the prior P into an image embedding, and a desired frame rate fps, the decoder D^t generates 16 64 × 64 frames, which are then interpolated to a higher frame rate by ↑F, and increased in resolution to 256 × 256 by SR^t_l and 768 × 768 by SR_h, resulting in a high-spatiotemporal-resolution generated video ŷ.
16 frames, which they interpolate between to create a few seconds of video.
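And note that interpolation raises the frame rate, not the amount of generated content, so the clip gets smoother rather than longer. A minimal sketch (the native frame rate here is illustrative, standing in for the paper's "desired frame rate fps" parameter):

```python
# 16 keyframes at a native rate of 4 fps cover 4 seconds of content.
# Interpolation multiplies the frame count and the playback fps together,
# so the duration is unchanged.
native_frames, native_fps = 16, 4  # native_fps is an illustrative assumption
for factor in (1, 2, 4):           # interpolation factors, also illustrative
    frames = native_frames * factor
    fps = native_fps * factor
    print(frames, fps, frames / fps)  # frames, fps, duration: always 4.0 s
```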
And then there's Phenaki, which can generate the longest videos, at a few minutes.
>Generate temporally coherent and diverse videos conditioned on open domain prompts even when the prompt is a new composition of concepts (Fig. 3). The videos can be long (minutes) even though the model is trained on 1.4 seconds videos (at 8 fps).
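For scale, a quick back-of-the-envelope using the 90-minute definition above and the 24 fps figure from the Imagen quote:

```python
# Frames in a feature-length film versus a single generated clip.
film_frames = 90 * 60 * 24    # 90 minutes at 24 fps
print(film_frames)            # 129,600 frames
print(film_frames / 128)      # ~1,012x the 128 frames Imagen Video produces
```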
>Even if it's compute intensive, you could make a film with it.
...You clearly have no clue what you are talking about. I would suggest doing some reading on the current state of the tech, and also read the actual research papers.
No_Skin1273 t1_itftww0 wrote
The type of silent movie that almost nobody would watch today.
No_Skin1273 t1_itfuo4p wrote
Also, the research paper IS OUT, so... you know. But it's not open source; that's the difference. For open source I will probably look at Stability AI, but I think it will be more compute intensive, so this will probably end up as something you're going to need a subscription for.