
red75prime t1_itk9f2j wrote

> (Q4 2028) An average to low end computer or cheap subscription service is capable of generating high resolution and frame rate videos spanning several minutes.

If it takes days to render them, then maybe.

AIs don't yet significantly feed back into the design and physical construction of chip fabrication plants, so by 2028 we'll have one or two 2nm fabs, and the majority of new consumer CPUs and GPUs will be on 3-5nm processes. Hardware costs won't drop significantly either (fabs are costly), so 2028 low-end hardware will be around today's high-end performance-wise (with less RAM and storage). A rough estimate of what that implies for render times is sketched below.
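To make the "days to render" intuition concrete, here's a back-of-envelope estimate in Python. Every number in it is an assumption chosen for illustration (the per-frame FLOP cost and sustained GPU throughput especially), not a measurement:

```python
# Rough estimate: how long might a consumer GPU take to render a
# several-minute video with a diffusion-style model?
# All numbers below are illustrative assumptions, not benchmarks.

gpu_flops_per_s = 80e12   # assumed sustained FP16 throughput of a ~2028
                          # low-end GPU (roughly today's high-end)
flops_per_frame = 5e15    # assumed cost per high-res frame, including
                          # all diffusion/denoising steps
fps = 24
minutes = 3

total_frames = fps * 60 * minutes
total_flops = total_frames * flops_per_frame
hours = total_flops / gpu_flops_per_s / 3600

print(f"{total_frames} frames, ~{hours:.0f} GPU-hours")
# With these assumptions: 4320 frames, ~75 GPU-hours -- days, not minutes.
```

Under these (debatable) assumptions, a three-minute clip lands in the multi-day range on low-end hardware, which is the point of the caveat above.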

Anyway, I would shift perfect long-term temporal consistency to 2026-2032, as it depends on integrating working and long-term memory into existing AI architectures, and there's no clear path to that yet.
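For a sense of what "integrating long-term memory" could mean, here is a minimal PyTorch-style sketch of one conceivable direction: a persistent bank of embeddings that the frame generator reads via cross-attention. This is purely illustrative, not any published architecture; the unsolved part (deciding what to write into the bank, and when) is exactly the missing piece:

```python
import torch
import torch.nn as nn

class MemoryBank(nn.Module):
    """Illustrative persistent memory read via cross-attention."""

    def __init__(self, dim=512, slots=256):
        super().__init__()
        # Persistent slots that would hold object/scene appearance
        # embeddings across the whole video.
        self.memory = nn.Parameter(torch.randn(slots, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def read(self, frame_tokens):
        # frame_tokens: (batch, tokens, dim). The generator attends over
        # stored memories, so an object's appearance could be recalled
        # when it reappears after leaving the frame.
        mem = self.memory.unsqueeze(0).expand(frame_tokens.size(0), -1, -1)
        out, _ = self.attn(frame_tokens, mem, mem)
        return frame_tokens + out

# The read path is easy; a principled write path (updating slots as the
# video evolves) is the part with no clear solution yet.
```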


Redvolition t1_itlmbug wrote

Have you seen the Phenaki demo?

I am not an expert, but from what I'm gathering from the papers coming out, you could get to this Q4 2028 scenario with algorithm improvements alone, without any actual hardware upgrade.


red75prime t1_itlxjbf wrote

Phenaki has the same problem: a limited span of temporal consistency that can't easily be scaled up. If an object goes offscreen for some time, the model forgets how it should look.
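The failure mode is easy to see in sketch form. A hypothetical illustration (the `next_frame` method and window size are made up to show the structure, not Phenaki's actual API):

```python
# Sliding-window video generation: the model conditions only on the
# last WINDOW frames, so anything older is invisible to it.

WINDOW = 16  # assumed conditioning window, in frames

def generate_video(model, prompt, num_frames):
    frames = []
    for t in range(num_frames):
        context = frames[-WINDOW:]  # older frames are dropped entirely
        frames.append(model.next_frame(prompt, context))
    return frames

# If a character leaves the frame for more than WINDOW frames, the model
# must reinvent their appearance from the prompt alone when they return --
# hence drifting faces, changing clothes, and so on.
```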


DEATH_STAR_EXTRACTOR t1_itoxcm2 wrote

But why can the first NUWA (version 1), from 10 months ago, do face prediction as shown with only about 900M parameters, while Imagen Video, at roughly 11B parameters, does what it does? It doesn't look like Imagen Video is that much better. I know it can render words in leaves and all, but I feel NUWA could come out the same given frame rate improvements, upscaling, and more data / a bigger model. Yes, there are evaluation scores, but I'm talking about judging by eye.
