Submitted by ReadSeparate t3_10zcig2 in singularity
We’re all waiting for the day that a GPT-3-scale model is released which integrates text, video, images, and audio. We’ve seen some progress on this front, namely Gato, but nothing that has really wowed us yet like ChatGPT or LaMDA. PaLM is really the only exception to this rule, but it was images and text only.
I think we all know this is coming soon. I’m wondering if anyone here is aware of any indications that this is actively being worked on, or has any predictions for release dates, especially for a video model.
A model which can take any combination of video, audio, image, and text tokens as input and output would most likely be very, very remarkable, making ChatGPT look like a toy in comparison.
adt t1_j831ml0 wrote
There is an entire world outside of California...
Germany: Luminous 200B multimodal.
China: All of the ERNIE 260B cross-modal stuff.
Yeh, you need The Memo!