
Ortus14 t1_jefkz2o wrote

LLMs like GPT-3.5 are intelligent from language patterns alone.

Multimodal LLMs like GPT-4 that combine visual intelligence with LLMs are more intelligent.

Combining other modules may lead to greater intelligence.

Scaling single-modal LLMs might get us to superintelligence eventually, but not as quickly as multimodal models, because those make more effective use of available computation.


wowimsupergay OP t1_jefz9vi wrote

What I'm talking about is literally giving GPT eyes. Right now it is "multimodal" because we can pass it RGB values and waveforms as bytes (so, text). Fundamentally, though, GPT is not hearing or seeing anything. But I totally get what you're saying, and I do think multimodal intelligence is the way to go.
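To make the point concrete, here's a minimal sketch (purely illustrative, not any real GPT API) of what "passing RGB values as text" means: the model only ever sees the numbers as tokens, never a visual signal.

```python
# Hypothetical sketch: serializing raw RGB pixels as plain text so a
# text-only LLM could be prompted with them. All names are illustrative.

def pixels_to_text(pixels):
    """Flatten a 2D grid of (R, G, B) tuples into a plain-text string,
    one row of pixels per line, pixels separated by spaces."""
    rows = []
    for row in pixels:
        rows.append(" ".join(f"{r},{g},{b}" for r, g, b in row))
    return "\n".join(rows)

# A tiny 2x2 "image": the model would receive these digits as tokens,
# not as an actual visual input -- which is exactly the objection above.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
prompt = "Describe this 2x2 RGB image:\n" + pixels_to_text(image)
print(prompt)
```

The text blob the model receives is just a number soup; nothing in the architecture treats it as spatial or visual data unless the model is trained with a real vision encoder.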

Also, thank you for letting me know that multimodal models make better use of available computation per task; I did not know that.
