ThirdMover
ThirdMover t1_jdrzd7f wrote
Reply to comment by rya794 in [P] Using ChatGPT plugins with LLaMA by balthierwings
That depends on how well they will be able to keep their moat. There is a lot of hunger for running LLMs on your own - if not on your own hardware, then at least in software environments you control. People want to see what makes them tick rather than trust "Open"AI's black boxes.
Yeah, they have a performance lead, but time will tell how well they can stay ahead of the rest of the field trying to catch up.
ThirdMover t1_jdlabwm wrote
Reply to comment by MassiveIndependence8 in [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-
What do you mean by "frame"? How many images do you think GPT-4 would need to get a cursor where it needs to go? I'd estimate four or five should be plenty.
ThirdMover t1_jdjf69i wrote
Reply to comment by plocco-tocco in [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-
I am not sure. Exactly how does inference scale with the complexity of the input? The output would be very short, just enough tokens for the "move cursor to" command.
ThirdMover t1_jdhvx8i wrote
Reply to comment by BinarySplit in [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-
>GPT-4 is potentially missing a vital feature to take this one step further: Visual Grounding - the ability to say where inside an image a specific element is, e.g. if the model wants to click a button, what X,Y position on the screen does that translate to?
You could just ask it to move a cursor around until it's on the specified element. I'd be shocked if GPT-4 couldn't do that.
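A minimal sketch of that feedback loop, assuming a hypothetical `query_vision_model` helper that sends a screenshot plus a prompt to a vision-capable model and returns its text reply (not an actual GPT-4 API call), and injected `take_screenshot` / `move_cursor` callables:

```python
import re


def query_vision_model(screenshot, prompt):
    """Hypothetical helper: send an annotated screenshot and a prompt
    to a vision-capable LLM and return its text reply."""
    raise NotImplementedError


def move_cursor_to_element(take_screenshot, move_cursor, target, max_steps=10):
    """Iteratively nudge the cursor until the model says it sits on the target."""
    for _ in range(max_steps):
        shot = take_screenshot()  # current screen with the cursor drawn in
        reply = query_vision_model(
            shot,
            f"The cursor is visible in this screenshot. To place it on "
            f"'{target}', answer either DONE or MOVE <dx> <dy> in pixels."
        )
        if reply.strip().startswith("DONE"):
            return True
        match = re.match(r"MOVE\s+(-?\d+)\s+(-?\d+)", reply.strip())
        if match:
            move_cursor(int(match.group(1)), int(match.group(2)))
    return False
```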
ThirdMover t1_jb0x91p wrote
Reply to comment by Art10001 in [R] RWKV (100% RNN) can genuinely model ctx4k+ documents in Pile, and RWKV model+inference+generation in 150 lines of Python by bo_peng
I think this is really exciting. LLM applications like ChatGPT still seem to mostly pipe the result of the model's sampling straight out. With 100 times faster inference, complex chain-of-thought procedures - multiple differently prompted model instances (well, the same model but different contexts) - could be chained together to improve their output while still running close to real time.
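A rough sketch of what such a pipeline could look like, assuming a hypothetical `generate(prompt)` wrapper around one sampling call to a fast local model (the drafter/critic/reviser prompts are made up for illustration):

```python
def generate(prompt: str) -> str:
    """Hypothetical wrapper around one sampling call to a fast local LLM."""
    raise NotImplementedError


def answer_with_self_critique(question: str, rounds: int = 2) -> str:
    """Same model, different 'instances' via different prompts/contexts:
    a drafter, a critic, and a reviser chained in sequence."""
    draft = generate(f"Answer the question step by step:\n{question}")
    for _ in range(rounds):
        critique = generate(
            f"Question: {question}\nDraft answer: {draft}\n"
            "List any mistakes or gaps in the draft."
        )
        draft = generate(
            f"Question: {question}\nDraft answer: {draft}\n"
            f"Critique: {critique}\n"
            "Rewrite the answer, fixing the issues raised."
        )
    return draft
```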
ThirdMover t1_j7899qa wrote
Reply to comment by PedroGonnet in [R] Multimodal Chain-of-Thought Reasoning in Language Models - Amazon Web Services Zhuosheng Zhang et al - Outperforms GPT-3.5 by 16% (75%->91%) and surpasses human performance on ScienceQA while having less than 1B params! by Singularian2501
You could also count water molecules.
ThirdMover t1_j77bf6z wrote
Reply to comment by yaosio in [R] Multimodal Chain-of-Thought Reasoning in Language Models - Amazon Web Services Zhuosheng Zhang et al - Outperforms GPT-3.5 by 16% (75%->91%) and surpasses human performance on ScienceQA while having less than 1B params! by Singularian2501
> I think it's likely the ability to determine what is true and what isn't will come from a capability of the model rather than it being told what is and isn't true. It's not possible to mark text as true or not true as this assumes whoever is making these things is the sole authority on the truth and never makes mistakes.
I think there is a bit of a misunderstanding here. The issue isn't that GPT-3 has wrong opinions about stuff. The issue is that it doesn't have any opinion about what is real and what isn't at all. Of course any future AI will operate on limited and flawed information and will thus hold opinions that are not perfectly true. But before we can even get to that point, a model needs to have "real" and "not real" as fundamental categories in the first place. For GPT-3 everything is just text; Harry Potter is as real as Obama. Maybe I am wrong and inference can actually get you there through pure consistency checks, as you say. But we will have to see about that.
ThirdMover t1_j760u5i wrote
Reply to comment by PedroGonnet in [R] Multimodal Chain-of-Thought Reasoning in Language Models - Amazon Web Services Zhuosheng Zhang et al - Outperforms GPT-3.5 by 16% (75%->91%) and surpasses human performance on ScienceQA while having less than 1B params! by Singularian2501
Well, once you are at a billion, the difference between continuous and discrete quantities becomes kind of hair-splitting anyway...
ThirdMover t1_j760ojx wrote
Reply to comment by throwaway2676 in [R] Multimodal Chain-of-Thought Reasoning in Language Models - Amazon Web Services Zhuosheng Zhang et al - Outperforms GPT-3.5 by 16% (75%->91%) and surpasses human performance on ScienceQA while having less than 1B params! by Singularian2501
I think it's going to be interesting if we manage to teach a model to actually have a notion of "factual" and "counterfactual" - right now every prompt is treated as equally valid, and GPT-3 doesn't have an "opinion" as to what is actually true. I am not sure that is even possible with text alone (maybe with some sort of special marker token?), but multimodality might lead the way there.
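Purely as an illustration of the marker-token idea (none of this reflects how GPT-3 is actually trained): training documents could be prefixed with special tokens that tag them as factual or fictional, so a model could in principle condition on that distinction.

```python
# Illustrative only: hypothetical special tokens marking training text
# as factual or fictional before tokenization.
FACTUAL_TOKEN = "<|factual|>"
FICTION_TOKEN = "<|fiction|>"


def tag_document(text: str, is_factual: bool) -> str:
    """Prepend a marker token to a raw training document."""
    marker = FACTUAL_TOKEN if is_factual else FICTION_TOKEN
    return f"{marker} {text}"


corpus = [
    tag_document("Barack Obama served as the 44th US president.", is_factual=True),
    tag_document("Harry Potter attended Hogwarts School of Witchcraft.", is_factual=False),
]
```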
ThirdMover t1_j4mi5iw wrote
Reply to comment by [deleted] in [D] Can ChatGPT flag it's own writings? by MrSpotgold
OpenAI stores the chat logs. That does not mean ChatGPT has any way to search through them.
ThirdMover t1_j46t3fc wrote
Reply to comment by hazard02 in [D] Bitter lesson 2.0? by Tea_Pearce
I think the point of the metaphor was Amazon stealing product ideas from third-party vendors on their site and undercutting them. They know what sells better than anyone and can then just produce it themselves.
If Google or OpenAI offers people the opportunity to fine-tune their foundation models, they will know when something valuable comes out of it and can simply replicate it. There is close to zero institutional cost for them to do so.
That's a reason why I think all these startups that want to build business models around ChatGPT are insane: if you do it and it actually turns out to work, OpenAI will just steal your lunch and you have no way of stopping that.
ThirdMover t1_jds1kid wrote
Reply to comment by rya794 in [P] Using ChatGPT plugins with LLaMA by balthierwings
The lead may not always be obvious, and the trade-off in favor of transparency may be worth it. LLMs (or rather "foundation models") will continue to capture more and more areas of competence. If I want one that - for example - serves as the front-end chatbot of a store I run, so that people can ask for product explanations, do I really need the 500-IQ GPT-7 that won two Nobel prizes last year?
I think it's most likely that there will always be huge black-box models that form the peak of what is possible with machine intelligence, but what people use and interact with in practice will simply be "good enough" smaller, open-source models.