NTIASAAHMLGTTUD t1_j8c0cj3 wrote
In the interest of skepticism, can anyone pour any cold water on this or is it as good as it sounds?
turnip_burrito t1_j8c1932 wrote
It's only tested on one benchmark, called ScienceQA. Maybe testing it on others would allow us to see how well it really stacks up.
el_chaquiste t1_j8c1z83 wrote
If I understand correctly, the input set (a science exam with solved exercises and detailed responses) is smaller than GPT-3.5's own, yet the model outperforms GPT-3.5 and humans by some percent on problems similar to those from said exam, and by more if its training is multimodal and includes visual data.
I honestly don't know if we should get overly excited about this or not, but it seems like it would allow the creation of smaller models focused on specific scientific and technical domains, with better accuracy in their responses than generalist LLMs.
[deleted] t1_j8c34us wrote
[deleted]
SoylentRox t1_j8cblun wrote
Theoretically it should query a large number of models and assign each answer a "confidence" based on how likely that model's answer is to be correct, then return the most confident answer.
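A rough sketch of that idea, in case it helps. Everything here is an assumption for illustration: the `Model` interface (a function returning an answer plus a self-reported confidence between 0 and 1) and the three stub models are made up, and a real system would need calibrated confidences for this to mean anything.

```python
from collections import defaultdict
from typing import Callable, List, Tuple

# Hypothetical interface: a "model" maps a question to (answer, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


def most_confident_answer(question: str, models: List[Model]) -> Tuple[str, float]:
    """Query every model and return the single highest-confidence answer."""
    if not models:
        raise ValueError("need at least one model")
    results = [model(question) for model in models]
    return max(results, key=lambda result: result[1])


def confidence_weighted_vote(question: str, models: List[Model]) -> Tuple[str, float]:
    """Variant: sum confidences per distinct answer and return the answer with the largest total."""
    if not models:
        raise ValueError("need at least one model")
    totals = defaultdict(float)
    for model in models:
        answer, confidence = model(question)
        totals[answer] += confidence
    best = max(totals, key=totals.get)
    return best, totals[best]


# Stub models standing in for real specialists (purely illustrative).
def physics_model(question: str) -> Tuple[str, float]:
    return ("B", 0.9)


def biology_model(question: str) -> Tuple[str, float]:
    return ("A", 0.6)


def generalist_model(question: str) -> Tuple[str, float]:
    return ("A", 0.5)


panel = [physics_model, biology_model, generalist_model]
print(most_confident_answer("Which option is correct?", panel))
print(confidence_weighted_vote("Which option is correct?", panel))
```

Note the two strategies can disagree: picking the single most confident model trusts one loud expert, while the weighted vote lets several moderately confident models outvote it.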
duboispourlhiver t1_j8cpldd wrote
Artificial expert panel
RabidHexley t1_j8dzxsv wrote
I Am Legion
ReadSeparate t1_j8fb4cr wrote
One can easily imagine a generalist LLM outputting an action token that represents a prompt for a specialized LLM. That prompt gets routed to the specialized LLM, and the response is then formatted and put back into context by the generalist.
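Something like the following toy routing loop, as a sketch only. The `<call:NAME>...</call>` token format, the `route` function, and the specialist registry are all hypothetical; no real model emits these tokens, and the specialists here are plain Python functions standing in for LLM calls.

```python
import re
from typing import Callable, Dict

# Hypothetical action-token format: <call:NAME>prompt for the specialist</call>
ACTION_RE = re.compile(r"<call:(\w+)>(.*?)</call>", re.DOTALL)


def route(generalist_output: str, specialists: Dict[str, Callable[[str], str]]) -> str:
    """Replace each action token with the formatted response of the named specialist."""

    def dispatch(match: "re.Match[str]") -> str:
        name, prompt = match.group(1), match.group(2)
        specialist = specialists.get(name)
        if specialist is None:
            # Unknown specialist: leave a visible marker instead of failing.
            return f"[no specialist named {name}]"
        return f"[{name} says: {specialist(prompt)}]"

    return ACTION_RE.sub(dispatch, generalist_output)


# Stand-in specialist; a real one would call a domain-specific model.
specialists = {"science": lambda prompt: "42"}

context = route("Let me check. <call:science>What is 6 * 7?</call>", specialists)
print(context)
```

In a full system the returned text would be appended to the generalist's context so it can keep generating with the specialist's answer in view.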
[deleted] t1_j8g761n wrote
[deleted]
Ishynethetruth t1_j8c6kn8 wrote
Unless it’s released to the public, don’t trust Amazon.
Cryptizard t1_j8d86dt wrote
The humans they tested on were random people on Mechanical Turk, so that data point is not very illuminating.
Borrowedshorts t1_j8erk8h wrote
It's as good as it sounds, and you can't really fake performance on a dataset such as this. Multimodal models will change the game. I don't think multimodal models by themselves are the end game, but they appear poised to take over state-of-the-art performance for the foreseeable future.