turnip_burrito t1_j8c1932 wrote on February 13, 2023 at 4:31 AM

It's only tested on one benchmark, called ScienceQA. Maybe testing it on others would allow us to how well it really stacks up.

el_chaquiste t1_j8c1z83 wrote on February 13, 2023 at 4:38 AM

If I understand well, seems the input set (a science exam with solved exercises and detailed responses) is smaller than GPT3.5's own, but it overperforms GPT3.5 and humans on solving problems similar to those from said exam by some percent, more if it has a multimodal training including visual data.

I honestly don't know if we should get overly excited over this or not, but it seems like it would allow the creation of smaller models focused on some scientific and technical domains, with better accuracy in their reponses than generalist LLMs.

[deleted] t1_j8c34us wrote on February 13, 2023 at 4:48 AM

[deleted]

SoylentRox t1_j8cblun wrote on February 13, 2023 at 6:17 AM

Theoretically it should query a large number of models, and have a "confidence" based on how likely each model's answer is to be correct. Then return the most confidence answer.

duboispourlhiver t1_j8cpldd wrote on February 13, 2023 at 9:28 AM

Artificial expert panel

RabidHexley t1_j8dzxsv wrote on February 13, 2023 at 4:43 PM

I Am Legion

ReadSeparate t1_j8fb4cr wrote on February 13, 2023 at 9:55 PM

One can easily imagine a generalist LLM outputting an action token which represents prompting the specialized LLM, which then gets routed to the specialized LLM, then the response is formatted and put into context by the generalist.

[deleted] t1_j8g761n wrote on February 14, 2023 at 1:47 AM

[deleted]

Ishynethetruth t1_j8c6kn8 wrote on February 13, 2023 at 5:22 AM

Unless it’s released to the public don’t trust Amazon

Cryptizard t1_j8d86dt wrote on February 13, 2023 at 1:19 PM

The humans they tested on were random people on Mechanical Turk, so that data point is not very illuminating.

Borrowedshorts t1_j8erk8h wrote on February 13, 2023 at 7:49 PM

It's as good as it sounds, and you can't really fake performance on a dataset such as this. Multimodal models will change the game. I don't think multimodal models by themselves are the end game, but they appear to be poised to takeover state of the art performance for the foreseeable future.

This is Revolutionary?! Amazon's 738 Million(!!!) parameter's model outpreforms humans on sience, vision, language and much more tasks.

NTIASAAHMLGTTUD t1_j8c0cj3 wrote on February 13, 2023 at 4:23 AM