
farmingvillein t1_ixdp4t8 wrote

Very neat! Would love to see a version built with fewer filters (secondary models)--i.e., more grounded in a singular, "base" model // less hand-tweaking--but otherwise very cool. (Although wouldn't surprise me if simply upgrading the model size went a long way here.)


Acceptable-Cress-374 t1_ixg6ngd wrote

Listened to a podcast with Andrej Karpathy recently, and his intuition for the future of LLMs is that we'll see more collaboration and stacking of models, sort of a "council of GPTs" approach, where models trained on particular tasks work together towards a goal.
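That "council" idea can be sketched as a simple dispatcher that routes each task to a model specialized for it. This is purely illustrative: the task names, the specialist functions, and the routing rule are all invented for the example.

```python
# Hypothetical "council of models": a dispatcher routes each task to a
# specialist model. The specialists here are stand-in functions; in a
# real system they would be separately trained models.

def summarizer(text: str) -> str:
    # Stand-in for a model fine-tuned on summarization.
    return f"summary({text})"

def translator(text: str) -> str:
    # Stand-in for a model fine-tuned on translation.
    return f"translation({text})"

SPECIALISTS = {
    "summarize": summarizer,
    "translate": translator,
}

def council(task: str, text: str) -> str:
    """Dispatch the input to the specialist trained for this task."""
    model = SPECIALISTS.get(task)
    if model is None:
        raise ValueError(f"no specialist for task: {task}")
    return model(text)
```

In practice the routing itself could be another learned model, which is part of what makes this a "stacking" approach rather than a single end-to-end system.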

Whatever the future holds, I'm betting we'll see constant improvements over the next few years before we see a revolutionary new one-model approach.


farmingvillein t1_ixgd88a wrote

Yeah, understood, but that wasn't really what was going on here (unless you take a really expansive definition).

They were basically doing a ton of hand-calibration across a very large number of models to achieve the desired end-goal performance--if you read the supplementary materials, you'll see that they did a lot of very fiddly work to select model output thresholds, build training data, etc.

On the one hand, I don't want to sound overly critical of a pretty cool end-product.

On the other, it really looks a lot more like a "product", in the same way that any gaming AI would be, than a singular (or close to it) AI system which is learning to play the game.


graphicteadatasci t1_ixh2mk4 wrote

But they specifically created a model for playing Diplomacy - not a process for building board-game-playing models. With the right architecture and processes they could probably do away with most of that hand-calibration, but the goal here was to create a model that does one thing.


farmingvillein t1_ixictvr wrote

Hmm. Did you read the full paper?

They didn't create a model that does one thing.

They built a whole host of models, with high levels of hand calibration, each configured for a separate task.


kaj_sotala t1_ixj2dte wrote

Do you happen to remember which podcast that was? Sounds interesting.
