TehDing

TehDing t1_je08lpg wrote

You can ask GPT to spell a word, or provide the words as individual "S P A C E D" characters, and it will still do poorly, so it has nothing to do with tokenization. GPT is capable of spelling, and it can even identify that it is not playing well if you ask whether something is a good guess, but it continues to give poor answers.

In terms of 'solving' a game like this 20 questions example, there are only about 12,000 valid words to guess from, or at worst 26^5 (about 11.9 million) possible answers, which still makes this a smaller problem than the blog experiment (or at worst on par with it).
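To put rough numbers on that, here is a quick back-of-the-envelope sketch. The 12,000 figure is the one quoted above; the helper name `questions_needed` is made up for illustration, and it assumes ideal yes/no questions that halve the remaining candidates each time, which real play won't achieve.

```python
import math

VALID_WORDS = 12_000      # approximate count quoted above
ALL_STRINGS = 26 ** 5     # every 5-letter string, real word or not


def questions_needed(n_options: int) -> int:
    """Minimum number of perfect yes/no questions to single out one of n_options."""
    return math.ceil(math.log2(n_options))


print(ALL_STRINGS)                    # 11881376
print(questions_needed(VALID_WORDS))  # 14
print(questions_needed(ALL_STRINGS))  # 24
```

So a 20-question budget (2^20, about a million outcomes) sits between the two cases: comfortably enough for the valid-word list, a few questions short for arbitrary 5-letter strings.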

Want an easier game? It sucks at Hangman too. It'll guess letters by frequency, but not well enough to piece together a word. Even guessing on the basis of common n-grams would probably be a good enough strategy.
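For a sense of how low the bar is, here is a minimal sketch of a simple Hangman guesser. To be clear, this is not the n-gram approach; it's the even simpler "filter a word list, then pick the most common remaining letter" baseline. The function name `next_guess` and the tiny word list are made up for illustration, and any real word list would do.

```python
from collections import Counter


def next_guess(pattern, guessed, words):
    """Return the unguessed letter found in the most candidate words, or None.

    pattern: the board so far, e.g. "__a_e", with "_" for unknown positions.
    guessed: set of letters already tried (right or wrong).
    words:   illustrative word list (any iterable of lowercase strings).
    """
    def fits(word):
        if len(word) != len(pattern):
            return False
        for p, c in zip(pattern, word):
            if p == "_":
                # A blank cannot hide a letter that was already guessed.
                if c in guessed:
                    return False
            elif p != c:
                return False
        return True

    counts = Counter()
    for word in words:
        if fits(word):
            # Count each new letter once per word.
            counts.update(set(word) - guessed)
    if not counts:
        return None
    # Highest count wins; ties are broken alphabetically.
    return min(counts, key=lambda ch: (-counts[ch], ch))


words = ["crane", "crate", "grape", "plane", "stone"]
print(next_guess("__a_e", {"a", "e"}, words))  # r
```

Nothing clever going on here, just bookkeeping about which words are still possible, which is exactly the part the model doesn't seem to do.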

My experience is that LLMs are poor at novel reasoning. This makes sense: RLHF isn't giving these things a consciousness. Maybe with tweaks/tools we'll actually see some "thinking", but for now (this may change next week at the rate things are going) they're not very good at games in general as a result (another example: I haven't tried it with GPT-4, but GPT-3 cheats at chess).

1

TehDing t1_jdtwa6k wrote

I have not been impressed with LLMs' reasoning when solving novel puzzles/challenges. Ask any model to play Wordle with you. They are not good.

1