nielsrolf t1_j7stpek wrote

Parti (https://parti.research.google/) showed that being able to spell is an emergent ability. That is the only one I know of, but another I could imagine is learning compositionality (a blue box between a yellow sphere and a green box), although it's more likely that this is a data issue. Working out of distribution (a green dog) is also a potential candidate. Interesting question!

48

These-Assignment-936 OP t1_j7t1t2v wrote

Wow that’s very cool!

7

nielsrolf t1_j7twdn2 wrote

I thought about it again, and another candidate is all LLM capabilities: if you prompt the model for "a screenshot of a Python method that does xyz", the best solution would be an image that contains working code.

9

visarga t1_j7yftij wrote

There are language models without tokens: they work on the raw pixels of an image of text. I can't find the link; Google is not helping me much.

2