
jamesj t1_j86vz1o wrote

It wasn't at all clear to people working in the field a year ago that this must emerge from transformer-based LLMs.

9

ekdaemon t1_j8kqoz5 wrote

Gotcha.

IANE, but I assumed that the combination of the four things mentioned above, matrix multiplication included, would be Turing complete, and I thought that anything Turing complete could absolutely be expected to scale to produce anything desired.

I half expected to find that matrix multiplication alone was already known to be Turing complete. I've seen at least one reference to that possibility in a discussion on ycombinator.

1
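
One relevant, well-known fact here: a stack of matrix multiplications with no nonlinearity in between collapses to a single matrix multiplication, since compositions of linear maps are linear. Whatever power the full system has comes from the nonlinear pieces, not from matmul alone. A minimal numpy sketch of the collapse (my own illustration, not from the thread):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "layers" of pure matrix multiplication...
W1 = rng.standard_normal((8, 4))
W2 = rng.standard_normal((3, 8))
x = rng.standard_normal(4)

# ...collapse into one: W2 @ (W1 @ x) == (W2 @ W1) @ x,
# so stacking linear maps adds no expressive power.
assert np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x)

# A nonlinearity between the layers is what breaks this collapse
# and gives multi-layer networks their extra expressive power.
relu = lambda v: np.maximum(v, 0.0)
print(W2 @ relu(W1 @ x))  # no longer a single fixed matrix times x
```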

jamesj t1_j8kwink wrote

It has long been known that neural nets are universal function approximators: even a network with a single hidden layer can approximate any continuous function given enough parameters and data. But in practice there is a huge gap between knowing that a good approximation exists and actually getting a particular system to converge on the useful function, given a set of data, in a reasonable amount of time (or at a reasonable cost).

1
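
To make that gap concrete: the theorem guarantees a good single-hidden-layer approximator exists, but training still has to find it. A minimal PyTorch sketch (the target function, width, and hyperparameters are arbitrary choices for illustration):

```python
import torch
import torch.nn as nn

# A single-hidden-layer network: by the universal approximation
# theorem, with enough hidden units this family can approximate
# any continuous function on a compact interval.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))

# Target function and training data (the compact set is [-pi, pi]).
x = torch.linspace(-torch.pi, torch.pi, 256).unsqueeze(1)
y = torch.sin(x)

opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

# The theorem says a good approximator exists; it says nothing
# about whether this loop will find it, or how fast.
for step in range(2000):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

print(f"final MSE: {loss.item():.6f}")  # small if training converged
```

Whether and how fast this converges depends on the width, learning rate, and data, none of which the approximation theorem speaks to.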