omniron t1_j44m4s1 wrote
I think LLMs and large transformer networks are basically finding structure and composition in raw data.
And there's some (as yet unknown) way to get rote symbolic manipulation by stacking some similar system on top of them, similar to how LLM-guided diffusion models work.
currentscurrents OP t1_j44nngb wrote
The paper does talk about this and calls transformers "first generation compositional systems" - but limited ones.
>Transformers, on the other hand, use graphs, which in principle can encode general, abstract structure, including webs of inter-related concepts and facts.
> However, in Transformers, a layer’s graph is defined by its data flow, yet this data flow cannot be accessed by the rest of the network—once a given layer’s data-flow graph has been used by that layer, the graph disappears. For the graph to be a bona fide encoding, carrying information to the rest of the network, it would need to be represented with an activation vector that encodes the graph’s abstract, compositionally-structured internal information.
>The technique we introduce next—NECST computing—provides exactly this type of activation vector.
They then talk about a more advanced variant called NECSTransformers, which they consider a 2nd-generation compositional system. But I haven't heard of this system before, and I'm not clear whether it actually performs better.
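To make the "graph as activation vector" idea a bit more concrete: as far as I can tell, this kind of encoding is in the spirit of tensor-product representations, where each role (slot in the graph) gets bound to a filler (concept) and the bindings are summed into one fixed-size tensor that later layers can actually read. Here's a rough numpy sketch of that idea, with made-up dimensions, names, and a toy graph of my own, not code or notation from the paper:

```python
# Rough sketch of a tensor-product-style encoding: bind "filler" vectors
# (concepts) to "role" vectors (slots in a graph) with outer products and
# sum the bindings, so the whole structure lives in one fixed-size tensor.
# All names, sizes, and the graph itself are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
d = 64  # dimensionality of role and filler vectors (arbitrary)

# Rows of a random orthogonal matrix give orthonormal role vectors,
# so unbinding below is exact rather than approximate.
roles = np.linalg.qr(rng.standard_normal((d, d)))[0]

# Filler vectors for the concepts occupying those roles.
fillers = {name: rng.standard_normal(d) for name in ("dog", "chases", "cat")}

# Encode the tiny graph dog --chases--> cat as a sum of role-filler bindings.
graph = (np.outer(roles[0], fillers["dog"])
         + np.outer(roles[1], fillers["chases"])
         + np.outer(roles[2], fillers["cat"]))

# A later layer can recover a filler by contracting with its role vector,
# i.e. the structure stays readable instead of disappearing after one layer.
recovered = roles[1] @ graph
print(np.allclose(recovered, fillers["chases"]))  # True
```

The point of the toy example is just the contrast with a plain transformer layer: there, the attention pattern plays the role of the graph but is consumed immediately, whereas here the graph itself is an activation that downstream computation can query.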