
currentscurrents OP wrote:

The paper does talk about this: it calls transformers "first-generation compositional systems," but limited ones.

> Transformers, on the other hand, use graphs, which in principle can encode general, abstract structure, including webs of inter-related concepts and facts.

> However, in Transformers, a layer’s graph is defined by its data flow, yet this data flow cannot be accessed by the rest of the network—once a given layer’s data-flow graph has been used by that layer, the graph disappears. For the graph to be a bona fide encoding, carrying information to the rest of the network, it would need to be represented with an activation vector that encodes the graph’s abstract, compositionally-structured internal information.

> The technique we introduce next—NECST computing—provides exactly this type of activation vector.
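To make the "graph disappears" point concrete: in ordinary self-attention, the attention matrix *is* the layer's data-flow graph, and it's computed and consumed entirely inside the layer. Here's a minimal sketch of plain single-head attention (all names illustrative; this is the standard mechanism, not the paper's NECST technique):

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention over token embeddings x."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # `graph` is the layer's data-flow graph: entry [i, j] is how much
    # token i reads from token j in this layer.
    graph = F.softmax(q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5, dim=-1)
    # The graph is used here and then discarded -- downstream layers see
    # only the mixed values, never the structure that produced them.
    return graph @ v

d = 16
x = torch.randn(8, d)                    # 8 tokens, 16-dim embeddings
w_q, w_k, w_v = (torch.randn(d, d) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)   # shape (8, 16); the graph is gone
```

As I read it, the paper's claim is that NECST computing instead encodes that graph structure into an activation vector, so later layers can operate on it directly.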

They then describe a more advanced variant called NECSTransformers, which they consider a second-generation compositional system. But I hadn't heard of this system before, and it's not clear to me whether it actually performs better.
