Submitted by abc220022 t3_100y331 in MachineLearning
The classic way to incorporate sequential context into a transformer is an encoder-decoder architecture, where the context is processed by the encoder and then read into the decoder blocks via cross-attention.
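For reference, here's a minimal sketch of that mechanism in PyTorch (names and dimensions are illustrative, not from any particular paper): the decoder block's queries come from its own tokens, while the keys and values come from the encoder output.

```python
import torch
import torch.nn as nn

class CrossAttentionDecoderBlock(nn.Module):
    """Minimal sketch of a decoder block that reads encoder context via cross-attention."""
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, x, enc_out):
        # Causal self-attention over the decoder's own tokens
        L = x.size(1)
        causal = torch.triu(torch.ones(L, L, dtype=torch.bool, device=x.device), diagonal=1)
        h, _ = self.self_attn(x, x, x, attn_mask=causal)
        x = self.norm1(x + h)
        # Cross-attention: queries from the decoder, keys/values from the encoder output
        h, _ = self.cross_attn(x, enc_out, enc_out)
        x = self.norm2(x + h)
        return self.norm3(x + self.ff(x))
```

Note that nothing forces `enc_out` to be a long sequence; one common option for non-sequential context is to project a single conditioning vector to shape `(batch, 1, d_model)` and cross-attend to that length-1 "memory".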
However, there are a number of situations where you might want to incorporate non-sequential context into a model. For example, you might want a language model that generates text conditioned on an input vector describing the person whose writing you are trying to emulate, or you might want to condition on a single vector that summarizes all text that occurred before the context window. What are standard ways of incorporating this kind of context? I'd also be interested in how it is typically done in other sequence models, such as RNNs with LSTM blocks.
bushcat89 t1_j2kzznq wrote
If I remember correctly, Temporal Fusion Transformers tackle the problem of incorporating non-sequential data into a transformer.
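As a rough illustration of the idea only (not the actual TFT architecture, which uses gated residual networks, variable selection, and a static enrichment layer), a single static context vector can be projected and injected into the token representations with a learned gate; all names below are made up for the sketch.

```python
import torch
import torch.nn as nn

class StaticConditionedEmbedding(nn.Module):
    """Hedged sketch: add a gated projection of a non-sequential context vector
    to every token embedding, so the model can learn how much to use it."""
    def __init__(self, vocab_size, d_model, d_static):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.static_proj = nn.Linear(d_static, d_model)
        self.gate = nn.Linear(d_static, d_model)

    def forward(self, token_ids, static_ctx):
        # token_ids: (batch, seq_len); static_ctx: (batch, d_static)
        x = self.tok(token_ids)                                # (batch, seq_len, d_model)
        ctx = self.static_proj(static_ctx).unsqueeze(1)        # (batch, 1, d_model)
        g = torch.sigmoid(self.gate(static_ctx)).unsqueeze(1)  # learned gate in [0, 1]
        return x + g * ctx                                     # broadcast over sequence positions
```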