Submitted by abc220022 t3_100y331 in MachineLearning
-Rizhiy- t1_j2lkgn2 wrote
Look at papers dealing with multi-modal tasks, e.g. Perceiver/Perceiver IO by DeepMind.
You can encode your data into tokens of the same size using something like an MLP, then feed these tokens into the decoder along with the encoder tokens. You should probably also add a learnable embedding for each type of data to prevent signal confusion.
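A rough sketch of that idea in NumPy, just to show the shapes involved. The source names ("sensor", "metadata"), the widths, and the helper functions are all made up for illustration, and the weights are random here where a real model would learn them. It assumes the decoder attends over one concatenated token sequence.

```python
import numpy as np

rng = np.random.default_rng(0)
D_MODEL = 16  # shared token width (hypothetical value)

def init_mlp(d_in, d_hidden, d_out):
    """Randomly initialised two-layer MLP; in practice these weights are learned."""
    return {
        "w1": rng.normal(0.0, d_in ** -0.5, (d_in, d_hidden)),
        "b1": np.zeros(d_hidden),
        "w2": rng.normal(0.0, d_hidden ** -0.5, (d_hidden, d_out)),
        "b2": np.zeros(d_out),
    }

def mlp(params, x):
    """Project rows of x (n, d_in) to tokens of shape (n, d_out)."""
    h = np.maximum(x @ params["w1"] + params["b1"], 0.0)  # ReLU
    return h @ params["w2"] + params["b2"]

# Hypothetical extra data sources, each with its own feature width.
feature_dims = {"sensor": 5, "metadata": 12}
mlps = {name: init_mlp(d, 32, D_MODEL) for name, d in feature_dims.items()}

# One type embedding per source, plus one for the encoder tokens.
type_emb = {name: rng.normal(0.0, 0.02, D_MODEL)
            for name in [*feature_dims, "encoder"]}

def build_context(encoder_tokens, extra_inputs):
    """Tag every token with its type embedding and concatenate along the sequence axis."""
    parts = [encoder_tokens + type_emb["encoder"]]
    for name, x in extra_inputs.items():
        parts.append(mlp(mlps[name], x) + type_emb[name])
    return np.concatenate(parts, axis=0)

encoder_tokens = rng.normal(size=(10, D_MODEL))
extra = {
    "sensor": rng.normal(size=(3, 5)),     # 3 items, 5 features each
    "metadata": rng.normal(size=(1, 12)),  # 1 item, 12 features
}
context = build_context(encoder_tokens, extra)
print(context.shape)  # (14, 16): 10 encoder tokens + 3 + 1 extra tokens
```

The decoder would then cross-attend to `context` instead of the encoder output alone. Since every token has the same width, the attention layers don't need to change, and the type embeddings are what let the model tell the sources apart.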