
ReasonablyBadass t1_jae7zhu wrote

Can't read the paper right now. Can someone summarize: is it a new model, or "just" the standard transformer applied to multimodal data? If it is new, what are the structural changes?
