Submitted by natural_language_guy t3_ypxyud in MachineLearning
natural_language_guy OP t1_ivsdjo8 wrote
Reply to comment by new_name_who_dis_ in [D] Is there anything like beam search with BERT? by natural_language_guy
If the advice is to discard BERT and go with an MDN, do you think MDNs in this case would perform better than a large generative model like T5 with beam search?
The MDN does look interesting, and it seems there are already some libraries available for it, but I don't have much experience with deep probabilistic models.
new_name_who_dis_ t1_ivtav2j wrote
No, I'm not saying to discard BERT. You still use BERT as the encoder and add an MDN-like network as the final layer. It could still be a self-attention layer, just trained with the MDN loss function. An MDN isn't a different architecture; it's just a different loss function for your final output that isn't deterministic but allows for multiple outputs.
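To make that concrete, here is a minimal sketch of what a BERT encoder with an MDN head could look like for a continuous target, assuming PyTorch and the Hugging Face transformers library. The names `BertMDNHead` and `mdn_nll`, the choice of the [CLS] vector as the pooled representation, and the Gaussian mixture parameterization are all illustrative assumptions, not something from the original thread.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class BertMDNHead(nn.Module):
    """Illustrative sketch: BERT encoder with a mixture density network (MDN) head.

    Instead of a single deterministic output, the head predicts the parameters
    of a Gaussian mixture, so the model can represent several plausible outputs.
    """

    def __init__(self, n_components=5, target_dim=1, model_name="bert-base-uncased"):
        super().__init__()
        self.encoder = BertModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        self.n_components = n_components
        self.target_dim = target_dim
        # One linear layer per mixture parameter: mixing weights, means, scales.
        self.pi = nn.Linear(hidden, n_components)
        self.mu = nn.Linear(hidden, n_components * target_dim)
        self.log_sigma = nn.Linear(hidden, n_components * target_dim)

    def forward(self, input_ids, attention_mask):
        # Use the [CLS] token representation as a summary of the input.
        h = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state[:, 0]
        log_pi = torch.log_softmax(self.pi(h), dim=-1)
        mu = self.mu(h).view(-1, self.n_components, self.target_dim)
        sigma = torch.exp(self.log_sigma(h)).view(-1, self.n_components, self.target_dim)
        return log_pi, mu, sigma


def mdn_nll(log_pi, mu, sigma, target):
    """MDN loss: negative log-likelihood of `target` under the predicted mixture."""
    target = target.unsqueeze(1)  # (batch, 1, target_dim), broadcasts over components
    comp = torch.distributions.Normal(mu, sigma)
    # Sum log-probs over target dims, then log-sum-exp over mixture components.
    log_prob = comp.log_prob(target).sum(-1) + log_pi
    return -torch.logsumexp(log_prob, dim=-1).mean()
```

At inference time you can sample from the mixture (or take the mean of the most likely component) to get multiple distinct outputs from the same input, which is the "non-deterministic final layer" idea described above.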