Emergency_Apricot_77 t1_jah9rb7 wrote
Reply to comment by Kaleidophon in [D] backprop through beam sampling ? by SaltyStackSmasher
Why go with BLEU though? OP didn't particularly mention optimizing sequence-level metrics. Can't we still use cross-entropy? Something like the following:
Sample first token, calculate cross-entropy with first token of gold
Sample second token, calculate cross-entropy with second token of gold
Sample third token, calculate cross-entropy with third token of gold
... and so on?
This way we still have a differentiable metric, but with much better alignment between the training and inference scenarios -- as opposed to the current setup of teacher forcing at training time and sampling at inference -- which I thought the OP was going for. A rough sketch of what I mean is below.
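For concreteness, here's a minimal PyTorch-style sketch of the idea (assuming a hypothetical `model(prefix)` that returns next-token logits for the whole prefix); note the discrete sampling step itself isn't differentiated through, only each step's cross-entropy term against the gold token is:

```python
import torch
import torch.nn.functional as F

def sampled_prefix_xent(model, gold_ids, bos_id):
    """Hypothetical sketch: condition on the model's own sampled prefix,
    but score each step's distribution against the gold token at that
    position with cross-entropy."""
    prefix = torch.tensor([[bos_id]])          # (1, 1) running sampled prefix
    total_loss = 0.0
    for t in range(gold_ids.size(1)):
        logits = model(prefix)[:, -1, :]       # next-token logits given the sampled prefix
        # differentiable loss term: cross-entropy vs. the t-th gold token
        total_loss = total_loss + F.cross_entropy(logits, gold_ids[:, t])
        # extend the prefix with a *sampled* token (inference-like behaviour);
        # the sampling itself contributes no gradient
        next_tok = torch.multinomial(F.softmax(logits, dim=-1), num_samples=1)
        prefix = torch.cat([prefix, next_tok], dim=1)
    return total_loss / gold_ids.size(1)
```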
Emergency_Apricot_77 t1_j9b68si wrote
Reply to comment by Rockingtits in [D] Large Language Models feasible to run on 32GB RAM / 8 GB VRAM / 24GB VRAM by head_robotics
They literally asked for LARGE language models
Emergency_Apricot_77 OP t1_j0fe4lo wrote
Reply to comment by prototypist in [D] Is "natural" text always maximally likely according to language models ? by Emergency_Apricot_77
Thanks for this! The typical decoding paper has really useful information, similar to what I was looking for.
Emergency_Apricot_77 OP t1_j0c3cii wrote
Reply to comment by dojoteef in [D] Is "natural" text always maximally likely according to language models ? by Emergency_Apricot_77
This is VERY similar to what I was looking for. Thanks a LOT for this
Submitted by Emergency_Apricot_77 t3_zmd6l8 in MachineLearning
Emergency_Apricot_77 t1_iqurybv wrote
Reply to comment by Lone-Pine in [D] Types of Machine Learning Papers by Lost-Parfait568
Who?
Emergency_Apricot_77 t1_jdgbocg wrote
Reply to comment by whyelrond in [N] ChatGPT plugins by Singularian2501
Care to explain more about the symbolic approaches via Wolfram?