ResponsibleHouse7436

ResponsibleHouse7436 t1_iu00jwz wrote

How's it going? I am currently trying to train some speech recognition models and doing some research on novel encoder architectures for e2e ASR. However, I don't have a ton of compute resources. My final model will be around 300M parameters, but I was wondering if training a couple of architectures at, say, 25-50M params and then scaling up the best one is a valid approach to this problem. Why or why not?
