Submitted by jaxolingo t3_125qztx in MachineLearning
Goldenier t1_je9uruu wrote
Reply to comment by LetGoAndBeReal in [D] The best way to train an LLM on company data by jaxolingo
This is false, and most of the time the opposite is actually the problem: the model learns too much from the new data it's fine-tuned on (it overfits), and forgets the "knowledge" in the original model. The simplest and most popular example right now is using DreamBooth, LoRA, or other fine-tuning methods to fine-tune parts of the big image diffusion models: if you overtrain, the model will place the newly trained face or object in almost all of its output. So it easily learns new data, but it also easily forgets the old data. (One mitigation for this is to use a preservation loss to make sure the model also keeps the old knowledge.) And there is no reason why the same methods wouldn't work on LLMs too, for example there is already LoRA for LLMs too.
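To make the preservation-loss idea above concrete, here is a minimal sketch in plain Python. It assumes the DreamBooth-style setup where the fine-tuning loss on the new data is summed with a weighted loss on "prior" samples that stand in for what the original model already produced. The function names, the use of mean squared error, and the `prior_weight` parameter are illustrative assumptions, not taken from any particular library.

```python
def mse(pred, target):
    """Mean squared error between two equal-length lists of floats."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def loss_with_preservation(new_pred, new_target,
                           prior_pred, prior_target,
                           prior_weight=1.0):
    """Fine-tuning loss plus a weighted preservation term.

    new_pred / new_target:     model outputs and labels on the new data.
    prior_pred / prior_target: model outputs on prior samples, and the
                               outputs the ORIGINAL model gave for them
                               (hypothetical names for this sketch).
    prior_weight:              how strongly to penalize drifting away
                               from the original model's behavior.
    """
    new_term = mse(new_pred, new_target)        # learn the new data
    prior_term = mse(prior_pred, prior_target)  # keep the old behavior
    return new_term + prior_weight * prior_term
```

With `prior_weight=0.0` this reduces to ordinary fine-tuning, which is the regime where the overtraining and forgetting described above would be expected to show up most easily.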
LetGoAndBeReal t1_je9zfyb wrote
>And there is no reason why the same methods wouldn't work on LLMs too, for example there is already LoRA for LLMs too.
It's really not helpful to make strong assertions like this without referring to specific, verifiable sources. Fine-tuning is very typically done in a way where certain layers/parameters of the model are frozen. This is done to avoid the sort of knowledge loss we are discussing. The LoRA paper itself states that LoRA "freezes the pre-trained model weights".
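For readers unfamiliar with what "freezes the pre-trained model weights" means mechanically, here is a small sketch in plain Python. It assumes the formulation from the LoRA paper, where the layer output is `W x + (alpha / r) * B A x`, `W` stays fixed, and only the low-rank matrices `A` and `B` are updated (with `B` initialized to zero). The class name, the squared-error loss, and the hand-written SGD step are illustrative assumptions, not any library's API, and the sketch does not by itself settle how much new knowledge such an adapter can absorb.

```python
import random


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class LoRALinear:
    """Linear layer with a frozen weight W and a trainable low-rank update."""

    def __init__(self, W, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # pre-trained weight: never modified below
        # A gets a small random init, B starts at zero, so the adapter
        # initially contributes nothing to the output.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)]
                  for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.W, x)
        low_rank = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * l for b, l in zip(base, low_rank)]

    def sgd_step(self, x, target, lr=0.1):
        """One SGD step on 0.5 * ||forward(x) - target||^2, updating A and B only."""
        Ax = matvec(self.A, x)
        err = [y - t for y, t in zip(self.forward(x), target)]  # dL/dy
        # Gradient flowing back through B, computed before B is changed.
        Bt_err = [sum(self.B[i][k] * err[i] for i in range(len(err)))
                  for k in range(len(Ax))]
        for i in range(len(self.B)):
            for k in range(len(Ax)):
                self.B[i][k] -= lr * self.scale * err[i] * Ax[k]
        for k in range(len(self.A)):
            for j in range(len(x)):
                self.A[k][j] -= lr * self.scale * Bt_err[k] * x[j]
```

Calling `sgd_step` any number of times leaves `layer.W` exactly as it was passed in; all of the change in the layer's output comes from `A` and `B`.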