Submitted by jaxolingo t3_125qztx in MachineLearning
machineko t1_je70llx wrote
Reply to comment by LetGoAndBeReal in [D] The best way to train an LLM on company data by jaxolingo
Why would you say that fine-tuning is not viable? There are many production use cases of fine-tuning a model using in-house proprietary data.
In fact, if you have the resources, you can do both: fine-tune an existing model (whether supervised or unsupervised) and also use it for retrieval-augmented generation.
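A rough sketch of what that combination could look like, using sentence-transformers for retrieval and a Hugging Face pipeline for generation ("my-org/llm-finetuned", the documents, and the prompt format are all placeholders, not a specific production setup):

```python
import numpy as np
from sentence_transformers import SentenceTransformer
from transformers import pipeline

# Embed the proprietary documents once and keep the vectors around.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
docs = ["Q3 revenue grew 12%...", "The Acme contract renews in June..."]
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

# Placeholder name for a checkpoint you fine-tuned on in-house data.
generator = pipeline("text-generation", model="my-org/llm-finetuned")

def answer(question: str) -> str:
    # Vectors are normalized, so a dot product is cosine similarity.
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    best_doc = docs[int(np.argmax(doc_vecs @ q_vec))]
    # Stuff the retrieved context into the prompt for the fine-tuned model.
    prompt = f"Context: {best_doc}\n\nQuestion: {question}\nAnswer:"
    return generator(prompt, max_new_tokens=128)[0]["generated_text"]
```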
LetGoAndBeReal t1_je71r0g wrote
Fine-tuning can be great for getting better output from the model based on the knowledge that model already contains. I only meant fine-tuning is not viable for getting new data/knowledge into a model. Fine-tuning does not accomplish knowledge absorption.
WokeAssBaller t1_je7y09s wrote
Huh? I think that depends on the kind of fine-tuning you're talking about. Fine-tuning can absolutely add knowledge to a model.
lgastako t1_je8i6dw wrote
Not generally very well.
WokeAssBaller t1_jea0ubd wrote
Fine-tuning is additional training; there are lots of ways of doing it, and sometimes it's absolutely ideal. There are tradeoffs.
lgastako t1_jea7kb3 wrote
Would love to see an example of it adding knowledge effectively. I haven't been able to find any at all.
WokeAssBaller t1_jealxm2 wrote
Train one from scratch
lgastako t1_jeayn8v wrote
I know training a model from scratch will work, but the context of the conversation is fine-tuning an existing model. I'd love to see examples of the claims people are making actually working, because I've only been able to find, and create, examples of it not working very well at all.
WokeAssBaller t1_jebpjog wrote
Fine-tuning is just additional training, so if it works from scratch, it works with fine-tuning. And no, it may not be as effective as other methods, but the poster was claiming it was impossible.
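In PyTorch terms, the loop really is the same either way; the only difference is whether you start from random weights or load a pretrained checkpoint first (toy model and data below just to make it runnable, "pretrained.pt" is a placeholder path):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def train(model: nn.Module, loader: DataLoader, epochs: int = 1):
    # The optimization loop is identical for pre-training and fine-tuning.
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()

# Toy stand-ins for a real network and dataset.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
data = TensorDataset(torch.randn(64, 16), torch.randint(0, 4, (64,)))
loader = DataLoader(data, batch_size=8)

# "From scratch": train from the random init.
# "Fine-tuning": load pretrained weights first, then run the same loop:
# model.load_state_dict(torch.load("pretrained.pt"))  # placeholder path
train(model, loader)
```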
lgastako t1_jecb96v wrote
Ok, so can you point me to an example of it working well?
WokeAssBaller t1_jecc92g wrote
What a waste of time
machineko t1_je83m8x wrote
Unsupervised fine-tuning (or extending the pre-training) with additional data will work. Of course, getting the model to learn new information effectively is a challenge, but not an impossible one.
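For example, a minimal continued-pre-training run with the Hugging Face Trainer might look like this (GPT-2 as a small stand-in model; "company_docs.txt" is a placeholder for your raw in-house text):

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Raw, unlabeled text: the "unsupervised" part.
dataset = load_dataset("text", data_files="company_docs.txt")["train"]
dataset = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=["text"],
)

# mlm=False gives the standard next-token objective, i.e. the same
# loss the model was pre-trained with.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ckpt", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()
```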
Goldenier t1_je9uruu wrote
This is false; in fact, most of the time the opposite is the problem: the model learns too much of the new data it's fine-tuned on (overfitting on it) but forgets the "knowledge" in the original model. The simplest and most widely used example right now is fine-tuning parts of the big image diffusion models with DreamBooth, LoRA, or other methods: if you overtrain, the model places the newly trained face or object in almost all of its outputs, so it easily learns new data but also easily forgets the old. (One mitigation is a prior-preservation loss, sketched below, which makes sure the model also keeps its old knowledge.) And there is no reason the same methods wouldn't work on LLMs too; LoRA for LLMs already exists, for example.
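A sketch of that preservation idea in generic terms (all names here are illustrative; in DreamBooth the "prior" examples are class images generated by the original model before training):

```python
import torch.nn.functional as F

def step_with_prior_preservation(model, new_x, new_y, prior_x, prior_y,
                                 prior_weight=1.0):
    # Loss on the new concept being taught...
    loss_new = F.mse_loss(model(new_x), new_y)
    # ...plus a loss that keeps the model matching what the *original*
    # model produced for the broader class, anchoring old knowledge.
    loss_prior = F.mse_loss(model(prior_x), prior_y)
    return loss_new + prior_weight * loss_prior
```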
LetGoAndBeReal t1_je9zfyb wrote
>And there is no reason why the same methods wouldn't work on LLMs too, for example there is already Lora for LLMs too.
It's really not helpful to make strong assertions like this without referring to specific, verifiable sources. Fine-tuning is very typically done with certain layers/parameters of the model frozen, precisely to avoid the sort of loss we are discussing. The LoRA paper itself states that LoRA "freezes the pre-trained model weights".
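This is easy to verify with the peft library: wrap a model with LoRA adapters and the only parameters that still receive gradients are the adapters, not the original network (a minimal sketch, using GPT-2 as a small stand-in):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("gpt2")
# GPT-2's fused QKV projection is named "c_attn"; it uses Conv1D
# layers, hence fan_in_fan_out=True.
model = get_peft_model(base, LoraConfig(r=8, target_modules=["c_attn"],
                                        fan_in_fan_out=True,
                                        task_type="CAUSAL_LM"))

# Every parameter that still requires grad is an injected adapter
# weight; the pre-trained weights themselves are frozen.
for name, p in model.named_parameters():
    if p.requires_grad:
        assert "lora_" in name

frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"frozen: {frozen:,}  trainable: {trainable:,}")
```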