aliasaria t1_jefih93 wrote
Reply to comment by CasulaScience in [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention by floppy_llama
A short answer is that it is "just different". It's another way to tweak an existing LLM to do a new task without having to finetune the whole model. Conceptually, this approach is simpler than LoRA and seems to work as well or better.
In the paper, the authors mention that one advantage is that you can use this technique to add new modalities. The method works by adding learnable tokens to the prompt at the topmost layer(s), so you aren't limited to words: the added tokens could also come from an image. They have an example at the top of page 4 with a picture of a baby opening a door.
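To make that concrete, here's a minimal sketch (hypothetical names, not the paper's code) of the zero-init attention idea: a set of learnable adaptation prompt tokens is attended to at the top layer(s), and a gate initialized to zero controls how much they influence the frozen model, so training starts from the unmodified LLM:

```python
import torch

class ZeroInitAdapter(torch.nn.Module):
    """Sketch of zero-init gated adaptation prompts (illustrative only)."""

    def __init__(self, prompt_len, dim):
        super().__init__()
        # Learnable adaptation prompt tokens, prepended at a top layer.
        self.prompt = torch.nn.Parameter(torch.randn(prompt_len, dim) * 0.02)
        # Gate initialized to zero: at the start of training the adapter
        # contributes nothing, so the pretrained model is undisturbed.
        self.gate = torch.nn.Parameter(torch.zeros(1))

    def forward(self, hidden, attend):
        # `attend` stands in for the layer's attention over the adaptation
        # prompt tokens; its output is scaled by the (initially zero) gate.
        prompt_out = attend(hidden, self.prompt)
        return hidden + torch.tanh(self.gate) * prompt_out
```

Because the gate starts at zero, the model's output is initially identical to the frozen base model, which is what makes training stable; image tokens could be injected the same way.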
aliasaria t1_jefj33h wrote
Reply to comment by lxe in [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention by floppy_llama
It's a very different way to finetune a model efficiently.
All these tools try to nudge an existing large model, without having to nudge all the weights.
A simplistic explanation of LoRA is that it freezes the whole pretrained model and learns small low-rank updates to some of the weight matrices, so only a tiny number of new parameters get nudged.
This technique, instead, adds new weights to the model (learnable tokens at the start of prompts) alongside the existing, frozen model.
One advantage of LoRA, in this case, is that you can merge your LoRA-finetuned weights into the original model, and the result is a new model that is exactly the same size and shape as the original. With the technique in this paper, however, the final model has a different shape from the original. But the concept is arguably simpler.
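The merge property is easy to see in a toy sketch (assumed shapes, not any library's actual API): the frozen weight `W` gets a low-rank update `B @ A`, and after training the product folds back into a matrix of the original shape:

```python
import torch

d, r = 16, 2                      # model dim and low rank (r << d)
W = torch.randn(d, d)             # frozen pretrained weight
A = torch.randn(r, d) * 0.01      # low-rank factors: only these are trained
B = torch.zeros(d, r)             # B starts at zero, so the update starts at zero

# After finetuning, the low-rank update merges back into the base weight:
W_merged = W + B @ A
assert W_merged.shape == W.shape  # same size and shape as the original model
```

Only `A` and `B` (2·d·r parameters) are trained instead of the full d×d matrix, and the merged model runs with zero extra inference cost, which is the trade-off against the adapter approach above.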