ThisIsMyStonerAcount t1_jdeqmjc wrote on March 23, 2023 at 9:29 PM

What's your end goal?

SomeGuyInDeutschland OP t1_jdewszx wrote on March 23, 2023 at 10:09 PM

Initially just a chat bot to test how strong it it

But ultimately use it to help me code up websites

Username912773 t1_jdgp0u0 wrote on March 24, 2023 at 7:26 AM

Just use a static website generator

RoaRene317 t1_jdfnzna wrote on March 24, 2023 at 1:21 AM

My suggestion is using 8 bit or 4 bit quantization. Also you can using automatic device mapping on Transformers that can offload partially to your CPU (warning : It use lots of System Memory [RAM]).

Civil_Collection7267 t1_jdfogwq wrote on March 24, 2023 at 1:25 AM

You can use 4-bit LLaMA 13B or 8-bit LLaMA 7B with the alpaca lora, both are very good. If you need help, this guide explains everything

ggf31416 t1_jdesxc0 wrote on March 23, 2023 at 9:44 PM

With memory offloading and 8-bit quantization you may be able to run the 13B model, but slowly. The 7B will be faster.

suflaj t1_jdf3j2k wrote on March 23, 2023 at 10:56 PM

Unless you plan on quantizing your model or loading it layer by layer, I'm afraid 2B parameters is the most you'll get. 10GB VRAM is not really enough for CV nowadays, let alone NLP. With quantization, you can barely run the 7B model.

4 bit doesn't matter at the end of the day since it's not supported out of the box, unless you intend to implement it yourself.

[D] Which AI model for RTX 3080 10GB?

Comments