Submitted by deck4242 t3_125q87z in MachineLearning

Hello

What are we talking about in terms of diminishing returns between the two models?

Does the 65B really improve things a lot?

Bonus question: how do I train the 7B model on a specific field on my computer (making it tailored to my needs)?

7

Comments


RedditLovingSun t1_je5m80p wrote

Significantly better

I guess it would be interesting to see whether the performance difference gets wider or narrower after self-instruct optimizations like Alpaca.

12

ortegaalfredo t1_je5urre wrote

I run a Discord with all models. Currently only 30B and 65B, because nobody uses the smaller LLMs.

Even if superficially they can both answer questions, on complex topics 65B is much better than 30B, so 7B doesn't even compare.

11

machineko t1_je86hwt wrote

What GPUs are you using to run them? Are you using any compression (e.g. quantization)?

1

ortegaalfredo t1_jegn9zu wrote

2× RTX 3090; the 65B runs in int4 and the 30B in int8 (required for LoRA).
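For anyone trying to reproduce a setup like this (or the OP's bonus question about fine-tuning on a specific domain), here is a minimal sketch of loading a checkpoint in int8 and attaching a LoRA adapter, assuming Hugging Face transformers + peft + bitsandbytes; the model path and LoRA hyperparameters are placeholders, not something stated in the thread.

```python
# Minimal sketch: load a LLaMA checkpoint in 8-bit and attach a LoRA adapter.
# Model path and hyperparameters below are illustrative placeholders.
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

model_name = "path/to/llama-30b"  # hypothetical local checkpoint

tokenizer = LlamaTokenizer.from_pretrained(model_name)
model = LlamaForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,   # int8 via bitsandbytes, as in the setup above
    device_map="auto",   # spread layers across the available GPUs
)

# Freeze base weights, cast norms, enable gradient checkpointing for int8 training
model = prepare_model_for_int8_training(model)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small LoRA weights are trainable
```

From here the model can be passed to a standard Trainer loop on a domain-specific dataset; only the adapter weights need to fit in optimizer memory, which is what makes this feasible on consumer GPUs.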

2