Vegetable-Skill-9700 OP t1_jdpr15o wrote

I agree that a 175B model will always perform better than a 6B model on general tasks, so maybe it is a great model for demos. But as you build a product on top of this model, one that is used in a certain way and satisfies a certain use case, won't it make sense to use a smaller model and fine-tune it on the relevant dataset?
