modeless t1_jc4i39e wrote
> performs as well as text-davinci-003
No it doesn't! The researchers don't claim that either, they claim "often behaves similarly to text-davinci-003" which is much more believable. I've seen a lot of people claiming things like this with little evidence. We need some people evaluating these claims objectively. Can someone start a third party model review site?
sanxiyn t1_jc598b3 wrote
Eh, the authors do claim they performed a blind comparison and that "Alpaca wins 90 versus 89 comparisons against text-davinci-003". They also released the evaluation set they used.
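The kind of blind pairwise evaluation the authors describe can be sketched with a small tally script. This is not the actual Alpaca evaluation harness, just an illustration of the idea: each pair of responses is shown in a random order so the rater can't tell which model produced which, and wins are tallied per model.

```python
import random

def blind_tally(prompts, rate_pair, seed=0):
    """Tally blind pairwise preferences between model A and model B.

    For each prompt, the two responses are shown in a random order so the
    rater cannot tell which model produced which. rate_pair(prompt, first,
    second) returns 0 if the first shown response is preferred, 1 otherwise.
    """
    rng = random.Random(seed)
    wins = {"A": 0, "B": 0}
    for prompt, resp_a, resp_b in prompts:
        order = ["A", "B"]
        rng.shuffle(order)  # hide which model is shown first
        responses = {"A": resp_a, "B": resp_b}
        pick = rate_pair(prompt, responses[order[0]], responses[order[1]])
        wins[order[pick]] += 1  # map the pick back to the hidden model label
    return wins

# Toy example: a rater that simply prefers the longer answer.
data = [
    ("q1", "short", "a much longer answer"),
    ("q2", "detailed reply here", "ok"),
    ("q3", "fine", "also fine, but longer"),
]
result = blind_tally(
    data,
    lambda p, first, second: 0 if len(first) >= len(second) else 1,
)
```

Since the toy rater's preference doesn't depend on presentation order, `result` here is `{"A": 1, "B": 2}` regardless of the shuffle; with human raters, the randomized order is what removes position bias.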
Jeffy29 t1_jc79t9p wrote
Yep, I tried it using some of the prompts from my ChatGPT history and it was way worse. At best it performed slightly worse on simple prompts, but it failed completely at more complex prompts and code analysis. Still, it's good for a 7B model, just nothing like ChatGPT.
ivalm t1_jc7e22p wrote
Yup, catastrophically failed all my medical reasoning prompts (that davinci-2/3/ChatGPT get right)
RemarkableGuidance44 t1_jcdsprg wrote
Fine-tune it yourself for medical use. I have it fine-tuned for software and it does a great job.
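Stanford Alpaca's released training data is a JSON list of records with "instruction", "input", and "output" keys, so a domain-specific fine-tune largely comes down to preparing your own examples in that shape. A minimal sketch, where the medical Q&A pairs are made up purely for illustration:

```python
import json

# Hypothetical domain examples; replace with your own curated data.
medical_pairs = [
    ("What does a high TSH with a low free T4 suggest?",
     "It is consistent with primary hypothyroidism."),
    ("Name a first-line drug class for uncomplicated hypertension.",
     "Thiazide diuretics, e.g. hydrochlorothiazide."),
]

# Alpaca-style instruction-tuning records; "input" may be an empty
# string when the instruction needs no additional context.
records = [
    {"instruction": q, "input": "", "output": a}
    for q, a in medical_pairs
]

with open("medical_alpaca.json", "w") as f:
    json.dump(records, f, indent=2)
```

The resulting file can then be fed to the same fine-tuning script used for the original Alpaca data; quality of the curated pairs matters far more than the plumbing.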