modeless t1_jc4i39e wrote
> performs as well as text-davinci-003
No it doesn't! The researchers don't claim that either, they claim "often behaves similarly to text-davinci-003" which is much more believable. I've seen a lot of people claiming things like this with little evidence. We need some people evaluating these claims objectively. Can someone start a third party model review site?
sanxiyn t1_jc598b3 wrote
Eh, the authors do claim they performed a blind comparison and that "Alpaca wins 90 versus 89 comparisons against text-davinci-003". They also released the evaluation set they used.
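The kind of blind pairwise evaluation the authors describe can be sketched with a small tally script. This is not the actual Alpaca evaluation harness, just an illustration of the idea: each pair of responses is shown in a random order so the rater can't tell which model produced which, and wins are tallied per model.

```python
import random

def blind_tally(prompts, rate_pair, seed=0):
    """Tally blind pairwise preferences between model A and model B.

    For each prompt, the two responses are shown in a random order so the
    rater cannot tell which model produced which. rate_pair(prompt, first,
    second) returns 0 if the first shown response is preferred, 1 otherwise.
    """
    rng = random.Random(seed)
    wins = {"A": 0, "B": 0}
    for prompt, resp_a, resp_b in prompts:
        order = ["A", "B"]
        rng.shuffle(order)  # hide which model is shown first
        responses = {"A": resp_a, "B": resp_b}
        pick = rate_pair(prompt, responses[order[0]], responses[order[1]])
        wins[order[pick]] += 1  # map the pick back to the hidden model label
    return wins

# Toy example: a rater that simply prefers the longer answer.
data = [
    ("q1", "short", "a much longer answer"),
    ("q2", "detailed reply here", "ok"),
    ("q3", "fine", "also fine, but longer"),
]
result = blind_tally(
    data,
    lambda p, first, second: 0 if len(first) >= len(second) else 1,
)
```

Since the toy rater's preference doesn't depend on presentation order, `result` here is `{"A": 1, "B": 2}` regardless of the shuffle; with human raters, the randomized order is what removes position bias.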
Jeffy29 t1_jc79t9p wrote
Yep, I tried it using some of the prompts from my ChatGPT history and it was way worse. At best it performed slightly worse on simple prompts, but it failed completely at more complex prompts and code analysis. Still, it's good for a 7B model, just nothing like ChatGPT.
ivalm t1_jc7e22p wrote
Yup, catastrophically failed all my medical reasoning prompts (that davinci-2/3/ChatGPT get right)
RemarkableGuidance44 t1_jcdsprg wrote
Fine-tune it yourself for medical use. I have it fine-tuned for software and it does a great job.
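Stanford Alpaca's released training data is a JSON list of records with "instruction", "input", and "output" keys, so a domain-specific fine-tune largely comes down to preparing your own examples in that shape. A minimal sketch, where the medical Q&A pairs are made up purely for illustration:

```python
import json

# Hypothetical domain examples; replace with your own curated data.
medical_pairs = [
    ("What does a high TSH with a low free T4 suggest?",
     "It is consistent with primary hypothyroidism."),
    ("Name a first-line drug class for uncomplicated hypertension.",
     "Thiazide diuretics, e.g. hydrochlorothiazide."),
]

# Alpaca-style instruction-tuning records; "input" may be an empty
# string when the instruction needs no additional context.
records = [
    {"instruction": q, "input": "", "output": a}
    for q, a in medical_pairs
]

with open("medical_alpaca.json", "w") as f:
    json.dump(records, f, indent=2)
```

The resulting file can then be fed to the same fine-tuning script used for the original Alpaca data; quality of the curated pairs matters far more than the plumbing.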