Submitted by austintackaberry t3_120usfk in MachineLearning
danielbln t1_jdjt8zh wrote
Reply to comment by big_ol_tender in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
Why has no one regenerated the training set? With gpt3.5 that's like 50 bucks. I can be the change I want to see in the world, but am I missing something?
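The regeneration idea could be sketched as a self-instruct-style prompt builder. This is a minimal, hypothetical sketch: the template, seed tasks, and function name are illustrative assumptions, not the actual Stanford Alpaca pipeline, and the completion-API call itself is omitted.

```python
# Hypothetical sketch: build a few-shot "self-instruct" prompt from seed
# instruction/output pairs, then send it to a completion endpoint to
# generate novel instructions. Template and seeds are made up for
# illustration; this is not the Stanford Alpaca code.

def build_self_instruct_prompt(seed_tasks, n_new=20):
    """Format seed tasks into a prompt asking the model to continue
    with new, diverse instructions."""
    header = (
        f"You are asked to come up with {n_new} diverse task instructions.\n"
        "Here are some examples:\n\n"
    )
    shots = "\n".join(
        f"{i + 1}. Instruction: {t['instruction']}\n   Output: {t['output']}"
        for i, t in enumerate(seed_tasks)
    )
    # End with the next numbered slot so the model continues the list.
    return header + shots + f"\n{len(seed_tasks) + 1}. Instruction:"

seed = [
    {"instruction": "List three prime numbers.", "output": "2, 3, 5"},
    {"instruction": "Translate 'hello' to French.", "output": "bonjour"},
]
prompt = build_self_instruct_prompt(seed)
# The prompt would then be sent to a GPT-3.5 completion endpoint (call
# omitted here), and the generated instructions parsed, deduplicated,
# and written out as a JSON dataset.
```

At roughly 50k instructions and typical GPT-3.5 pricing at the time, the "50 bucks" estimate in the comment is plausible, though the exact cost depends on prompt and completion lengths.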
mxby7e t1_jdjzkzy wrote
Using OpenAI’s models to generate training data for competing models violates the terms of use, which is why the Stanford dataset is restricted.
__Maximum__ t1_jdkepie wrote
Also, it's very shady for a company called OpenAI. They claimed they went for-profit because they needed the money to grow, but these restrictions just show that they are filthy liars and only care about keeping the power and making profit. I'm sure they already have a strategy for getting around that 30B cap, just like they planned on attracting money and talent by calling themselves a non-profit first.
throwaway2676 t1_jdl0y80 wrote
Alpaca was only trained on 52k instructions, right? A large group of grad students, or a forum like Reddit, could construct that many manually in a couple of weeks. I'm surprised they even had to resort to using ClosedAI.
mxby7e t1_jdl18t6 wrote
Maybe. Open Assistant, the LAION project, is doing this type of manual dataset collection. The training data and the model weights are supposed to be released once training is complete.
WarAndGeese t1_jdl5t0z wrote
Boo hoo to OpenAI, people should do it anyway. Are the terms of service the only reason not to do it, or are there actual material barriers? If it's a problem of money, then as long as people know how much is needed, it can be crowdfunded. If it's a matter of people power, then there are already large volunteer networks. Or is it just something that isn't practical or feasible?
visarga t1_jdlpae7 wrote
OpenAI has first-hand RLHF data. Alpaca has second-hand. Wondering if third-hand is good enough, and free of any restrictions.
lexcess t1_jdlj8tf wrote
Classy, especially when they are breezing past any copyright on the datasets they are training on. I wonder if they can legally enforce that without creating a potentially bad precedent for themselves, or if it could be worked around by training indirectly through something like Alpaca.
ebolathrowawayy t1_jdnc05i wrote
But what if you're training a model for a narrow use-case and don't intend for anyone to use it except for a niche set of users? Is that enough to be in the clear? Or is any use of OpenAI's model output to train a model for any purpose a no-no?
mxby7e t1_jdncs51 wrote
From my understanding, it's limited to non-commercial use, so you can use it for what you need, but not commercially.
big_ol_tender t1_jdjtwdk wrote
Pls do! I believe in u