Submitted by austintackaberry t3_120usfk in MachineLearning
danielbln t1_jdjt8zh wrote
Reply to comment by big_ol_tender in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
Why has no one regenerated the training set? With gpt3.5 that's like 50 bucks. I can be the change I want to see in the world, but am I missing something?
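The regeneration idea could be sketched as a self-instruct-style prompt builder. This is a minimal, hypothetical sketch: the template, seed tasks, and function name are illustrative assumptions, not the actual Stanford Alpaca pipeline, and the completion-API call itself is omitted.

```python
# Hypothetical sketch: build a few-shot "self-instruct" prompt from seed
# instruction/output pairs, then send it to a completion endpoint to
# generate novel instructions. Template and seeds are made up for
# illustration; this is not the Stanford Alpaca code.

def build_self_instruct_prompt(seed_tasks, n_new=20):
    """Format seed tasks into a prompt asking the model to continue
    with new, diverse instructions."""
    header = (
        f"You are asked to come up with {n_new} diverse task instructions.\n"
        "Here are some examples:\n\n"
    )
    shots = "\n".join(
        f"{i + 1}. Instruction: {t['instruction']}\n   Output: {t['output']}"
        for i, t in enumerate(seed_tasks)
    )
    # End with the next numbered slot so the model continues the list.
    return header + shots + f"\n{len(seed_tasks) + 1}. Instruction:"

seed = [
    {"instruction": "List three prime numbers.", "output": "2, 3, 5"},
    {"instruction": "Translate 'hello' to French.", "output": "bonjour"},
]
prompt = build_self_instruct_prompt(seed)
# The prompt would then be sent to a GPT-3.5 completion endpoint (call
# omitted here), and the generated instructions parsed, deduplicated,
# and written out as a JSON dataset.
```

At roughly 50k instructions and typical GPT-3.5 pricing at the time, the "50 bucks" estimate in the comment is plausible, though the exact cost depends on prompt and completion lengths.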
mxby7e t1_jdjzkzy wrote
Using OpenAI’s models to generate training data for competing models violates the terms of use, which is why the Stanford dataset is restricted.
__Maximum__ t1_jdkepie wrote
Also, it's very shady for a company called OpenAI. They claimed they went for-profit because they needed the money to grow, but these restrictions just show that they are filthy liars and only care about keeping the power and making profit. I'm sure they already have a strategy for getting around that 30B cap, just like they planned on attracting money and talent by calling themselves a non-profit first.
throwaway2676 t1_jdl0y80 wrote
Alpaca was only trained on 52k instructions, right? A large group of grad students, or a forum like Reddit, could construct that many manually in a couple of weeks. I'm surprised they even had to resort to using ClosedAI.
mxby7e t1_jdl18t6 wrote
Maybe. Open Assistant, the LAION project, is doing this type of manual dataset collection. The training data and the model weights are supposed to be released once training is complete.
WarAndGeese t1_jdl5t0z wrote
Boo hoo to OpenAI, people should do it anyway. Are the terms of service the only reason not to do it, or are there actual material barriers? If it's a problem of money, then as long as people know how much is needed, it can be crowdfunded. If it's a matter of people power, then there are already large volunteer networks. Or is it just something that isn't practical or feasible?
visarga t1_jdlpae7 wrote
OpenAI has first-hand RLHF data. Alpaca has second-hand. Wondering if third-hand is good enough, and free of any restrictions.
lexcess t1_jdlj8tf wrote
Classy, especially when they are breezing past any copyright on the datasets they are training on. I wonder if they can legally enforce that without creating a potentially bad precedent for themselves, or if it could be worked around by training indirectly through something like Alpaca.
ebolathrowawayy t1_jdnc05i wrote
But what if you're training a model for a narrow use-case and don't intend for anyone to use it except for a niche set of users? Is that enough to be in the clear? Or is any use of OpenAI's model output to train a model for any purpose a no-no?
mxby7e t1_jdncs51 wrote
From my understanding, it's limited to non-commercial use, so you can use it for what you need, but not commercially.
big_ol_tender t1_jdjtwdk wrote
Pls do! I believe in u