Submitted by austintackaberry t3_120usfk in MachineLearning
big_ol_tender t1_jdjcfc8 wrote
The Alpaca dataset has a non-commercial license, so idk what they are doing… I've asked Stanford to change it but heard nothing back
Colecoman1982 t1_jdjkgjy wrote
When you asked, did you clarify that you were asking about the training data versus the whole project? The final Alpaca project was built, in part, on top of Meta's LLaMA. Since LLaMA has a strictly non-commercial license, there is no way that Stanford can ever release their final project for commercial use (as they've already stated in their initial release of the project). On the other hand, any training data they've created on their own (without needing any code from LLaMA) should be within their power to re-license. If they think you are asking for the whole project to be re-licensed, they are likely to just ignore your request.
MjrK t1_jdjqz9h wrote
> We emphasize that Alpaca is intended only for academic research and any commercial use is prohibited. There are three factors in this decision: First, Alpaca is based on LLaMA, which has a non-commercial license, so we necessarily inherit this decision. Second, the instruction data is based on OpenAI’s text-davinci-003, whose terms of use prohibit developing models that compete with OpenAI. Finally, we have not designed adequate safety measures, so Alpaca is not ready to be deployed for general use.
Esquyvren t1_jdjsw1j wrote
They said it wasn’t ready but deployed it anyways… lol
MjrK t1_jdk4ig1 wrote
For demonstration and research, not widely nor generally.
Disastrous_Elk_6375 t1_jdlix6j wrote
The demo was up for a couple of days. The first hours of it being online were rough (80–200 people in the queue). It got better the following day, and better still on the 3rd day. I believe they removed the demo about a week later. IMO they've proven their point: the demo was extremely impressive for a 7B model.
Colecoman1982 t1_jdjuwpp wrote
Ah, fair enough.
big_ol_tender t1_jdjl1wx wrote
I opened an issue on GitHub specifically about the data license and linked to the Databricks release :)
Colecoman1982 t1_jdjlw80 wrote
Very cool, hopefully you'll get through to them.
danielbln t1_jdjt8zh wrote
Why has no one regenerated the training set? With GPT-3.5 that's like 50 bucks. I can be the change I want to see in the world, but am I missing something?
mxby7e t1_jdjzkzy wrote
The use of OpenAI's models to generate competing models violates their terms of use, which is why the Stanford dataset is restricted.
__Maximum__ t1_jdkepie wrote
Also, it's very shady for a company called OpenAI. They claimed they became for-profit because they needed the money to grow, but these restrictions just show that they are filthy liars who only care about keeping power and making profit. I'm sure they already have a strategy to get around that 30B cap, just like they planned to steal money and talent by calling themselves a non-profit first.
throwaway2676 t1_jdl0y80 wrote
Alpaca was only trained on 50k instructions, right? A large group of grad students, or a forum like Reddit, could construct that many manually in a couple of weeks. I'm surprised they even had to resort to using ClosedAI.
mxby7e t1_jdl18t6 wrote
Maybe. Open Assistant by Stability.ai is doing this type of manual dataset collection. The training data and the model weights are supposed to be released once training is complete.
WarAndGeese t1_jdl5t0z wrote
Boo hoo to OpenAI; people should do it anyway. Are the terms of service the only reason not to do it, or are there actual material barriers? If it's a problem of money, then as long as people know how much, it can be crowdfunded. If it's a matter of people power, then there are already large volunteer networks. Or is it just something that isn't practical or feasible?
visarga t1_jdlpae7 wrote
OpenAI has first-hand RLHF data. Alpaca has second-hand. Wondering if third-hand is good enough and free of any restrictions.
lexcess t1_jdlj8tf wrote
Classy, especially when they are breezing past any copyright on the datasets they train on. I wonder if they can legally enforce that without creating a potentially bad precedent for themselves, or whether it could be worked around by training indirectly through something like Alpaca.
ebolathrowawayy t1_jdnc05i wrote
But what if you're training a model for a narrow use-case and don't intend for anyone to use it except for a niche set of users? Is that enough to be in the clear? Or is any use of OpenAI's model output to train a model for any purpose a no-no?
mxby7e t1_jdncs51 wrote
From my understanding it's limited to non-commercial use, so you can use it for what you need, just not commercially.
big_ol_tender t1_jdjtwdk wrote
Pls do! I believe in u
mxby7e t1_jdktvqr wrote
The license won't change. The dataset was collected in a way that violates the terms of service of OpenAI, whose models were used to generate the data. If they allowed commercial use, it would open them up to a lawsuit.
visarga t1_jdlpf0h wrote
What about data generated from Alpaca, is that unrestricted?
impossiblefork t1_jdlddlt wrote
Model weights, though, are, I assume, not copyrightable.
Is there actually a law giving Stanford any special rights to the weights?