AuspiciousApple t1_jdznf40 wrote

Sorry, that was not very clearly explained on my part.

Do you understand that these models have weights/parameters - numbers that define their behaviour? The standard sense of "learning" in ML is to update these weights to fit some training data better.

And are you aware that large language models get a sequence of text (the "context") and predict the next bit of text from it? These models can use examples in the text they are given to do things they otherwise couldn't. This is called in-context learning. However, here the parameters of the model don't change, and if the examples aren't in the context, the model doesn't remember anything about them.
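
Here's a toy sketch of the idea (the model and task are just for illustration, and a model this small does in-context learning far less reliably than a large one):

```python
from transformers import pipeline

# Minimal sketch of in-context learning: the "learning" lives entirely in
# the prompt text, not in the weights. Model choice is illustrative only.
generator = pipeline("text-generation", model="gpt2")

prompt = (
    "Review: The food was great. Sentiment: positive\n"
    "Review: Terrible service. Sentiment: negative\n"
    "Review: I would come back again. Sentiment:"
)

# The weights are frozen here. Delete the two examples from the prompt and
# the model "forgets" the task, because nothing was ever stored in them.
print(generator(prompt, max_new_tokens=2)[0]["generated_text"])
```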

1

AuspiciousApple t1_jdyjclk wrote

It could do something semi-fancy, or it might simply be prepending the translation prompt with the previous input, translation, and user edits so that it can adjust to your specific preferences. This is called in-context learning: the model's weights don't change, so it doesn't learn in the standard sense, but it still adapts based on the current context.
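
For illustration, a hypothetical version of that prompt-building step might look like this (the format and helper are made up, not what the actual system does):

```python
# Hypothetical sketch: earlier inputs, outputs, and user edits are simply
# prepended to the new request as worked examples.
history = [
    # (source sentence, translation after the user's edits)
    ("I'll be there shortly.", "Ich bin gleich da."),
]

def build_prompt(history, new_sentence):
    lines = [f"English: {src}\nGerman: {tgt}" for src, tgt in history]
    lines.append(f"English: {new_sentence}\nGerman:")
    return "\n\n".join(lines)

print(build_prompt(history, "See you tomorrow."))
```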

3

AuspiciousApple t1_jb557q8 wrote

Not obviously so.

First, de-duplicating text data didn't help much in the cramming paper. Second, even if the images are duplicates, the captions might be different so you still learn more than if you only had one copy of each image.

Finally, even with exact copies of text and image, duplication would just weight those images more heavily than the rest, which could harm performance, not matter at all, or even help performance (for instance, if the duplicated images tend to be higher quality or more interesting).
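
A toy illustration of that weighting effect (numbers made up):

```python
# An example duplicated k times contributes k times to the average loss,
# i.e. it behaves like a single example with loss weight k.
losses = {"img_a": 0.7, "img_b": 0.3}
dataset = ["img_a"] * 3 + ["img_b"]  # img_a duplicated 3x
avg = sum(losses[x] for x in dataset) / len(dataset)
weighted = (3 * losses["img_a"] + 1 * losses["img_b"]) / 4  # same thing
assert abs(avg - weighted) < 1e-12
```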

98

AuspiciousApple t1_iuzb7wc wrote

It's still a fair question. Suppose I generate something with a common seed (1234, 42, 69420, whatever) and the default settings of a popular Stable Diffusion UI. Other people using the exact same prompt might conceivably end up generating a very similar image, or even the same one.

In that case, do the first people to generate it hold the copyright? Do they lose it once it's been generated a second time?
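
For concreteness, a sketch of why identical outputs are plausible, using the diffusers library (model id and prompt are illustrative; exact reproducibility also depends on hardware and library versions):

```python
import torch
from diffusers import StableDiffusionPipeline

# With the same model, prompt, seed, and sampler settings, sampling is
# deterministic, so two users can plausibly produce the identical image.
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("cuda")

generator = torch.Generator("cuda").manual_seed(42)  # a "common" seed
image = pipe("a cat astronaut", generator=generator).images[0]
image.save("cat.png")  # re-running this exact script yields the same image
```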

12

AuspiciousApple t1_iuzastm wrote

>Do they implicitly mean DALLE 2 or do they actually mean 1?

OpenAI is terrible for this. You should assume that it's some random black box that they can change at any time.

There was recently a minor scandal in the NLP community: the GPT-3 variants served through the API were trained differently from what the papers described, which invalidated tons of papers.

87

AuspiciousApple OP t1_itwy8bj wrote

>Maybe for the sake of experimentation, take one of the tasks where CoT performs considerably better than the direct prompt?

That sounds like a good idea, though NLP isn't really my field, so I might not be using the correct sampling parameters or might be making subtle mistakes in formatting the question (e.g. punctuation, line breaks, etc.). I was hoping someone here would know more.

Even for English-to-German translation, the model often generated obvious nonsense, sometimes even just repeating the English phrase, despite my using the prompt exactly as it appears in the Hugging Face config/paper.
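
By "sampling parameters" I mean the decoding knobs, something like this (values illustrative; using the small variant just to keep the example light):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("google/flan-t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")

inputs = tok("translate English to German: The weather is nice today.",
             return_tensors="pt")
out = model.generate(
    **inputs,
    do_sample=True,      # greedy vs. sampling can change results a lot
    temperature=0.7,     # sharper/flatter next-token distribution
    top_p=0.9,           # nucleus sampling
    max_new_tokens=40,
)
print(tok.decode(out[0], skip_special_tokens=True))
```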

1

AuspiciousApple OP t1_itww8zh wrote

I only skimmed the paper, but I think they said that (at least for some benchmarks) they count only exact matches as correct, so yes, generating anything but the answer presumably doesn't count.
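
Something like this, presumably (the normalization details are my guess, not necessarily theirs):

```python
# Exact-match scoring: anything beyond the gold answer string
# (e.g. an appended rationale) makes the prediction count as wrong.
def exact_match(prediction: str, answer: str) -> bool:
    return prediction.strip().lower() == answer.strip().lower()

print(exact_match("plastic container", "plastic container"))                # True
print(exact_match("A plastic container, because...", "plastic container"))  # False
```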

I tried the example from this dataset that they use in one of their figures, and I seemed to get the correct answer with the XL variant most of the time, but the rationale was nonsense ~80% of the time, even when the answer was correct. E.g. "Plastic containers are a thing of the past", "Plastic containers are way too precious to store food", or "Wood is not sturdy enough to store food".

1

AuspiciousApple OP t1_itwvq1f wrote

>I dont think any model you can run on a single commodity gpu will be on par with gpt-3.

That makes sense. I'm not an NLP person, so I don't have a good intuition on how these models scale or what the benchmark numbers actually mean.

In CV, the difference between a small and a large model might be a few % accuracy on ImageNet, but even small models work reasonably well. FLAN-T5-XL seems to generate nonsense 90% of the time for the prompts I've tried, whereas GPT-3 produces great output most of the time.

Do you have any experience with these open models?

1

AuspiciousApple OP t1_itwoeep wrote

A bit jealous of all that VRAM and all those cores.

The usage example here: https://huggingface.co/google/flan-t5-xl is quite easy to follow. Getting it up and running should take you all of 5 minutes plus the time to download the model. You could probably also run the XXL model.
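
From memory, the example is roughly this (check the model card for the exact snippet; `device_map="auto"` spreads the weights across your GPUs):

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xl")
model = T5ForConditionalGeneration.from_pretrained(
    "google/flan-t5-xl", device_map="auto"
)

input_ids = tokenizer("translate English to German: How old are you?",
                      return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(input_ids, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```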

2