AuspiciousApple t1_jdznf40 wrote

Sorry, that was not very clearly explained on my part.

Do you understand that these models have weights/parameters - numbers that define their behaviour? The standard sense of "learning" in ML is to update these weights to fit some training data better.

And are you aware that large language models get a sequence of text (the "context") and predict the next bit of text from it? These models can use examples in the text they are given to do things they otherwise couldn't. This is called in-context learning. However, here the parameters of the model don't change, and if the examples aren't in the context, the model doesn't remember anything about them.
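
Here's a toy sketch of the idea (the model and task are just for illustration, and a model this small does in-context learning far less reliably than a large one):

```python
from transformers import pipeline

# Minimal sketch of in-context learning: the "learning" lives entirely in
# the prompt text, not in the weights. Model choice is illustrative only.
generator = pipeline("text-generation", model="gpt2")

prompt = (
    "Review: The food was great. Sentiment: positive\n"
    "Review: Terrible service. Sentiment: negative\n"
    "Review: I would come back again. Sentiment:"
)

# The weights are frozen here. Delete the two examples from the prompt and
# the model "forgets" the task, because nothing was ever stored in them.
print(generator(prompt, max_new_tokens=2)[0]["generated_text"])
```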

1

AuspiciousApple t1_jdyjclk wrote

It could do something semi-fancy, or it might simply be prepending the translation prompt with the previous input, translation, and user edits so that it can adjust to your specific preferences. This is called in-context learning: the model's weights don't change, so it doesn't learn in the standard sense, but it still adapts based on the current context.
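
For illustration, a hypothetical version of that prompt-building step might look like this (the format and helper are made up, not what the actual system does):

```python
# Hypothetical sketch: earlier inputs, outputs, and user edits are simply
# prepended to the new request as worked examples.
history = [
    # (source sentence, translation after the user's edits)
    ("I'll be there shortly.", "Ich bin gleich da."),
]

def build_prompt(history, new_sentence):
    lines = [f"English: {src}\nGerman: {tgt}" for src, tgt in history]
    lines.append(f"English: {new_sentence}\nGerman:")
    return "\n\n".join(lines)

print(build_prompt(history, "See you tomorrow."))
```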

3

AuspiciousApple t1_jb557q8 wrote

Not obviously so.

First, de-duplicating text data didn't help much in the cramming paper. Second, even if the images are duplicates, the captions might be different so you still learn more than if you only had one copy of each image.

Finally, even with exact copies of text and image, duplication would just weight those images more heavily than the rest, which could harm performance, not matter at all, or even help performance (for instance, if the duplicated images tend to be higher quality or more interesting).
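
A toy illustration of that weighting effect (numbers made up):

```python
# An example duplicated k times contributes k times to the average loss,
# i.e. it behaves like a single example with loss weight k.
losses = {"img_a": 0.7, "img_b": 0.3}
dataset = ["img_a"] * 3 + ["img_b"]  # img_a duplicated 3x
avg = sum(losses[x] for x in dataset) / len(dataset)
weighted = (3 * losses["img_a"] + 1 * losses["img_b"]) / 4  # same thing
assert abs(avg - weighted) < 1e-12
```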

98

AuspiciousApple t1_iuzb7wc wrote

It's still a fair question. Suppose I generate something with a common seed (1234, 42, 69420, whatever) and the default settings of a popular Stable Diffusion UI. Other people using the exact same prompt might conceivably end up generating a very similar image, or even the same one.

In that case, do the first people to generate it hold the copyright? Do they lose it once it's been generated a second time?
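
For concreteness, a sketch of why identical outputs are plausible, using the diffusers library (model id and prompt are illustrative; exact reproducibility also depends on hardware and library versions):

```python
import torch
from diffusers import StableDiffusionPipeline

# With the same model, prompt, seed, and sampler settings, sampling is
# deterministic, so two users can plausibly produce the identical image.
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("cuda")

generator = torch.Generator("cuda").manual_seed(42)  # a "common" seed
image = pipe("a cat astronaut", generator=generator).images[0]
image.save("cat.png")  # re-running this exact script yields the same image
```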

12

AuspiciousApple t1_iuzastm wrote

>Do they implicitly mean DALLE 2 or do they actually mean 1?

OpenAI is terrible for this. You should assume that it's some random black box that they can change at any time.

There was recently a minor scandal in the NLP community: the GPT-3 variants served through the API were trained differently from what the papers described, which invalidated tons of papers.

87

AuspiciousApple OP t1_itwy8bj wrote

>Maybe for the sake of experimentation, take one of the tasks where CoT performs considerably better than the direct prompt?

That sounds like a good idea, though NLP isn't really my field, so I might not be using the correct sampling parameters or might be making subtle mistakes in formatting the question (e.g. punctuation, line breaks, etc.). I was hoping someone here would know more.

Even for English-to-German translation, the model often generated obvious nonsense, sometimes even just repeating the English phrase, despite my using the prompt exactly as it appears in the Hugging Face config/paper.
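
By "sampling parameters" I mean the decoding knobs, something like this (values illustrative; using the small variant just to keep the example light):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("google/flan-t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")

inputs = tok("translate English to German: The weather is nice today.",
             return_tensors="pt")
out = model.generate(
    **inputs,
    do_sample=True,      # greedy vs. sampling can change results a lot
    temperature=0.7,     # sharper/flatter next-token distribution
    top_p=0.9,           # nucleus sampling
    max_new_tokens=40,
)
print(tok.decode(out[0], skip_special_tokens=True))
```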

1

AuspiciousApple OP t1_itww8zh wrote

I only skimmed the paper, but I think they said that (at least for some benchmarks) they count only exact matches as correct, so yes, generating anything but the answer presumably doesn't count.
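
Something like this, presumably (the normalization details are my guess, not necessarily theirs):

```python
# Exact-match scoring: anything beyond the gold answer string
# (e.g. an appended rationale) makes the prediction count as wrong.
def exact_match(prediction: str, answer: str) -> bool:
    return prediction.strip().lower() == answer.strip().lower()

print(exact_match("plastic container", "plastic container"))                # True
print(exact_match("A plastic container, because...", "plastic container"))  # False
```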

I tried the example from this dataset that they use in one of their figures, and I seemed to get the correct answer with the XL variant most of the time, but the rationale was nonsense ~80% of the time, even when the answer was correct. E.g. "Plastic containers are a thing of the past", "Plastic containers are way too precious to store food", or "Wood is not sturdy enough to store food".

1

AuspiciousApple OP t1_itwvq1f wrote

>I dont think any model you can run on a single commodity gpu will be on par with gpt-3.

That makes sense. I'm not an NLP person, so I don't have a good intuition on how these models scale or what the benchmark numbers actually mean.

In CV, the difference between a small and a large model might be a few % accuracy on ImageNet, but even small models work reasonably well. FLAN-T5-XL seems to generate nonsense 90% of the time for the prompts I've tried, whereas GPT-3 produces great output most of the time.

Do you have any experience with these open models?

1

AuspiciousApple OP t1_itwoeep wrote

A bit jealous of all that VRAM and all those cores.

The usage example here: https://huggingface.co/google/flan-t5-xl is quite easy to follow. Getting it up and running should take you all of 5 minutes plus the time to download the model. You could probably also run the XXL model.
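
From memory, the example is roughly this (check the model card for the exact snippet; `device_map="auto"` spreads the weights across your GPUs):

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xl")
model = T5ForConditionalGeneration.from_pretrained(
    "google/flan-t5-xl", device_map="auto"
)

input_ids = tokenizer("translate English to German: How old are you?",
                      return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(input_ids, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```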

2