JClub

JClub t1_jabyi76 wrote

GPT was never trained on image data, so why is this a fair comparison? The UnifiedQA model is from 2022, so that comparison doesn't seem fair either. Why don't we have comparisons with other SOTA multimodal models, such as OFA or UniT?

1

JClub OP t1_j51h8up wrote

> Shouldn't it be the opposite ?

Yes, that makes more sense. Will change!

> How is that different from min(ratio * R, 1.2 * R) ? Does 0.8 have any influence at all ?

Maybe I did not explain properly what the clip is doing. If the ratio is 0.6, it becomes 0.8, and if it is > 1.2, it becomes 1.2.
Does that make more sense? Regarding the min operation, it's just a heuristic to choose the smaller update tbh
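
For reference, here is a minimal PyTorch sketch of the standard PPO clipped objective (the names `ratio`, `advantage`, and `eps` are mine; `advantage` plays the role of R above, and eps = 0.2 gives the 0.8/1.2 bounds):

```python
import torch

def ppo_clipped_objective(ratio: torch.Tensor,
                          advantage: torch.Tensor,
                          eps: float = 0.2) -> torch.Tensor:
    """Standard PPO clipped surrogate objective."""
    # Clip the probability ratio into [1 - eps, 1 + eps] = [0.8, 1.2]:
    # a ratio of 0.6 becomes 0.8, a ratio of 1.5 becomes 1.2.
    clipped_ratio = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)
    # Element-wise min of the unclipped and clipped terms keeps the
    # smaller (more conservative) update.
    return torch.min(ratio * advantage, clipped_ratio * advantage).mean()
```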

2

JClub OP t1_j4xgp2x wrote

You're not the first person to ask me that question! I need to add a more detailed explanation for that :)

The reward is non-differentiable because it was produced by a reward model, and this reward model takes text as input. That text was obtained by decoding the log probabilities output by your model. The decoding process is non-differentiable, so we lose the gradient link between the LM and the reward model.

Does this make sense? Also, if the reward is given directly by a human, instead of a reward model, it's clearer that this reward is non-differentiable.

RL helps transform this non-differentiable reward into a differentiable loss :)
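
As a rough illustration (my own sketch, not from the original post), a REINFORCE-style surrogate shows the trick: the non-differentiable scalar reward just scales the differentiable log-probabilities of the sampled tokens, so the gradient reaches the LM through those log-probabilities:

```python
import torch

def reinforce_loss(logprobs: torch.Tensor, reward: float) -> torch.Tensor:
    """REINFORCE-style surrogate loss.

    logprobs: log-probabilities of the sampled tokens (seq_len,), still
              attached to the language model's computation graph.
    reward:   plain scalar from a reward model or a human; no gradient.
    """
    # The reward only scales the differentiable log-probabilities, so
    # backprop reaches the LM even though decoding itself has no gradient.
    return -(reward * logprobs.sum())

# Toy usage with fake log-probabilities standing in for LM outputs.
token_logprobs = torch.tensor([-0.9, -0.4, -0.1], requires_grad=True)
loss = reinforce_loss(token_logprobs, reward=0.8)
loss.backward()  # gradients flow back through logprobs
```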

5

JClub OP t1_j4uc0bg wrote

PPO's formula generally makes the gradient update smaller than in other RL algorithms. I get that the reward measures the human's preference, but that does not answer my question 🤔: what rewards work best for PPO?

1

JClub t1_j1yp8uf wrote

I understand your point. My point is that these guys are making you pay for something when the API you want could definitely be free.

Without open source they would not be able to charge you for any of this, so using open-source tools to build paid APIs just doesn't sit right with me.

1