disastorm t1_je8lm7w wrote on March 30, 2023 at 5:07 AM

Reply to [D] Simple Questions Thread by AutoModerator

I have a question about reinforcement learning, or more specifically gym-retro (I know gym is pretty old now, I guess).

In the case of gym-retro, if you give a reward to the AI, is it actually looking at a set of variables and saying something like "oh, I pressed this button while all of these variables had these values and got this reward, so I should press it when the variables are similar"? Or is it just saying "oh, I pressed this button and got this reward, so I should press it more often"?
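To make the two alternatives in the question concrete, here is a minimal sketch in plain Python. It does not use gym-retro, and it is not how any particular agent is implemented; the class names `ActionOnlyLearner` and `StateConditionedLearner`, the state labels, and the reward values are all hypothetical, chosen only for illustration. The first learner ignores the game variables and tracks an average reward per button; the second keeps a separate average for each (state, button) pair.

```python
from collections import defaultdict


class ActionOnlyLearner:
    """'I pressed this button and got this reward': ignores the state."""

    def __init__(self, n_actions):
        self.value = [0.0] * n_actions
        self.count = [0] * n_actions

    def update(self, state, action, reward):
        # Running average of the reward seen for this action, in any state.
        self.count[action] += 1
        self.value[action] += (reward - self.value[action]) / self.count[action]

    def best_action(self, state):
        return max(range(len(self.value)), key=lambda a: self.value[a])


class StateConditionedLearner:
    """'I pressed this button while the variables had these values'."""

    def __init__(self, n_actions):
        self.n_actions = n_actions
        self.value = defaultdict(lambda: [0.0] * n_actions)
        self.count = defaultdict(lambda: [0] * n_actions)

    def update(self, state, action, reward):
        # Running average of the reward for this action in this state only.
        self.count[state][action] += 1
        n = self.count[state][action]
        self.value[state][action] += (reward - self.value[state][action]) / n

    def best_action(self, state):
        values = self.value[state]
        return max(range(self.n_actions), key=lambda a: values[a])


# Hypothetical toy experience: action 0 = "run", action 1 = "jump".
experience = [
    ("enemy_near", 1, 1.0),  # jumping over an enemy is rewarded
    ("enemy_near", 0, 0.0),  # running into it is not
    ("clear", 0, 1.0),       # running on clear ground is rewarded
    ("clear", 1, 0.0),       # jumping for no reason is not
]

action_only = ActionOnlyLearner(n_actions=2)
state_conditioned = StateConditionedLearner(n_actions=2)
for state, action, reward in experience:
    action_only.update(state, action, reward)
    state_conditioned.update(state, action, reward)

for state in ("enemy_near", "clear"):
    print(state, action_only.best_action(state), state_conditioned.best_action(state))
```

In this toy data both buttons earn the same average reward overall, so the action-only learner cannot tell them apart and picks the same button in every state, while the state-conditioned learner picks a different button depending on the state.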

disastorm t1_jd92s7i wrote on March 22, 2023 at 6:33 PM

Reply to comment by trnka in [D] Simple Questions Thread by AutoModerator

Thanks, I found some articles talking about these variables.

disastorm t1_jd66swu wrote on March 22, 2023 at 2:48 AM

Reply to comment by trnka in [D] Simple Questions Thread by AutoModerator

I see, thanks. Is that basically the equivalent of having "top_k" = 1?

Can you explain what these mean? From what I understand, top_k means it considers only the K most likely words at each step.

I can't quite understand what top_p means. Can they be used together?
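For reference, here is a minimal sketch of how these two settings are commonly described, written in plain Python rather than with any particular library. The function names `filter_top_k_top_p` and `sample` and the example probabilities are hypothetical. It assumes top_k keeps the K most probable tokens, top_p keeps the smallest set of most probable tokens whose cumulative probability reaches p, and that when both are set, top_k is applied first and the probabilities are renormalized before top_p; real libraries may differ in details.

```python
import random


def renormalize(probs, indices):
    """Rescale the kept probabilities so they sum to 1."""
    total = sum(probs[i] for i in indices)
    return {i: probs[i] / total for i in indices}


def filter_top_k_top_p(probs, top_k=None, top_p=None):
    """Return {token_index: probability} after top-k, then top-p, filtering."""
    # Rank token indices from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]  # keep only the K most probable tokens
    kept = renormalize(probs, ranked)
    if top_p is not None:
        nucleus, cumulative = [], 0.0
        for i in ranked:
            nucleus.append(i)
            cumulative += kept[i]
            if cumulative >= top_p:  # smallest set reaching probability p
                break
        kept = renormalize(probs, nucleus)
    return kept


def sample(probs, top_k=None, top_p=None, rng=random):
    """Draw one token index from the filtered distribution."""
    kept = filter_top_k_top_p(probs, top_k, top_p)
    indices = list(kept)
    weights = [kept[i] for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]


# Hypothetical next-word probabilities for a 4-word vocabulary.
probs = [0.5, 0.3, 0.15, 0.05]

print(filter_top_k_top_p(probs, top_k=1))             # only the top word survives
print(filter_top_k_top_p(probs, top_p=0.7))           # words 0 and 1 survive
print(filter_top_k_top_p(probs, top_k=3, top_p=0.9))  # both filters together
print(sample(probs, top_k=1))                         # always the most likely word
```

Under these assumptions, top_k = 1 leaves a single candidate, so sampling always returns the most likely word, which matches greedy decoding.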

disastorm t1_jcwyjyv wrote on March 20, 2023 at 4:54 AM

Reply to [D] Simple Questions Thread by AutoModerator

I noticed that "text-generation" models have variable output, but a lot of other models, like chatbots, often give the exact same response for the same input prompt. Is there a reason for this? Is there perhaps a setting that would allow a chatbot, for example, to give variable responses, or is my understanding of this just wrong?