disastorm
disastorm t1_jd92s7i wrote
Reply to comment by trnka in [D] Simple Questions Thread by AutoModerator
Thanks, I found some articles talking about these variables.
disastorm t1_jd66swu wrote
Reply to comment by trnka in [D] Simple Questions Thread by AutoModerator
I see, thanks. Is that basically the equivalent of having "top_k" = 1?
Can you explain what these mean? From what I understand, top_k means it considers only the K most likely words at each step.
I can't quite understand what top_p means. Can they be used together?
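Here is a minimal sketch of how top-k and top-p (nucleus) filtering are commonly implemented. It assumes a toy next-token distribution stored as a dict, and the tokens and probabilities are made up. Top-k keeps the K most probable tokens. Top-p keeps the smallest set of most probable tokens whose cumulative probability reaches p. Both then renormalize and sample from what is left. The two can be combined. This sketch applies top-k first and then top-p, which is one common ordering and not a guarantee about any particular library. With top_k=1 only the single most probable token survives, so sampling reduces to greedy decoding.

```python
import random


def top_k_filter(probs: dict[str, float], k: int) -> dict[str, float]:
    """Keep only the k most probable tokens, then renormalize so they sum to 1."""
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


def top_p_filter(probs: dict[str, float], p: float) -> dict[str, float]:
    """Keep the smallest set of most probable tokens whose cumulative
    probability reaches p (nucleus sampling), then renormalize."""
    kept = []
    cumulative = 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(q for _, q in kept)
    return {tok: q / total for tok, q in kept}


def sample(probs: dict[str, float], rng: random.Random) -> str:
    """Draw one token in proportion to its (filtered) probability."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


if __name__ == "__main__":
    # Made-up next-token distribution, for illustration only.
    probs = {"the": 0.5, "a": 0.3, "cat": 0.15, "dog": 0.05}
    rng = random.Random(0)
    print(top_k_filter(probs, 2))    # only "the" and "a" survive
    print(top_p_filter(probs, 0.75))  # also only "the" and "a"
    # Combined: top-k first, then top-p on the renormalized result.
    print(sample(top_p_filter(top_k_filter(probs, 3), 0.9), rng))
    # top_k=1 leaves only the most probable token, i.e. greedy decoding.
    print(sample(top_k_filter(probs, 1), rng))  # always "the"
```

Note that top-p adapts to the shape of the distribution. When the model is confident, only one or two tokens may reach p. When it is unsure, many tokens can survive. Top-k always keeps exactly K.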
disastorm t1_jcwyjyv wrote
Reply to [D] Simple Questions Thread by AutoModerator
I noticed that "text-generation" models have variable output, but a lot of other models, like chatbots, often give the exact same response for the same input prompt. Is there a reason for this? Is there a setting that would allow a chatbot, for example, to give varied responses, or is my understanding of this just wrong?
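One common reason for this is the decoding strategy. This is an assumption, since the thread does not say which library or settings were used. Greedy decoding picks the single most probable token at each step, so the same prompt always produces the same reply. Sampling draws each token from the model's distribution instead, so replies vary. In Hugging Face `transformers`, for example, `generate()` decodes greedily unless `do_sample=True` is set, either in the call or in the model's generation config. The toy "model" and the `generate` function below are hypothetical and only mimic that switch.

```python
import random

# Hypothetical toy "model": next-token probabilities given only the previous token.
TOY_MODEL = {
    "<s>": {"hello": 0.6, "hi": 0.4},
    "hello": {"there": 0.7, "friend": 0.3},
    "hi": {"there": 0.5, "friend": 0.5},
    "there": {"</s>": 1.0},
    "friend": {"</s>": 1.0},
}


def generate(do_sample: bool, rng: random.Random, max_len: int = 10) -> str:
    """Greedy decoding if do_sample is False, otherwise token-by-token sampling."""
    token, out = "<s>", []
    for _ in range(max_len):
        probs = TOY_MODEL[token]
        if do_sample:
            # Draw the next token in proportion to its probability.
            token = rng.choices(list(probs), weights=list(probs.values()), k=1)[0]
        else:
            # Always take the single most probable next token.
            token = max(probs, key=probs.get)
        if token == "</s>":
            break
        out.append(token)
    return " ".join(out)


if __name__ == "__main__":
    rng = random.Random()
    print({generate(False, rng) for _ in range(20)})  # one reply: {'hello there'}
    print({generate(True, rng) for _ in range(20)})   # usually several distinct replies
```

Sampling with a fixed random seed is also repeatable. Seeding `random.Random(42)` twice gives the same sampled reply both times, which is another way a sampling setup can still look deterministic.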
disastorm t1_je8lm7w wrote
Reply to [D] Simple Questions Thread by AutoModerator
I have a question about reinforcement learning, or more specifically gym-retro (I know gym is pretty old now, I guess).
In the case of gym-retro, if you give a reward to the AI, is it actually looking at a set of variables and saying, "Oh, I pressed this button while all of these variables had these values and got this reward, so I should press it when the variables are similar," or is it just saying, "Oh, I pressed this button and got this reward, so I should press it more often"?
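Which of the two happens depends on the algorithm. The usual deep RL agents, such as DQN or PPO, learn a value function or policy that takes the current observation as input, so they are closer to the first description: they learn which button is good in which situation. The observation is whatever the environment returns, which in gym-retro is the screen image by default. The "press it more often" behavior is what you would get from a learner that ignores the state entirely, like a multi-armed bandit.

The sketch below contrasts the two on a hypothetical two-state toy game using one-step tabular value learning, which is effectively a contextual bandit. It is not gym-retro code. The state names, reward rule, and hyperparameters are all made up.

```python
import random
from collections import defaultdict

# Hypothetical toy game: the right button depends on the situation.
STATES = ["enemy_near", "clear"]
ACTIONS = ["jump", "run"]


def reward(state: str, action: str) -> float:
    # Made-up rule: jump when an enemy is near, run when the path is clear.
    return 1.0 if (state == "enemy_near") == (action == "jump") else 0.0


def train(conditioned: bool, steps: int = 5000, alpha: float = 0.1,
          epsilon: float = 0.1, seed: int = 0) -> defaultdict:
    """Learn action values. If conditioned is False, the state is ignored and
    there is one value per action (a plain bandit)."""
    rng = random.Random(seed)
    q = defaultdict(float)  # (state or None, action) -> estimated reward
    for _ in range(steps):
        state = rng.choice(STATES)
        key = state if conditioned else None
        # Epsilon-greedy: usually exploit the best-known action, sometimes explore.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(key, a)])
        # One-step episodes: move the estimate toward the observed reward.
        q[(key, action)] += alpha * (reward(state, action) - q[(key, action)])
    return q


def greedy_action(q: defaultdict, state: str, conditioned: bool) -> str:
    key = state if conditioned else None
    return max(ACTIONS, key=lambda a: q[(key, a)])


def accuracy(q: defaultdict, conditioned: bool) -> float:
    """Fraction of states in which the learned greedy action earns the reward."""
    return sum(reward(s, greedy_action(q, s, conditioned)) for s in STATES) / len(STATES)


if __name__ == "__main__":
    print("state-conditioned:", accuracy(train(conditioned=True), True))
    print("state-blind:", accuracy(train(conditioned=False), False))
```

The state-blind learner can only settle on the button that pays off most often overall, so here it is right in only one of the two situations. The state-conditioned learner picks a different button per situation. Deep RL agents do the same thing, but use a neural network in place of the lookup table so that similar-looking screens get similar values.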