thiru_2718 t1_j83kkbh wrote
Reply to [D] Transformers for poker bot by lmtog
Poker depends on looking far enough ahead to play game theory optimal (GTO) moves that maximize expected value over a long run of hands. You can train a transformer on a ton of data and get it to predict context-specific plays, but if the number of possible decision branches grows exponentially, is that enough?
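To make the branching concern concrete, here's a toy count of nodes in a uniform game tree (the branching factor and depth are made-up illustrative numbers, not poker's actual values):

```python
def tree_size(branching_factor: int, depth: int) -> int:
    """Total nodes in a uniform game tree: sum of b**i for i = 0..depth."""
    return sum(branching_factor ** i for i in range(depth + 1))

print(tree_size(3, 10))   # 88573 nodes
print(tree_size(10, 10))  # 11111111111 -- over 11 billion nodes
```

Even modest branching factors blow up fast, which is why exhaustive lookahead isn't an option.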
But honestly, I don't know much about these types of RL-type problems. How is AlphaGo structured?
thiru_2718 t1_j7o82dn wrote
Nice work! There are some intriguing sections here that I definitely want to take a look at.
Quick question, with regards to this quote in the preface: "For instance, regression techniques ... are presented as a single method, without using advanced linear algebra."
Are you referring to Generalized Linear Models? I don't see any references to GLMs in my brief skim, but I can't think of how else regression could be presented as a single method.
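For what it's worth, the GLM view is one way several regressions collapse into "a single method": the same fitting loop works for linear, logistic, and Poisson regression, differing only in the inverse link function. A rough sketch of that idea (my own illustration, not from the book; the gradient X^T(y - mu) is the canonical-link log-likelihood gradient in all three cases):

```python
import numpy as np

def fit_glm(X, y, inv_link, lr=0.1, steps=5000):
    """Fit a GLM by gradient ascent on the exponential-family log-likelihood.

    With a canonical link, the gradient is X^T (y - mu) for Gaussian
    (identity link), Bernoulli (logistic link), and Poisson (exp link) alike.
    """
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        mu = inv_link(X @ w)          # mean prediction under the current weights
        w += lr * X.T @ (y - mu) / len(y)
    return w

# identity inverse link  -> ordinary linear regression
# sigmoid inverse link   -> logistic regression
# exp inverse link       -> Poisson regression
x = np.linspace(0.0, 1.0, 50)
X = np.column_stack([np.ones(50), x])
y = 1.0 + 2.0 * x
w = fit_glm(X, y, inv_link=lambda z: z)  # recovers intercept ~1, slope ~2
```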
Also, is there any place where we can get a preview of "Shape Classification and Synthetization via Explainable AI" section?
thiru_2718 t1_j4piklu wrote
Reply to [D] Is it possible to update random forest parameters with new data instead of retraining on all data? by monkeysingmonkeynew
Interesting question. My intuition is that if you could maintain a continuously-updated cache of the metric you're using to split your branches (i.e., continuously recompute the mutual information at each fork), and we assume your new data roughly follows the same distribution as your old data, you may be able to get away with only modifying the downstream branches of your trees, which should be more efficient.
But if that assumption doesn't hold, the new data changes your trees closer to the root, and there's little benefit.
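A rough sketch of that caching idea for a single node (hypothetical names; a real forest would apply this recursively per tree, and the candidate thresholds here are supplied by hand):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of an integer label array."""
    counts = np.bincount(labels)
    probs = counts[counts > 0] / len(labels)
    return -np.sum(probs * np.log2(probs))

def info_gain(labels, mask):
    """Information gain of splitting `labels` by boolean `mask`."""
    n = len(labels)
    left, right = labels[mask], labels[~mask]
    if len(left) == 0 or len(right) == 0:
        return 0.0
    return (entropy(labels)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))

class CachedSplit:
    """Caches the best split at one node and checks whether new data keeps it valid."""

    def __init__(self, X, y, thresholds):
        self.X, self.y = X, y
        self.thresholds = thresholds  # candidate (feature, threshold) pairs
        self.best = self._best_split()

    def _best_split(self):
        gains = [(info_gain(self.y, self.X[:, f] <= t), (f, t))
                 for f, t in self.thresholds]
        return max(gains)[1]

    def update(self, X_new, y_new):
        """Append new data and recompute. Returns True if the cached split
        survives (only downstream nodes need refitting), False if this node
        itself must be re-split."""
        self.X = np.vstack([self.X, X_new])
        self.y = np.concatenate([self.y, y_new])
        new_best = self._best_split()
        unchanged = new_best == self.best
        self.best = new_best
        return unchanged
```

If `update` keeps returning True near the root, you only pay to refit subtrees; once it returns False at shallow nodes, you're effectively retraining anyway, which matches the distribution-shift caveat above.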
thiru_2718 t1_j1eme3a wrote
Reply to comment by AllowFreeSpeech in [P] A self-driving car using Nvidia Jetson Nano, with movement controlled by a pre-trained convolution neural network (CNN) written in Taichi by TaichiOfficial
And with that one comment, you've confirmed that you're exactly as ignorant as I suspected you'd be. Thanks "AllowFreeSpeech", hope you enjoy your life policing the subreddit against the use of other languages in code comments...lol.
thiru_2718 t1_j1c64vy wrote
Reply to comment by AllowFreeSpeech in [P] A self-driving car using Nvidia Jetson Nano, with movement controlled by a pre-trained convolution neural network (CNN) written in Taichi by TaichiOfficial
Isn't it obvious? The code seems to be by a Chinese hackathon team, so why would you expect them to comment everything in English?
thiru_2718 t1_iwlth7x wrote
You can add "velocity" and "time" as separate features, but if there's a known physical relationship between the two, you can encode that relationship more efficiently by combining them into a single feature. So for example, you can capture the kinematics of the data by adding a "position" feature where "position_i+1 = velocity_i+1 * (time_i+1 - time_i) + position_i".
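As a sketch (the sample values are invented), that combined feature is just a cumulative sum of velocity times time deltas:

```python
import numpy as np

time = np.array([0.0, 1.0, 2.5, 3.0])      # e.g. seconds
velocity = np.array([0.0, 2.0, 1.5, 3.0])  # e.g. m/s

# position_{i+1} = velocity_{i+1} * (time_{i+1} - time_i) + position_i,
# with position_0 = 0: a cumulative sum of velocity * dt
dt = np.diff(time)
position = np.concatenate([[0.0], np.cumsum(velocity[1:] * dt)])
print(position)  # positions: 0.0, 2.0, 4.25, 5.75
```

The model then gets one feature that already encodes the kinematic relationship, instead of having to learn it from the two raw columns.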
thiru_2718 t1_jb6njez wrote
Reply to comment by currentscurrents in To RL or Not to RL? [D] by vidul7498
>supervised learning can teach a model to complete a human-defined task. But reinforcement learning can teach a model to choose its own tasks to complete arbitrary goals.
Isn't this contradicted by LLMs demonstrating emergent abilities (like meta-learning strategies or in-context learning) that allow them to tackle complex sequential tasks adaptively? There is research (e.g., https://innermonologue.github.io/) where LLMs are successfully applied to a traditional RL domain: planning and interaction for robots. While there is RLHF involved in models like ChatGPT, the bulk of the model's reasoning comes from supervised learning.
As far as I can tell, the unexpected emergent abilities of LLMs have somewhat rewritten our assumptions about what is achievable through supervised learning, and that should extend into the RL domain.