
fnbr t1_jd6j7gh wrote

Right now, the tech isn't there to train a language model on a single GPU; you'd be looking at roughly a month of training to do it. It is slightly more efficient, though.

Lots of people are looking at running models locally. In addition to everything that's already been said, I've heard rumours from employees that a bunch of companies will soon be releasing models that can just barely fit on an A100.

1

fnbr t1_j1bc8i3 wrote

The main problem with tabular Q-learning (I'm assuming that by "classical" you mean tabular) is that for most interesting environments the state space is massive, so we can't actually store all of the states in memory.

In particular, for Lunar Lander you have a continuous observation space, so you need to apply some sort of discretization; at that point, you might as well just use tile coding or some other function approximator.
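To give a rough idea of what naive discretization looks like (and why it blows up), here's a minimal sketch; the bin count and observation bounds below are illustrative placeholders, not values tuned for LunarLander:

```python
import numpy as np

# Minimal sketch: uniform binning of LunarLander's 8-dimensional continuous
# observation so it can index a tabular Q-function. Bounds/bins are illustrative.
N_BINS = 6                                                  # bins per dimension (made up)
OBS_LOW = np.array([-1.5, -1.5, -5.0, -5.0, -3.14, -5.0, 0.0, 0.0])
OBS_HIGH = np.array([1.5, 1.5, 5.0, 5.0, 3.14, 5.0, 1.0, 1.0])

def discretize(obs):
    """Map a continuous observation to a tuple of bin indices (a hashable Q-table key)."""
    scaled = (obs - OBS_LOW) / (OBS_HIGH - OBS_LOW)         # normalise to [0, 1]
    idx = np.clip((scaled * N_BINS).astype(int), 0, N_BINS - 1)
    return tuple(idx)

# Even with only 6 bins per dimension, the table has 6**8 ≈ 1.7M entries,
# which is why tile coding or another function approximator quickly becomes
# the more practical choice.
q_table = {}  # dict keyed by (state_tuple, action)
```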

10

fnbr t1_iuj8h11 wrote

Have you profiled your code? That would be the first thing I would do.

What sort of utilization of the GPU are you getting?

It's likely you're bottlenecked by feeding data in; for supervised learning, that's often the case.
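A quick way to check is to time how long you spend waiting on batches versus running the training step. This is just a sketch; `dataset` and `train_step` stand in for whatever you're actually using:

```python
import time

def profile_batches(dataset, train_step, num_batches=100):
    """Rough split of wall-clock time between the input pipeline and the training step."""
    data_time, step_time = 0.0, 0.0
    it = iter(dataset)
    for _ in range(num_batches):
        t0 = time.perf_counter()
        batch = next(it)          # time spent waiting on the input pipeline
        t1 = time.perf_counter()
        train_step(batch)         # time spent in the actual training step
        t2 = time.perf_counter()
        data_time += t1 - t0
        step_time += t2 - t1
    print(f"data: {data_time:.2f}s  compute: {step_time:.2f}s")
```

If the data side dominates (or GPU utilization sits low the whole time), you're input-bound rather than compute-bound.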

I'm happy to offer suggestions for feeding data in if you're using TensorFlow/JAX.
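With tf.data, for instance, the usual knobs are parallel decoding, caching, and prefetching. This is just a sketch; the file paths and the parse function are placeholders for your own data:

```python
import tensorflow as tf

def make_dataset(file_paths, parse_fn, batch_size=256):
    """Minimal tf.data pipeline aimed at keeping the GPU fed."""
    ds = tf.data.TFRecordDataset(file_paths, num_parallel_reads=tf.data.AUTOTUNE)
    ds = ds.map(parse_fn, num_parallel_calls=tf.data.AUTOTUNE)  # decode in parallel
    ds = ds.cache()                         # cache decoded examples if they fit in memory
    ds = ds.shuffle(10_000)
    ds = ds.batch(batch_size, drop_remainder=True)
    ds = ds.prefetch(tf.data.AUTOTUNE)      # overlap data prep with the training step
    return ds
```

For JAX you can iterate the same pipeline with `ds.as_numpy_iterator()` and feed the NumPy batches to your jitted step.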

10