
fnbr t1_jd6j7gh wrote

Right now, the tech isn't there to train a language model on a single GPU; you'd be looking at roughly a month of training to do it. It is slightly more efficient, though.

Lots of people are looking at running models locally. In addition to everything that's already been said, I've heard rumours from employees that a bunch of companies will soon be releasing models that can just barely fit on an A100.

1

fnbr t1_j1bc8i3 wrote

The main problem with tabular Q-learning (I'm assuming that by "classical" you mean tabular) is that for most interesting environments the state space is massive, so we can't actually store all of the states in memory.

In particular, for Lunar Lander you have a continuous observation space, so you need to apply some sort of discretization; at that point, you might as well just use tile coding or some other function approximator.
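To give a rough idea of what naive discretization looks like (and why it blows up), here's a minimal sketch; the bin count and observation bounds below are illustrative placeholders, not values tuned for LunarLander:

```python
import numpy as np

# Minimal sketch: uniform binning of LunarLander's 8-dimensional continuous
# observation so it can index a tabular Q-function. Bounds/bins are illustrative.
N_BINS = 6                                                  # bins per dimension (made up)
OBS_LOW = np.array([-1.5, -1.5, -5.0, -5.0, -3.14, -5.0, 0.0, 0.0])
OBS_HIGH = np.array([1.5, 1.5, 5.0, 5.0, 3.14, 5.0, 1.0, 1.0])

def discretize(obs):
    """Map a continuous observation to a tuple of bin indices (a hashable Q-table key)."""
    scaled = (obs - OBS_LOW) / (OBS_HIGH - OBS_LOW)         # normalise to [0, 1]
    idx = np.clip((scaled * N_BINS).astype(int), 0, N_BINS - 1)
    return tuple(idx)

# Even with only 6 bins per dimension, the table has 6**8 ≈ 1.7M entries,
# which is why tile coding or another function approximator quickly becomes
# the more practical choice.
q_table = {}  # dict keyed by (state_tuple, action)
```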

10

fnbr t1_iuj8h11 wrote

Have you profiled your code? That would be the first thing I would do.

What sort of utilization of the GPU are you getting?

It's likely you're bottlenecked by feeding data in; for supervised learning, that's often the case.
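A quick way to check is to time how long you spend waiting on batches versus running the training step. This is just a sketch; `dataset` and `train_step` stand in for whatever you're actually using:

```python
import time

def profile_batches(dataset, train_step, num_batches=100):
    """Rough split of wall-clock time between the input pipeline and the training step."""
    data_time, step_time = 0.0, 0.0
    it = iter(dataset)
    for _ in range(num_batches):
        t0 = time.perf_counter()
        batch = next(it)          # time spent waiting on the input pipeline
        t1 = time.perf_counter()
        train_step(batch)         # time spent in the actual training step
        t2 = time.perf_counter()
        data_time += t1 - t0
        step_time += t2 - t1
    print(f"data: {data_time:.2f}s  compute: {step_time:.2f}s")
```

If the data side dominates (or GPU utilization sits low the whole time), you're input-bound rather than compute-bound.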

I'm happy to offer suggestions for feeding data in if you're using TensorFlow/JAX.
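With tf.data, for instance, the usual knobs are parallel decoding, caching, and prefetching. This is just a sketch; the file paths and the parse function are placeholders for your own data:

```python
import tensorflow as tf

def make_dataset(file_paths, parse_fn, batch_size=256):
    """Minimal tf.data pipeline aimed at keeping the GPU fed."""
    ds = tf.data.TFRecordDataset(file_paths, num_parallel_reads=tf.data.AUTOTUNE)
    ds = ds.map(parse_fn, num_parallel_calls=tf.data.AUTOTUNE)  # decode in parallel
    ds = ds.cache()                         # cache decoded examples if they fit in memory
    ds = ds.shuffle(10_000)
    ds = ds.batch(batch_size, drop_remainder=True)
    ds = ds.prefetch(tf.data.AUTOTUNE)      # overlap data prep with the training step
    return ds
```

For JAX you can iterate the same pipeline with `ds.as_numpy_iterator()` and feed the NumPy batches to your jitted step.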

10