Recent comments in /f/deeplearning

_icosahedron t1_jeh3rlg wrote

I am really enjoying the course at https://dlsyscourse.org. You build a DL library called Needle from scratch throughout the lectures. They also provide notebooks and tests to pass, so it's more than just videos.

It's also from CMU, but the videos are different from the one referenced by /u/verboseEqualsTrue, though I plan on watching that one too.

They cover transformers and other network types, but I haven't gotten that far in the videos yet. I'm still implementing an automatic differentiator.
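
In case it helps anyone following along, here's a rough sketch of the core idea behind the automatic differentiator (in the spirit of micrograd; the `Value` class and method names are just illustrative, not Needle's actual API): each op records its inputs and a local backward rule, then `backward()` walks the graph in reverse topological order and accumulates gradients via the chain rule.

```python
class Value:
    """Scalar node in a computation graph (illustrative sketch, not Needle's API)."""
    def __init__(self, data, parents=(), backward_fn=lambda: None):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = backward_fn

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad    # d(a+b)/da = 1
            other.grad += out.grad   # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad   # d(a*b)/da = b
            other.grad += self.data * out.grad   # d(a*b)/db = a
        out._backward = _backward
        return out

    def backward(self):
        # Reverse topological order, then apply the chain rule node by node.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

x, y = Value(2.0), Value(3.0)
z = x * y + x          # z = xy + x
z.backward()
print(x.grad, y.grad)  # 4.0 (= y + 1), 2.0 (= x)
```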

I have found Karpathy's videos to be really good too, though I'm not following them too closely yet.

2

ab3rratic t1_jegpq3d wrote

It might be. But TensorFlow/PyTorch are already out there too, with documentation and CPU-only modes.

I say this as someone who has coded a number of numerical algorithms (BFGS, Nelder-Mead, etc.) from scratch just to understand them, while knowing all along I wasn't competing with real OR libraries. On a few occasions, my own implementations were handy for proof-of-concept work when adding third-party libs was a hassle.
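
For what it's worth, that kind of comparison looks roughly like the sketch below: the `scipy.optimize` call stands in for the "real" library, and the toy gradient-descent loop stands in for a from-scratch proof of concept (the loop is purely illustrative, not how I'd ship anything).

```python
import numpy as np
from scipy.optimize import minimize

def rosenbrock(x):
    return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

# Library version: Nelder-Mead, no gradients needed.
res = minimize(rosenbrock, x0=np.array([-1.2, 1.0]), method="Nelder-Mead")
print(res.x)  # converges to roughly [1, 1]

# From-scratch version: naive gradient descent, fine for a proof of concept.
def grad(x):
    dx0 = -2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2)
    dx1 = 200 * (x[1] - x[0]**2)
    return np.array([dx0, dx1])

x = np.array([-1.2, 1.0])
for _ in range(200_000):
    x -= 5e-4 * grad(x)
print(x)  # also heads toward [1, 1], just far more slowly than the library routine
```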

4

LiquidDinosaurs69 t1_jeg8yos wrote

You should do something else. There are already a lot of small C++ NN libraries. To make this one competitive with a real deep learning framework, you would need to implement everything on the GPU, which would be painful. Python also has the huge benefit of great data science libraries, which make it much more convenient to preprocess data for training networks than it is in C++.

Additionally, there are ways to deploy Python-trained models to C++, so there's not much benefit in training with a C++ library.
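
For context, one common route is TorchScript: trace or script the trained model in Python, save it, and load it from C++ with libtorch (the sketch below assumes PyTorch and uses a throwaway toy model; ONNX export is another option).

```python
import torch
import torch.nn as nn

# Stand-in toy model; in practice this would be your trained network.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
model.eval()

example_input = torch.randn(1, 16)
traced = torch.jit.trace(model, example_input)  # record the forward graph
traced.save("model.pt")                         # portable artifact for C++

# C++ side (libtorch), for reference:
#   torch::jit::script::Module m = torch::jit::load("model.pt");
#   auto out = m.forward({torch::randn({1, 16})}).toTensor();
```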

7

x11ry0 OP t1_jefvk0p wrote

Hi, thanks a lot for this very detailed answer. I will have a look at your suggestions. Yes, we are looking into the mathematical equations too. We fear that they will be quite imprecise, but on the other hand I understand very well the issue with the small dataset currently available. We may end up exploring several different solutions.

1

-pkomlytyrg t1_jef4weq wrote

Generally, yes. If you use a model with a long context length (BigBird or OpenAI's ada-002), you'll likely be fine unless the articles you're embedding exceed the token limit. If you're using BERT or another, smaller model, you have to chunk and average; that can produce fixed-size vectors, but you've got to put the work in haha
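
In case it's useful, here's a rough sketch of the chunk-and-average approach with Hugging Face transformers (the model name, 512-token limit, and mean pooling are just common defaults, not a prescription).

```python
import torch
from transformers import AutoTokenizer, AutoModel

name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name).eval()

def embed_long_text(text, max_len=512):
    # Split the document into token-limit-sized chunks in one tokenizer call.
    enc = tokenizer(
        text,
        truncation=True,
        max_length=max_len,
        return_overflowing_tokens=True,
        padding=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        out = model(input_ids=enc["input_ids"],
                    attention_mask=enc["attention_mask"])
    # Mean-pool over tokens (ignoring padding), then average across chunks.
    mask = enc["attention_mask"].unsqueeze(-1).float()
    chunk_vecs = (out.last_hidden_state * mask).sum(1) / mask.sum(1)
    return chunk_vecs.mean(0)  # one fixed-size vector per document

doc_vector = embed_long_text("some very long article text ...")
print(doc_vector.shape)  # torch.Size([768]) for bert-base
```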

1