Submitted by nullspace1729 t3_y0dk5c in MachineLearning

I’m looking for a recent (last 5 years) paper that introduces something new — e.g. an objective function, an optimiser, or a model — that I can try to implement myself in Python with PyTorch or Keras. I mainly want to do this to learn new ideas and improve my coding skills.

Do you have any recommendations, or alternatively any advice on how someone who isn’t a researcher can find interesting new papers? I’ve looked on arXiv but I haven’t found what I’m looking for.

31

Comments


Small-Reason-8096 t1_irr6g0q wrote

Hands down, the best paper I have ever read (and reimplemented) is the ResNet paper:

https://arxiv.org/abs/1512.03385

The descriptions are clear and concise, but with enough detail to reimplement it in whatever framework you like. Also, out of the box, the results I got on CIFAR-10 matched the paper pretty much perfectly (not always a given!).
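For reference, here's a minimal sketch of the basic residual block (my own illustration, not code from the paper); the whole idea is the identity shortcut, out = ReLU(F(x) + x):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    """Minimal residual block: out = ReLU(F(x) + x)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # the identity shortcut is the key idea

x = torch.randn(2, 16, 32, 32)   # e.g. a CIFAR-10 feature map
print(BasicBlock(16)(x).shape)   # torch.Size([2, 16, 32, 32])
```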

Another good paper to try is AWD-LSTM: https://arxiv.org/pdf/1708.02182.pdf

Basically, if you are implementing and training from scratch, focus on something you can train on a smallish dataset in a reasonable amount of time. I would generally steer away from LLMs and object detection/segmentation models, as they require more resources to train than are commonly available!

22

dasayan05 t1_irrfd7w wrote

It doesn't matter much which one you implement. Implementing anything from scratch exposes you to deeper insights that are hard to get by looking at dry mathematics on paper. Just one piece of advice: pick a paper/algorithm that is well known to work and is reproducible. Then you are good.

6

Der-Schwarzer-Schwan t1_irrvs7r wrote

Try the NeRF paper. It will show you how to combine a network with an analytical function.
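For a flavour of what that means, here's a rough, hypothetical sketch (no positional encoding or view dependence, which the real method needs): a tiny MLP predicts density and colour at points along a ray, and the analytical volume-rendering quadrature from the paper turns those into a pixel colour:

```python
import torch
import torch.nn as nn

# Toy MLP: 3D point in, (density, r, g, b) out.
mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 4))

def render_ray(origin, direction, t_near=0.0, t_far=1.0, n_samples=64):
    t = torch.linspace(t_near, t_far, n_samples)   # sample depths along the ray
    points = origin + t[:, None] * direction       # (n_samples, 3)
    raw = mlp(points)
    sigma = torch.relu(raw[:, 0])                  # density >= 0
    rgb = torch.sigmoid(raw[:, 1:])                # colour in [0, 1]
    delta = t[1] - t[0]                            # uniform sample spacing
    alpha = 1.0 - torch.exp(-sigma * delta)        # opacity per sample
    # Transmittance T_i = prod_{j<i} (1 - alpha_j): how much light survives to sample i.
    trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alpha]), dim=0)[:-1]
    weights = alpha * trans                        # compositing weights
    return (weights[:, None] * rgb).sum(dim=0)     # final pixel colour

print(render_ray(torch.zeros(3), torch.tensor([0.0, 0.0, 1.0])))
```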

12

TheInfelicitousDandy t1_irsfw1a wrote

I've tried to reimplement AWD-LSTM in PyTorch > 1.0 and have never been able to get close to the original results. I've also seen other people try and not get close. I'm pretty sure it has to do with the weight dropout they used.

If anyone knows of a PyTorch > 1.0 version that achieves the same perplexity on PTB/WikiText-2, I'd very much like to know.
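For anyone curious, the fragile part is (I believe) the DropConnect on the hidden-to-hidden weights: the original code monkey-patches `weight_hh_l0` on `nn.LSTM`, and newer PyTorch keeps a separate `_flat_weights` list for cuDNN that can silently keep pointing at the old tensor. Here's a rough sketch of the fastai-style workaround — my guess at what's needed, assuming recent PyTorch internals and not verified against the paper's numbers:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightDropLSTM(nn.Module):
    """Sketch of AWD-LSTM's weight drop: DropConnect on the recurrent weights."""
    def __init__(self, input_size, hidden_size, weight_p=0.5):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.weight_p = weight_p
        # Move the hidden-to-hidden weight out of the LSTM's parameter dict
        # so a freshly dropped-out copy can be substituted on every forward.
        w = self.lstm.weight_hh_l0
        del self.lstm._parameters['weight_hh_l0']
        self.weight_hh_raw = nn.Parameter(w.data)
        # Disable cuDNN weight flattening; it is one suspect for silent
        # mismatches between the patched weight and what actually runs.
        self.lstm.flatten_parameters = lambda: None

    def forward(self, x, hidden=None):
        w = F.dropout(self.weight_hh_raw, p=self.weight_p, training=self.training)
        # setattr (not a _parameters assignment) also refreshes _flat_weights.
        setattr(self.lstm, 'weight_hh_l0', w)
        return self.lstm(x, hidden)

out, _ = WeightDropLSTM(10, 20)(torch.randn(4, 7, 10))
print(out.shape)  # torch.Size([4, 7, 20])
```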

3

pm_me_your_ensembles t1_irt5y8h wrote

Phil Wang (lucidrains) has phenomenal implementations of stuff; I'd recommend checking them out and reading his code.

Furthermore, I'd recommend simply reading more code and tackling complex problems, e.g. try building a DL framework from "scratch" on top of JAX. Read the Haiku codebase and compare it to, say, Equinox (I am a big fan of that one). Go through the Hugging Face codebases, e.g. transformers: choose a model, build it from scratch, and make it compatible with their API.
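To make the "framework on top of JAX" idea concrete, here's a made-up minimal sketch of the core pattern such libraries share — parameters as an explicit pytree, layers as pure functions of (params, inputs). None of this is any real library's API:

```python
import jax
import jax.numpy as jnp

def init_linear(key, in_dim, out_dim):
    # Parameters are just a pytree (here, a dict of arrays).
    w_key, _ = jax.random.split(key)
    return {"w": jax.random.normal(w_key, (in_dim, out_dim)) * 0.01,
            "b": jnp.zeros(out_dim)}

def linear(params, x):
    # A "layer" is a pure function of parameters and inputs.
    return x @ params["w"] + params["b"]

def loss_fn(params, x, y):
    return jnp.mean((linear(params, x) - y) ** 2)

key = jax.random.PRNGKey(0)
params = init_linear(key, 3, 1)
x, y = jnp.ones((8, 3)), jnp.zeros((8, 1))
grads = jax.grad(loss_fn)(params, x, y)  # grads mirror the params pytree
params = jax.tree_util.tree_map(lambda p, g: p - 0.1 * g, params, grads)
print(loss_fn(params, x, y))
```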

7

neuroguy123 t1_irtuo1b wrote

I recommend some of the YOLO versions. I had fun with those and learned a lot about complex loss functions.

I also implemented a series of attention models, starting with Graves's, then Bahdanau's and Luong's, and then Transformers. The history of attention in deep learning is very interesting and instructive to implement.
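As a taste, here's a minimal sketch of scaled dot-product attention, the Transformer's core. Bahdanau and Luong attention mostly differ in how the scores are computed (an additive MLP or a bilinear form instead of a scaled dot product):

```python
import math
import torch

def attention(q, k, v, mask=None):
    # Similarity between queries and keys, scaled by sqrt(d) for stability.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float('-inf'))
    weights = torch.softmax(scores, dim=-1)  # attention distribution
    return weights @ v                       # weighted sum of values

q = k = v = torch.randn(2, 5, 16)            # (batch, seq, dim)
print(attention(q, k, v).shape)              # torch.Size([2, 5, 16])
```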

Another one I had fun implementing was WaveNet, as it really forces you to take a deep dive into convolution variants, PixelCNN, and gated network structures. Conditioning it was an extra challenge (similar to the attention networks).
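Roughly, the core block looks like this — a simplified sketch of my own, omitting the skip connections and 1x1 convolutions the full model also has. Conditioning would add an extra term inside both the tanh and the sigmoid:

```python
import torch
import torch.nn as nn

class GatedResidualBlock(nn.Module):
    """Dilated causal conv + gated activation tanh(Wf*x) * sigmoid(Wg*x)."""
    def __init__(self, channels, dilation):
        super().__init__()
        self.pad = (2 - 1) * dilation  # left-pad so the conv stays causal
        self.filter = nn.Conv1d(channels, channels, 2, dilation=dilation)
        self.gate = nn.Conv1d(channels, channels, 2, dilation=dilation)

    def forward(self, x):                        # x: (batch, channels, time)
        h = nn.functional.pad(x, (self.pad, 0))  # pad only on the left
        out = torch.tanh(self.filter(h)) * torch.sigmoid(self.gate(h))
        return x + out                           # residual connection

x = torch.randn(1, 32, 100)
print(GatedResidualBlock(32, dilation=4)(x).shape)  # torch.Size([1, 32, 100])
```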

One thing I've been meaning to get into is DeepCut and other pose-estimation models, because I don't know much about linear programming and the other math they use.

2