gradientpenalty

gradientpenalty t1_j8ruzh9 wrote

Maybe you don't do much NLP research then? Before the huggingface transformers and datasets libraries existed (still think "datasets" is a bad name), we had to format the validation sets ourselves and write the same validation code that hundreds of our peers had already written, because there was no de facto codebase for doing it (everyone was using different kinds of models). NLP models (the so-called transformers) were a mess with no fixed way of using them, and running a benchmark was certainly a nightmare.

When transformers first came out it was limited, but it simplified BERT embeddings and GPT-2 beam-search generation down to a few lines of code. The library handles all the model downloads, version checks, and abstraction for you. Then there's datasets, which unifies NLP datasets on a central platform and lets me run the GLUE benchmark from a single Python file.
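To give a sense of scale, here's a minimal sketch of that workflow (assuming only the standard transformers and datasets packages; the model and dataset names are just the public hub IDs, picked for illustration):

```python
from transformers import AutoTokenizer, AutoModel, AutoModelForCausalLM
from datasets import load_dataset

# BERT embeddings: the library downloads and caches the weights for you.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")
emb = bert(**tok("NLP used to be harder than this.", return_tensors="pt")).last_hidden_state

# GPT-2 generation with beam search in a couple of lines.
gpt2_tok = AutoTokenizer.from_pretrained("gpt2")
gpt2 = AutoModelForCausalLM.from_pretrained("gpt2")
out = gpt2.generate(**gpt2_tok("The future of NLP", return_tensors="pt"),
                    num_beams=5, max_new_tokens=30)
print(gpt2_tok.decode(out[0], skip_special_tokens=True))

# One GLUE task pulled from the hub, ready for a benchmark script.
sst2 = load_dataset("glue", "sst2")
print(sst2["train"][0])
```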

Oh, back then the library's code was even worse: everything lived in modeling_(name).py files under the transformers/ directory. The latest 4.2X versions are somewhat maintainable and readable, despite all the complex abstraction they carry. But it's a fast-moving domain, and any contribution becomes irrelevant a few years later, so complexity and mess keep piling up (would you rather spend your time cleaning up than implementing the new flashy self-attention alternative?).

One day they might sell out, as many for-profit companies do, but they have saved so much time and helped so many researchers advance NLP. And if they ever manage to piss off the community, someone will rise up and challenge their dominance (see TensorFlow vs PyTorch).

38

gradientpenalty t1_j2pfy52 wrote

I try to get to sleep, but I can't,
'cause each time I wake up, the loss has gone to NaN

I try to hang out with friends, but
every 5 minutes I keep refreshing Weights & Biases, and my friends never hang out with me anymore

I try to play online games, but
each time training hits an OOM, I just close the game and go tune my hyperparameters.

Now I just pray anxiously while scrolling online stores for a second-hand 3090. And I look more or less like Gollum.

4

gradientpenalty t1_j21o2qt wrote

This is a bit irrelevant to your topic, but I don't think the weights themselves are all that useful. The core value, the IP you actually need to protect, is how you obtained the weights in the first place: the proprietary training data and the training pipeline. If your model can be trained from a huggingface transformers checkpoint using the stock transformers training pipeline plus a dataset from the hub, I don't think it's all that "intellectual" at all (see the sketch below).

It's like Windows 7 or 8, where pirated copies couldn't be avoided. But Microsoft's real value is its engineers and its experience developing an operating system for billions of different hardware configurations, which is what lets them build the next version of Windows.
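Here's a hedged sketch of that point: fine-tuning a hub checkpoint on a hub dataset with the stock Trainer takes only a few lines, so the resulting weights carry little proprietary "secret sauce". The model and dataset names below are just public defaults chosen for illustration, not anything specific from this thread.

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

# Public dataset from the hub and a public checkpoint.
dataset = load_dataset("glue", "sst2")
tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tok(batch["sentence"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()  # the "IP" here is mostly the data and pipeline, not the weights
```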

3