gradientpenalty

gradientpenalty t1_j8ruzh9 wrote

Maybe you don't do much NLP research then? Before the huggingface transformers and datasets libraries existed (still think "datasets" is a bad name), we had to format the validation sets ourselves and write the same validation code that hundreds of our peers had already written, because there was no de facto codebase for doing it (everyone was using different kinds of models). NLP models (the so-called transformers) were a mess with no fixed way of using them, and running a benchmark was certainly a nightmare.

When transformers first came out it was limited, but it simplified BERT embeddings and GPT-2 beam-search generation down to a few lines of code. The library handles all the model downloads, version checks, and abstraction for you. Then there's datasets, which unifies NLP datasets on a central platform and lets me run the GLUE benchmark from a single Python file.
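To give a sense of scale, here's a minimal sketch of that workflow (assuming only the standard transformers and datasets packages; the model and dataset names are just the public hub IDs, picked for illustration):

```python
from transformers import AutoTokenizer, AutoModel, AutoModelForCausalLM
from datasets import load_dataset

# BERT embeddings: the library downloads and caches the weights for you.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")
emb = bert(**tok("NLP used to be harder than this.", return_tensors="pt")).last_hidden_state

# GPT-2 generation with beam search in a couple of lines.
gpt2_tok = AutoTokenizer.from_pretrained("gpt2")
gpt2 = AutoModelForCausalLM.from_pretrained("gpt2")
out = gpt2.generate(**gpt2_tok("The future of NLP", return_tensors="pt"),
                    num_beams=5, max_new_tokens=30)
print(gpt2_tok.decode(out[0], skip_special_tokens=True))

# One GLUE task pulled from the hub, ready for a benchmark script.
sst2 = load_dataset("glue", "sst2")
print(sst2["train"][0])
```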

Oh, back then the library's code was even worse: everything lived in modeling_(name).py files under the transformers/ directory. The latest 4.2X versions are somewhat maintainable and readable, despite all the complex abstraction they carry. But it's a fast-moving domain, and any contribution becomes irrelevant a few years later, so complexity and mess keep piling up (would you rather spend your time cleaning up than implementing the new flashy self-attention alternative?).

One day they might sell out, as many for-profit companies do, but they have saved so much time and helped so many researchers advance NLP. And if they ever manage to piss off the community, someone will rise up and challenge their dominance (see TensorFlow vs PyTorch).

38

gradientpenalty t1_j2pfy52 wrote

I try to get to sleep, but I can't,
'cause each time I wake up, the loss has gone to NaN

I try to hang out with friends, but
every 5 minutes I keep refreshing Weights & Biases, and my friends never hang out with me anymore

I try to play online games, but
each time training hits an OOM, I just close the game and go tune my hyperparameters.

Now I just pray anxiously while scrolling online stores for a second-hand 3090. And I look more or less like Gollum.

4

gradientpenalty t1_j21o2qt wrote

This is a bit irrelevant to your topic, but I don't think the weights themselves are all that useful. The core value, the IP you actually need to protect, is how you obtained the weights in the first place: the proprietary training data and the training pipeline. If your model can be trained from a huggingface transformers checkpoint using the stock transformers training pipeline plus a dataset from the hub, I don't think it's all that "intellectual" at all (see the sketch below).

It's like Windows 7 or 8, where pirated copies couldn't be avoided. But Microsoft's real value is its engineers and its experience developing an operating system for billions of different hardware configurations, which is what lets them build the next version of Windows.
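Here's a hedged sketch of that point: fine-tuning a hub checkpoint on a hub dataset with the stock Trainer takes only a few lines, so the resulting weights carry little proprietary "secret sauce". The model and dataset names below are just public defaults chosen for illustration, not anything specific from this thread.

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

# Public dataset from the hub and a public checkpoint.
dataset = load_dataset("glue", "sst2")
tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tok(batch["sentence"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()  # the "IP" here is mostly the data and pipeline, not the weights
```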

3