choHZ
choHZ t1_j1t7mk2 wrote
Reply to [D] SE for machine learning reaserch by sad_potato00
I believe no one would argue against applying good engineering principles to any project. But in the context of academic ML research, reproducibility is the #1 reason for open-sourcing in the first place. People really don't care whether your logger or plots are beautiful or reusable; they just want your code to a) run with minimum yak shaving and b) come with clear instructions of what things are. Yet that's already hard enough for many repos I have come across :(
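To make the "run with minimum yak shaving" point concrete, here is a minimal, hypothetical sketch of the kind of seed-everything helper many repos ship for reproducibility. It is stdlib-only for illustration; a real project would also seed numpy/torch and pin dependency versions:

```python
# Hypothetical reproducibility helper (stdlib-only sketch; real projects
# would also seed numpy, torch, etc. and pin dependency versions).
import os
import random

def seed_everything(seed: int = 42) -> None:
    """Seed the stdlib RNG and record the seed in the environment."""
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)

# Reseeding with the same value reproduces the same random draws.
seed_everything(0)
first_run = [random.random() for _ in range(3)]
seed_everything(0)
second_run = [random.random() for _ in range(3)]
assert first_run == second_run
```

The point isn't this specific helper; it's that a one-call entry point like this, documented in the README, saves readers from hunting through the codebase for hidden sources of nondeterminism.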
For researchers doing the development, yes, it'd be nice to have some reusable/expandable modules. Most people I know carry the same utils.py as a base template across similar projects; well-maintained open source projects often have nice modules you can borrow; and there are many well-built experiment trackers you can make use of. If you have more routine workloads, services like AWS SageMaker are out there. It's just that people don't care much about these when it comes to (most) research sharing; they are just more dependencies to track. If you are in a more production-focused role, that'd be a different story.
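As a hypothetical illustration of the kind of small helper that ends up in such a shared utils.py (the function name and layout here are my own invention, not from any particular repo), a common pattern is saving the run config next to the outputs so an experiment can be re-run later:

```python
# Hypothetical utils.py-style helper: persist the experiment config as
# JSON alongside the run outputs, so the run can be reproduced later.
import json
import tempfile
from pathlib import Path

def save_config(config: dict, out_dir: str) -> Path:
    """Write the experiment config to <out_dir>/config.json and return its path."""
    path = Path(out_dir) / "config.json"
    path.write_text(json.dumps(config, indent=2, sort_keys=True))
    return path

# Round-trip check in a throwaway directory.
with tempfile.TemporaryDirectory() as d:
    p = save_config({"lr": 3e-4, "seed": 0}, d)
    assert json.loads(p.read_text()) == {"lr": 3e-4, "seed": 0}
```

Nothing here needs a framework, which is exactly why helpers like this get copy-pasted between projects instead of being pulled in as a dependency.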
choHZ t1_j7hnrtc wrote
Reply to [D] Yann Lecun seems to be very petty against ChatGPT by supersoldierboy94
I get that he is annoyed that people believe ChatGPT is a milestone breakthrough unique to OpenAI. It is not: most big players already have, or are capable of building, LLMs tuned to similar capabilities, and judging from the InstructGPT paper, the way they label their data is nothing any big player couldn't handle. I also get that he is pissed when people praise OpenAI for its "openness": OpenAI is absolutely not a fan of the whole open source movement, though maybe reasonably so.
My question is: why don't the big players give their bots similar exposure? I find it hard to believe that ethics and some internet criticism are the only reasons.