nmfisher t1_jdyeyit wrote
Reply to [D] FOMO on the rapid pace of LLMs by 00001746
IMO the area most ripe for the picking is distilling large pretrained models into smaller, task-specific ones. Think extracting a 30MB LM from LLaMA that's limited to financial terminology.
There's still a huge amount of untapped potential.
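Very roughly the kind of thing I mean - a minimal sketch of logit-matching distillation on domain text (GPT-2 models are used as stand-ins so the teacher and student share a tokenizer, and the one-line "corpus" is obviously a placeholder):

```python
# Sketch only: distil a larger causal LM into a smaller one on domain-specific text.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher = AutoModelForCausalLM.from_pretrained("gpt2-large").eval()
student = AutoModelForCausalLM.from_pretrained("distilgpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# Placeholder for a real financial-text corpus.
financial_sentences = ["The central bank raised interest rates by 50 basis points."]

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)
T = 2.0  # softening temperature

for text in financial_sentences:
    batch = tokenizer(text, return_tensors="pt", padding=True)
    with torch.no_grad():
        teacher_logits = teacher(**batch).logits
    student_logits = student(**batch).logits
    # KL divergence between softened teacher and student next-token distributions.
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```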
nmfisher t1_j7pawou wrote
Reply to comment by gunshoes in [D] Which is the fastest and lightweight ultra realistic TTS for real-time voice cloning? by akshaysri0001
That's why I added the qualifier "good" :)
nmfisher t1_j7osgdc wrote
Reply to comment by gunshoes in [D] Which is the fastest and lightweight ultra realistic TTS for real-time voice cloning? by akshaysri0001
FS2 (FastSpeech 2) is fine for training a TTS model from scratch, but I haven't come across a good FS2-based model for voice cloning (which is basically zero-shot TTS).
nmfisher t1_j67czbb wrote
Reply to [D] Why are there no End2End Speech Recognition models using the same Encoder-Decoder learning process as BART (no CTC) ? by KarmaCut132
Those models do exist (just search for "ASR seq2seq"); it's just that CTC has always been a faster/more stable/more effective training method, since it avoids the need to learn explicit alignments between input features and output "units" (phonemes/subwords/letters/whatever).
The conventional view was that encoder/decoder models needed considerably more training data and longer training times, and usually underperformed. However, I just came across https://arxiv.org/pdf/2205.01086.pdf, which found a method for fine-tuning a pre-trained seq2seq encoder that actually outperformed CTC on small datasets, so that may no longer be the case.
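To make the alignment point concrete, here's a toy CTC loss computation in PyTorch - the loss marginalises over all monotonic alignments between the encoder frames and the shorter, unaligned label sequence, so no frame-level targets are needed (shapes and values below are made up):

```python
# Toy CTC example: the label sequence is much shorter than the frame sequence
# and carries no alignment information.
import torch
import torch.nn as nn

T, N, C = 100, 4, 32  # encoder frames, batch size, output units (index 0 = blank)
log_probs = torch.randn(T, N, C).log_softmax(dim=-1)        # stand-in encoder outputs
targets = torch.randint(1, C, (N, 20), dtype=torch.long)    # unaligned label sequences
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 20, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
print(loss)
```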
nmfisher t1_j675n9m wrote
Reply to [D] ImageNet2012 Advice by MyActualUserName99
Have you tried applying for Google's TPU Research Cloud (TRC) program?
nmfisher t1_j62y29r wrote
Reply to [R] SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot by Secure-Technology-78
Slight tangent - has anyone ever tried "fine-tuning" a large speech recognition model (e.g. Whisper) by feeding it a training set and pruning activations? The idea being that only a subset of weights/activations is necessary for a given speaker/dataset, so you can compress the larger model into a smaller one that performs equally well on that subset of data (and then continue retraining it conventionally). Presumably this would require some degree of sparsity to begin with?
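Something along these lines is what I'm imagining - score each weight in a linear layer by |weight| times the input-activation norm measured on a small speaker-specific calibration set, then zero out the lowest-scoring weights. The function below is just a sketch; the layer and calibration loader are placeholders, not Whisper's actual modules:

```python
# Activation-aware pruning sketch for one nn.Linear layer.
import torch

def prune_linear_by_activation(layer, calib_inputs, sparsity=0.5):
    # Accumulate squared input-activation norms per input feature on the calibration set.
    feat_norm = torch.zeros(layer.in_features)

    def hook(_, inputs, __):
        x = inputs[0].reshape(-1, layer.in_features)
        feat_norm.add_(x.pow(2).sum(dim=0))

    handle = layer.register_forward_hook(hook)
    with torch.no_grad():
        for x in calib_inputs:   # placeholder: tensors with last dim == in_features
            layer(x)
    handle.remove()

    # Importance of each weight = |W_ij| * ||x_j||; prune the smallest entries.
    importance = layer.weight.abs() * feat_norm.sqrt().unsqueeze(0)
    k = int(importance.numel() * sparsity)
    threshold = importance.flatten().kthvalue(k).values
    mask = (importance > threshold).float()
    with torch.no_grad():
        layer.weight.mul_(mask)
    return mask
```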
nmfisher t1_j4typw0 wrote
Reply to comment by Impressive_Iron_6102 in [D] I’m a Machine Learning Engineer for FAANG companies. What are some places I can get started doing freelance work for ML? by doctorjuice
If I were using one of the newer search engines that let you block domains, Medium would definitely be on my blacklist. The signal-to-noise ratio is just way too low.
towardsdatascience might be slightly better, but even if you find something worthwhile there, it's probably available somewhere else that doesn't clog up your search results.
nmfisher t1_j4paqfc wrote
Reply to comment by Professional-Row9655 in [D] I’m a Machine Learning Engineer for FAANG companies. What are some places I can get started doing freelance work for ML? by doctorjuice
Easiest way IMO is to scan the list of papers at the annual conferences in your field, pick a handful with names that sound interesting, then try to find a paper that's referenced by two or more of them.
That's probably a good place to start - it's been around long enough that it's probably not a flash in the pan, but still "new" enough to be relevant.
nmfisher t1_j4odkrt wrote
Reply to comment by __lawless in [D] I’m a Machine Learning Engineer for FAANG companies. What are some places I can get started doing freelance work for ML? by doctorjuice
- Choose your niche (speech recognition/image classification/LLMs/whatever)
- Start your own blog with good* technical content (i.e. not the shovelware you see on Medium), and see if you can write some guest posts for an existing blog with decent traffic. Open-source your code on GitHub and share everything on social media.
- Give presentations at a few local events and make it clear you're also available for freelancing.
It might take a month or two but people will start contacting you.
* This is important - your blog content/presentations actually have to be worth reading. The work doesn't have to be cutting-edge, but it has to be novel enough to convince someone that you have something special to offer. Implementing a lesser-known paper and showing your results is usually a good start (it also teaches you just how hard it is to recreate something from a paper alone).
nmfisher t1_j3l4ipq wrote
Reply to comment by suflaj in [D] Have you ever used Knowledge Distillation in practice? by fredlafrite
Echoing this, KD is also very useful for taking a heavyweight GPU model and training a student model that's light enough to run on mobile. Small sacrifice in quality for huge performance gains.
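For the classification case, the recipe is basically a temperature-softened KL term against the teacher's logits mixed with the usual hard-label loss - a minimal sketch (the ResNet-50 teacher and MobileNetV3 student are just illustrative choices):

```python
# Sketch of a single knowledge-distillation training step for image classification.
import torch
import torch.nn.functional as F
from torchvision import models

teacher = models.resnet50(weights="IMAGENET1K_V2").eval()   # heavyweight GPU model
student = models.mobilenet_v3_small(num_classes=1000)       # mobile-friendly student
optimizer = torch.optim.SGD(student.parameters(), lr=0.01, momentum=0.9)
T, alpha = 4.0, 0.7  # temperature and soft/hard loss mix

def kd_step(images, labels):
    with torch.no_grad():
        t_logits = teacher(images)
    s_logits = student(images)
    # Soft targets: match the teacher's temperature-softened distribution.
    soft = F.kl_div(
        F.log_softmax(s_logits / T, dim=1),
        F.softmax(t_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: standard cross-entropy on the ground-truth labels.
    hard = F.cross_entropy(s_logits, labels)
    loss = alpha * soft + (1 - alpha) * hard
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```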
nmfisher t1_j2m18rs wrote
Reply to [D] Machine Learning Illustrations by fdis_
Came for illustrations generated by an ML model, left disappointed.
In all seriousness, these are quite good. Kudos :)
nmfisher t1_jeco3nx wrote
Reply to comment by Business-Lead2679 in [D] Training a 65b LLaMA model by Business-Lead2679
Someone also mentioned https://jarvislabs.ai/ to me the other day - I haven't used it myself, but it looks promising.