nmfisher

nmfisher t1_jdyeyit wrote

IMO the area most ripe for the picking is distilling larger pretrained models into smaller, task-specific ones. Think extracting a 30MB LM from Llama that is limited to financial terminology.

There's still a huge amount of untapped potential.
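To make that concrete, here's a rough sketch of what plain logit distillation on a domain corpus could look like (assuming PyTorch + Hugging Face transformers; the checkpoint names are placeholders, and the student would need to share the teacher's tokenizer/vocab for the KL term to line up):

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoints -- swap in whatever teacher/student pair you actually have.
teacher = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf").eval()
student = AutoModelForCausalLM.from_pretrained("my-tiny-financial-lm")  # hypothetical
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)
temperature = 2.0

def distill_step(batch_texts):
    inputs = tokenizer(batch_texts, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        teacher_logits = teacher(**inputs).logits
    student_logits = student(**inputs).logits  # same vocab/seq length as the teacher

    # Soft-label loss: push the student towards the teacher's (temperature-smoothed)
    # token distribution on the domain text.
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

In practice you'd usually blend this with an ordinary next-token loss on the domain data (and possibly distil intermediate states too), but the soft-label term above is the core of it.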

40

nmfisher t1_j67czbb wrote

Those models do exist (just search for "ASR seq2seq"); it's just that CTC has always been a faster/more stable/more effective training method, since it avoids the need to learn specific alignments between input features and output "units" (phonemes/subwords/letters/whatever).

The view was that encoder/decoder models needed considerably more training data/longer training times, and usually underperformed. However, I just came across https://arxiv.org/pdf/2205.01086.pdf, which found a method for fine-tuning a pre-trained seq2seq encoder that actually outperformed CTC on small datasets, so that may no longer be the case.
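For anyone who hasn't used CTC directly, here's a minimal PyTorch sketch of the training setup (shapes and tensors are made up purely for illustration; in a real system the log-probs come from your acoustic encoder). The key point is that it only sees unaligned label sequences and marginalises over all monotonic alignments internally:

```python
import torch
import torch.nn as nn

# Toy dimensions, purely for illustration.
T, N, C = 50, 4, 32   # input frames, batch size, output units (blank at index 0)
S = 10                # max target length

ctc_loss = nn.CTCLoss(blank=0)

# Stand-in for encoder output: per-frame log-probabilities over the output units.
logits = torch.randn(T, N, C, requires_grad=True)
log_probs = logits.log_softmax(dim=-1)                        # (time, batch, classes)

targets = torch.randint(1, C, (N, S), dtype=torch.long)       # label sequences, no blanks
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.randint(5, S + 1, (N,), dtype=torch.long)

# No frame-level alignment is supplied anywhere -- CTC sums over every valid
# monotonic alignment between the T frames and each target sequence.
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```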

1

nmfisher t1_j62y29r wrote

Slight tangent - has anyone ever tried "fine-tuning" a large speech recognition model (e.g. Whisper) by feeding in a training set and pruning activations? The idea being that only a subset of weights/activations is necessary for a given speaker/dataset, so you can compress the larger model into a smaller one that performs equally well on that subset of data (and then continue retraining it conventionally). Presumably this would require some degree of sparsity to begin with?
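A crude sketch of the mechanical side, using plain magnitude pruning with torch.nn.utils.prune rather than the activation-based pruning described above (assuming PyTorch + Hugging Face transformers; "openai/whisper-small" is just a stand-in checkpoint):

```python
import torch
import torch.nn.utils.prune as prune
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# Zero out 50% of the weights in every linear layer by magnitude, as a crude
# proxy for "only a subset of the weights matters for this speaker/domain".
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)

# ...fine-tune on the speaker-specific dataset here (the masks stay fixed)...

# Then bake the masks into the weights permanently.
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        prune.remove(module, "weight")
```

Note that unstructured zeros like this only buy you actual size/speed wins if you have sparse kernels or follow up with structured pruning, which is probably where the "some degree of sparsity" question comes in.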

29

nmfisher t1_j4typw0 wrote

If I were using one of the newer search engines that let you block domains, Medium would definitely be on my blacklist. The signal-to-noise ratio is just way too low.

towardsdatascience might be slightly better, but even if you find something worthwhile there, it's probably available somewhere else that doesn't clog up your search results.

1

nmfisher t1_j4paqfc wrote

Easiest way IMO is to scan the list of papers at the annual conferences in your field, pick a handful with titles that sound interesting, then try to find a paper that's referenced by two or more of them.

That's usually a good place to start - it's been around long enough that it's probably not a flash in the pan, but still "new" enough to be relevant.

2

nmfisher t1_j4odkrt wrote

  1. Choose your niche (speech recognition/image classification/LLMs/whatever)
  2. Start your own blog with good* technical content (i.e. not the shovel crap you see on Medium), and see if you can write some guest posts for an existing blog with decent traffic. Open-source your code on GH. Share it on social media.
  3. Give presentations at a few local events and make it clear you're also available for freelancing.

It might take a month or two but people will start contacting you.

* this is important: your blog content/presentation actually has to be worth reading. It doesn't have to be cutting-edge, but it has to be novel enough to convince someone that you have something special to offer. Implementing a lesser-known paper and showing your results is usually a good start (it also teaches you just how hard it is to recreate something based on a paper alone).

19