Submitted by jiamengial t3_10p66zc in MachineLearning
psma t1_j6j51oq wrote
Streaming inference support. Deploying ML models to work in real-time with little latency is a pain.
jiamengial OP t1_j6j95fq wrote
Presumably this would be through certain protocols like WebSockets and WebRTC? Or more like direct integration with Zoom?
psma t1_j6jhwdl wrote
Not sure how. If I have, e.g., a PyTorch model, how do I deploy it for streaming data without having to rewrite it in another framework (e.g. stateful convolutions, the ability to receive an arbitrary number of samples as input, etc.)? It's doable, but it mostly amounts to rewriting your model. This should be automated.
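To make "stateful convolution" concrete, here is a minimal sketch in plain Python rather than PyTorch. It assumes a single-channel causal convolution (the cross-correlation form, left-padded with zeros); the names `StreamingCausalConv1d` and `offline_causal_conv1d` are made up for illustration and are not part of any library. The idea is that the layer keeps the last `k - 1` input samples between calls, so chunks of any size give the same output as running over the whole signal at once.

```python
class StreamingCausalConv1d:
    """Hypothetical single-channel causal convolution that keeps state between calls."""

    def __init__(self, kernel):
        self.kernel = list(kernel)
        # Left context: the last (k - 1) input samples, zero-initialised
        # so the first chunk behaves like zero-padded offline processing.
        self.history = [0.0] * (len(self.kernel) - 1)

    def process(self, chunk):
        """Consume a chunk of any length (including empty) and return as many outputs."""
        k = len(self.kernel)
        buf = self.history + list(chunk)
        out = [
            sum(self.kernel[j] * buf[i + j] for j in range(k))
            for i in range(len(chunk))
        ]
        # Carry the trailing (k - 1) samples over to the next call.
        self.history = buf[len(buf) - (k - 1):]
        return out


def offline_causal_conv1d(kernel, signal):
    """Reference: the same convolution applied to the whole signal in one pass."""
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(signal)
    return [
        sum(kernel[j] * padded[i + j] for j in range(k))
        for i in range(len(signal))
    ]


if __name__ == "__main__":
    kernel = [0.5, 0.25, 0.25]
    signal = [float(x) for x in range(10)]

    layer = StreamingCausalConv1d(kernel)
    streamed = []
    # Feed irregular chunk sizes, as a real-time source might deliver them.
    for start, end in [(0, 1), (1, 5), (5, 5), (5, 10)]:
        streamed.extend(layer.process(signal[start:end]))

    print(streamed == offline_causal_conv1d(kernel, signal))
```

This only covers the simplest case. Strided or dilated convolutions, normalization layers, and anything with lookahead each need their own buffering logic, which is why doing this by hand ends up looking like a rewrite of the model.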
babua t1_j6khgfr wrote
I don't think it stops there either: a streaming architecture probably breaks core assumptions of some speech models. E.g. for STT, when do you "try" to infer the word? For TTS, how do you intonate the sentence correctly if you don't know the second half? You'd have to re-train your entire model for the streaming case and create new data augmentations, plus you'll probably sacrifice some performance even in the best case, because your model simply has to deal with more uncertainty.
jiamengial OP t1_j6mbv97 wrote
That's a good point - CTC and attention mechanisms work on the basis that you've got the whole segment of audio.