Awekonti t1_j0y85fy wrote

They deploy the models through well-established pipelines. Usually (in my experience), scientists communicate closely with engineers (ML engineers, data engineers, or DevOps), who put the model into an environment where it can operate. That can be a web service that simply wraps the model, or a container deployment (see Docker). They don't retrain the model every time the app launches — they just save the model's parameters in that environment or on the CI/CD platform. I have a little experience deploying and maintaining production models, but I have no clue about the finer details tho.
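A minimal sketch of the "save parameters once, load at startup" idea, using only the standard library (the parameter names and file path here are made up for illustration — a real pipeline would use something like joblib, ONNX, or a model registry):

```python
import os
import pickle
import tempfile

# Hypothetical "trained" model: parameters learned offline by the scientists.
params = {"weights": [0.5, -1.2], "bias": 0.1}

def predict(model_params, features):
    """Apply the saved parameters; no retraining happens at serve time."""
    score = sum(w * x for w, x in zip(model_params["weights"], features))
    return score + model_params["bias"]

# Training side: serialize the parameters once.
path = os.path.join(tempfile.gettempdir(), "model_params.pkl")
with open(path, "wb") as f:
    pickle.dump(params, f)

# Serving side (e.g., inside a container or behind a web service):
# load the saved parameters when the app starts, then serve predictions.
with open(path, "rb") as f:
    loaded = pickle.load(f)

prediction = predict(loaded, [2.0, 1.0])
```

In a real web-service wrap, the `pickle.load` would happen once at process startup and `predict` would be called per request.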

3

Awekonti t1_j0y7esq wrote

>Is quantization ultimately a kind of scaling

Not really — it's about approximating (or better to say, mapping) real-valued numbers onto a limited set of representable values. That's what shrinks the model: computations and other model operations get executed at lower bit-widths.
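A toy sketch of that mapping — affine quantization of reals onto an int8 grid. The names (`scale`, `zero_point`) follow common convention but this isn't any specific library's API:

```python
def quantize(values, scale, zero_point=0):
    """Map real values onto the 8-bit integer grid [-128, 127]."""
    q = [round(v / scale) + zero_point for v in values]
    return [max(-128, min(127, x)) for x in q]  # clamp to int8 range

def dequantize(q_values, scale, zero_point=0):
    """Approximately recover the original reals from the int grid."""
    return [(x - zero_point) * scale for x in q_values]

weights = [0.30, -0.12, 0.057]
scale = 0.01                 # one integer step represents 0.01 in real units
q = quantize(weights, scale)       # each value now fits in 8 bits
approx = dequantize(q, scale)      # close to the originals, with small error
```

Note how `0.057` comes back as `0.06` — that rounding error is the "limit" the mapping introduces, traded for smaller storage and cheaper low-bit arithmetic.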

2