hp2304 t1_iy8lyyf wrote

Any ML classifier or regressor is basically a function approximator.

The function space isn't continuous but discrete: we only see the underlying function at the dataset points. Hence increasing the size of the dataset can help increase overall accuracy; this is comparable to the Nyquist criterion, in that the more sparsely we sample, the more likely our approximation is wrong. Given the dimensionality of the input space and the range of each input variable, any realistic dataset size is negligible. E.g. for a 224x224 RGB input image, the input space has pow(256, 224*224*3) possible values, an unimaginably large number, and mapping each one to the correct class label (out of 1000 classes total) is very difficult for any approximator. Hence one can never get 100% accuracy.
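
To put that in perspective, here's a minimal back-of-the-envelope sketch in plain Python; the dataset size of 1,281,167 (the ImageNet-1k training set) is assumed purely for illustration:

```python
# Rough scale comparison: size of the 224x224 RGB input space vs. a typical
# dataset. Uses log10 so we never materialize the astronomically large integer.
import math

num_values = 224 * 224 * 3                         # one byte per channel per pixel
log10_space = num_values * math.log10(256)         # log10 of pow(256, 224*224*3)
print(f"Input space ~ 10^{log10_space:,.0f} possible images")
# -> a number with over 360,000 decimal digits

dataset_size = 1_281_167                           # assumed: ImageNet-1k training images
log10_coverage = math.log10(dataset_size) - log10_space
print(f"A {dataset_size:,}-image dataset covers ~10^{log10_coverage:,.0f} of that space")
```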

2

hp2304 t1_iu3ixav wrote

Inference: if real time is a requirement, then it's necessary to buy high-end GPUs to reduce latency; other than that, it's not worth it.
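
As a rough illustration of that latency gap, here's a minimal timing sketch, assuming PyTorch and a stock ResNet-50; the batch size and iteration counts are arbitrary:

```python
# Minimal sketch: average per-image forward-pass latency of ResNet-50 on CPU
# vs. GPU (if one is available). Absolute numbers depend heavily on hardware.
import time
import torch
from torchvision.models import resnet50

model = resnet50(weights=None).eval()
x = torch.randn(1, 3, 224, 224)

def latency_ms(model, x, device, iters=20):
    model, x = model.to(device), x.to(device)
    with torch.no_grad():
        for _ in range(3):                 # warm-up runs
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
    return (time.perf_counter() - start) * 1000 / iters

print(f"CPU: {latency_ms(model, x, 'cpu'):.1f} ms/image")
if torch.cuda.is_available():
    print(f"GPU: {latency_ms(model, x, 'cuda'):.1f} ms/image")
```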

Training: this loosely depends on how often a model is retrained for production. Suppose that period is one year (which seems reasonable to me); the new model will then be trained on the new data gathered over that year plus the old data. Doing this fast won't make a difference, so I would rather use a slow GPU even if it takes days or a few weeks. It's not worth it.

A problem with DL models in general is that they keep growing in number of parameters, requiring more VRAM to fit them on a single GPU. Huge thanks to model parallelism techniques and ZeRO, which handle this issue; otherwise one would have to buy new hardware to train large models. I don't like where AI research is headed. Increasing parameters is not an efficient solution; we need a new direction to effectively and practically solve general intelligence. On top of that, models failing to detect or misdetecting objects in self-driving cars despite huge training datasets is a serious red flag showing we are still far from solving AGI.
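
To make the VRAM point concrete, here's a rough sketch using the memory accounting from the ZeRO paper (about 16 bytes per parameter for mixed-precision Adam: fp16 weights and gradients plus fp32 optimizer states) and how the ZeRO stages shrink the per-GPU share. The parameter counts and GPU count are illustrative, and activation memory is ignored:

```python
# Per-GPU training memory estimate for mixed-precision Adam, following the
# ZeRO accounting: 2 B fp16 weights + 2 B fp16 grads + 12 B fp32 optimizer
# states per parameter. Activations and buffers are ignored for simplicity.
GiB = 1024 ** 3

def train_mem_gib(num_params, num_gpus=1, zero_stage=0):
    weights, grads, opt_states = 2 * num_params, 2 * num_params, 12 * num_params
    if zero_stage >= 1:
        opt_states /= num_gpus     # ZeRO-1 partitions optimizer states
    if zero_stage >= 2:
        grads /= num_gpus          # ZeRO-2 also partitions gradients
    if zero_stage >= 3:
        weights /= num_gpus        # ZeRO-3 also partitions the parameters
    return (weights + grads + opt_states) / GiB

for params in (1.5e9, 13e9, 175e9):    # illustrative model sizes
    single = train_mem_gib(params)
    sharded = train_mem_gib(params, num_gpus=64, zero_stage=3)
    print(f"{params / 1e9:6.1f}B params: {single:7.0f} GiB on one GPU, "
          f"{sharded:5.1f} GiB/GPU with ZeRO-3 across 64 GPUs")
```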

3