Submitted by groman434 t3_103694n in MachineLearning
oswinso t1_j30ip05 wrote
Besides the computational speed advantage OP mentioned, and human factors such as limited attention span and fatigue, I think the answer has a bit more nuance depending on what "generated by humans" actually means.
Take a binary image classification task: classify whether an image is a dog or not a dog. Here the labels are "generated by a human" who looks at the same input the algorithm receives. In this case, and assuming all labels are correct, I would argue that machine learning cannot achieve higher accuracy than the human labeler (setting aside the factors above), because the "correctness" of each classification is defined by that labeler. A human in peak condition and with no time constraints should be able to achieve 100% accuracy every time.
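A toy sketch (hypothetical labels and predictions, not from the original post) of why this is a ceiling: since "accuracy" here is literally agreement with the human-provided labels, a model that reproduces the labeler exactly scores 100%, and nothing can score higher.

```python
import numpy as np

# Hypothetical dog / not-dog labels: the "ground truth" is whatever the
# human labeler said, so accuracy measures agreement with that human.
human_labels = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # human's dog / not-dog calls
model_preds  = np.array([1, 0, 1, 0, 0, 0, 1, 1])   # some model's predictions

accuracy = np.mean(model_preds == human_labels)
print(f"model accuracy vs. human labels: {accuracy:.2f}")  # 0.75

# A "model" that simply replays the human's labels hits the ceiling:
print(np.mean(human_labels == human_labels))                # 1.0, the best possible score
```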
On the other hand, suppose the task is time-series prediction, where the label is the future value of the series. Even though the dataset was collected by a human, the labels are not produced by a human annotator but by some other underlying process. In this case, machine learning has the potential to outperform humans.
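A minimal sketch (invented data and setup, not from the original post) of what that looks like: each target is the next value of the series, so the "labeler" is the observed process itself, and neither the model nor a human forecaster defines the ground truth.

```python
import numpy as np

# Hypothetical observed process: a noisy sinusoid.
rng = np.random.default_rng(0)
t = np.arange(500)
series = np.sin(0.1 * t) + 0.1 * rng.standard_normal(500)

# Build (window, next-value) pairs: the label comes from the future,
# not from a human annotation.
window = 10
X = np.stack([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]  # label = the value one step ahead

# Simple least-squares linear predictor, evaluated in-sample for brevity.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
preds = X @ w
mse = np.mean((preds - y) ** 2)
print(f"one-step-ahead MSE: {mse:.4f}")
```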