Submitted by groman434 t3_103694n in MachineLearning
PassionatePossum t1_j2xnqus wrote
It absolutely can, but it depends on the task.
When it comes to questions of human estimation, there is the Wisdom of the Crowd effect: individual people may over- or underestimate a value, but the average of many predictions is often extremely accurate. You can either train a system directly on the average of many human predictions, or use only a single prediction per sample, in which case the errors tend to average out during training.
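To make the averaging effect concrete, here is a toy simulation (pure NumPy, all numbers made up): each simulated annotator is individually noisy, but the mean of their predictions lands very close to the true value.

```python
import numpy as np

rng = np.random.default_rng(0)

true_value = 100.0   # the quantity being estimated
n_annotators = 1000  # number of independent human-like predictions

# Each "person" over- or underestimates in their own way, but the
# errors are centered roughly on zero.
predictions = true_value + rng.normal(loc=0.0, scale=20.0, size=n_annotators)

print(f"typical individual error:       {np.mean(np.abs(predictions - true_value)):.2f}")
print(f"error of the averaged estimate: {abs(predictions.mean() - true_value):.2f}")
```

The individual errors are on the order of the noise scale, while the averaged estimate shrinks by roughly a factor of sqrt(n).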
And it can happen even for tasks where a definite ground truth is available that doesn't depend on human estimation. For example, in many medical applications the diagnosis is provided by the lab, but physicians often need to decide whether it is worth taking a biopsy and sending it to the lab. For this, they are typically taught heuristics: a few features by which they decide whether to have a sample analyzed.
If you take samples and use the ground truth provided by the lab, it is not inconceivable that a deep-learning-based classifier discovers additional features, not used by physicians, that lead to better predictions. Obviously, it won't get better than the lab results.
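As a toy illustration (synthetic data, not a real medical model): if the lab label depends on ten features but the taught heuristic only looks at two of them, a classifier trained directly against the lab ground truth picks up the extra signal. It still can't beat the lab, because the lab labels define the ground truth.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000

# Ten synthetic features; the lab diagnosis (ground truth) depends on
# all of them, but the taught heuristic only uses the first two.
X = rng.normal(size=(n, 10))
y = (X @ rng.normal(size=10) + rng.normal(scale=0.5, size=n) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "Physician heuristic": a model restricted to the two taught features.
heuristic = LogisticRegression().fit(X_train[:, :2], y_train)
# Learned classifier: free to use every feature in the data.
learned = LogisticRegression().fit(X_train, y_train)

print("heuristic accuracy:", heuristic.score(X_test[:, :2], y_test))
print("learned accuracy:  ", learned.score(X_test, y_test))
```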
As you might have guessed, I work in the medical field. We have done this evaluation for one of our products: it is not able to outperform the top experts in the field, but it easily outperforms the average physician.
blueSGL t1_j2z28ow wrote
> Wisdom of the Crowd
Something I recently saw mentioned by Ajeya Cotra: query the LLM by re-entering its previous output and asking if it's correct, repeat this multiple times, and take an average of the answers. This gives higher accuracy than just taking the first answer (something that sounds weird to me; see the sketch below).
Well, OK, viewed from the vantage point that the models are very good at certain things and people simply haven't worked out how to prompt or fine-tune them correctly yet, it's not that weird. It's more that the base-level outputs are shockingly good, and then someone introduces more secret sauce and makes them even better. The problem is that there is no saying what the limits of the already-existing models are.
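Roughly, the loop looks like this (a sketch only; `query_model` is a hypothetical stand-in for whatever LLM call is used, and the exact prompts are made up):

```python
from collections import Counter

def query_model(prompt: str) -> str:
    """Hypothetical placeholder for a real LLM call (e.g. an API request)."""
    raise NotImplementedError

def self_consistent_answer(question: str, n_samples: int = 10) -> str:
    """Ask the same question several times and keep the most common answer."""
    answers = []
    for _ in range(n_samples):
        draft = query_model(question)
        # Feed the model's own output back and ask it to check itself.
        verdict = query_model(
            f"Question: {question}\nProposed answer: {draft}\n"
            "Is this answer correct? Reply with a corrected final answer."
        )
        answers.append(verdict.strip())
    # Majority vote over the samples plays the role of the "average".
    return Counter(answers).most_common(1)[0][0]
```

For free-text answers there is no literal mean, so a majority vote over the sampled answers is the natural analogue of averaging.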
niclas_wue t1_j340rkv wrote
I think you make an important point. In the end it comes down to the distribution of the data, which always consists of signal and noise. The signal will always be similar, whereas the noise is random, since people can be wrong in all sorts of ways.