farmingvillein t1_j0ifmkt wrote on December 16, 2022 at 9:46 PM

Reply to comment by Own-Plantain8065 in [P] Medical question-answering without hallucinating by tmblweeds

This would probably also be a good way to gather data on where the model may not be working.

If a relatively recent systematic review gives a different result than the set of papers from the same period and/or earlier, then it is probably (this would need to be verified empirically) more likely that the model is processing something incorrectly than that the review is wrong.

(Reviews obviously also aren't perfect--but my guess is that you'd find that they are pretty robust indicators of something being off.)
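
A rough sketch of what that check could look like, in Python. Everything here is an assumption for illustration: the `Finding` record, the `"yes"`/`"no"` conclusion labels, and the majority-vote rule are all hypothetical, not anything from the project itself. It only flags a question for manual inspection when the review disagrees with the clear majority of papers published in or before the review's year.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical record: a conclusion extracted from one source."""
    year: int
    conclusion: str  # e.g. "yes", "no", "unclear" (labels are assumed)


def papers_majority(papers, review_year):
    """Most common conclusion among papers published in or before review_year.

    Returns None if there are no eligible papers or the top two
    conclusions are tied (no clear majority).
    """
    eligible = [p.conclusion for p in papers if p.year <= review_year]
    if not eligible:
        return None
    counts = Counter(eligible).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def flag_for_inspection(review, papers):
    """True when the review disagrees with the clear majority of papers."""
    majority = papers_majority(papers, review.year)
    return majority is not None and majority != review.conclusion


review = Finding(year=2021, conclusion="no")
papers = [
    Finding(2015, "yes"),
    Finding(2018, "yes"),
    Finding(2020, "no"),
    Finding(2022, "no"),  # newer than the review, so it is ignored
]
print(flag_for_inspection(review, papers))
```

A flag here only means "worth a look": it could be a processing error, or the review could simply be reflecting better evidence than the older papers.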