Submitted by tmblweeds t3_zn0juq in MachineLearning
memberjan6 t1_j0g470y wrote
GPT-3 always carries some risk of hallucinating its responses due to its architecture. Your steps to prevent hallucinations in a medical application are moves in the right direction, and may turn out to be helpful guidance for other application developers. Your steps toward traceability of the model's answers are also wise.
By contrast, the Deepset.ai Haystack QA pipeline framework (and perhaps others) is designed specifically to avoid hallucination and to make answer provenance transparent. In the medical context, I think you'd need to run empirical evaluations on both types of systems and present them to medical stakeholders, after first gathering that evidence privately to decide for yourself which architecture is better suited to a medical app.
I can also say that the slower responses of GPT-3-style LLMs are a potential challenge. By contrast, the Haystack design chains two or more model types in a pipeline (typically a fast retriever followed by a reader) to dramatically speed up responses and to show you exactly where in the document base each answer was sourced.
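To make the two-stage idea concrete, here is a minimal sketch of an extractive QA pipeline using the Haystack 1.x Python API. Class names and arguments vary between Haystack versions, and the documents, reader model, and top_k values are placeholders, not a recommendation:

```python
# Minimal sketch: a fast retriever narrows the corpus, then a reader model
# extracts an exact answer span, so every answer points back to a source document.
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import TfidfRetriever, FARMReader
from haystack.pipelines import ExtractiveQAPipeline

# Toy document base standing in for the real medical corpus
document_store = InMemoryDocumentStore()
document_store.write_documents([
    {"content": "Metformin is a first-line medication for type 2 diabetes."},
    {"content": "Ibuprofen is a nonsteroidal anti-inflammatory drug (NSAID)."},
])

# Cheap sparse retriever filters candidate documents quickly
retriever = TfidfRetriever(document_store=document_store)
# Heavier extractive reader only runs on the retrieved candidates
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=False)

pipeline = ExtractiveQAPipeline(reader=reader, retriever=retriever)
result = pipeline.run(
    query="What is metformin used for?",
    params={"Retriever": {"top_k": 3}, "Reader": {"top_k": 1}},
)

# Answers are spans extracted from stored documents, so provenance comes for free
for answer in result["answers"]:
    print(answer.answer, "| score:", answer.score, "| context:", answer.context)
```

Because the reader can only answer with spans found in the retrieved documents, provenance is built in, and the retriever keeps the expensive model from having to scan the whole corpus.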