ObjectManagerManager t1_ivwf1f0 wrote on November 11, 2022 at 2:38 AM

Reply to comment by LurkAroundLurkAround in [Discussion] Can we train with multiple sources of data, some very reliable, others less so? by DreamyPen

Alternatively, treat the data source as an output rather than an input, i.e., have your model output two values, one per dataset. For examples sourced from dataset A, minimize the loss against the first output only. For examples sourced from dataset B, minimize the loss against the second output only.
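
A minimal sketch of that loss routing, assuming a regression setup with squared-error loss (the comment doesn't specify a loss). The name `routed_mse` and the 0/1 source encoding are made up for illustration; in a real framework you would do the same thing with a mask or an index over the output dimension.

```python
from typing import Sequence, Tuple


def routed_mse(
    preds: Sequence[Tuple[float, float]],
    targets: Sequence[float],
    sources: Sequence[int],
) -> float:
    """Mean squared error where each example is scored only on its own head.

    preds[i] is (head_a_output, head_b_output) for example i.
    sources[i] is 0 for dataset A and 1 for dataset B (hypothetical encoding).
    """
    if not (len(preds) == len(targets) == len(sources)):
        raise ValueError("preds, targets and sources must have the same length")
    if not preds:
        raise ValueError("need at least one example")

    total = 0.0
    for pred, target, source in zip(preds, targets, sources):
        # Only the head matching the example's source contributes to the loss,
        # so the other head receives no training signal from this example.
        total += (pred[source] - target) ** 2
    return total / len(preds)


# Two examples: the first comes from dataset A, the second from dataset B.
preds = [(1.0, 5.0), (2.0, 2.5)]
targets = [1.0, 3.0]
sources = [0, 1]
loss = routed_mse(preds, targets, sources)
print(loss)
```

Note that the first example's second output (5.0) is far from its target but is ignored, because that example is only scored against the dataset A head.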

I don't remember who, but someone wrote a thesis on how it often works better in practice to incorporate additional / auxiliary information in the form of outputs rather than inputs. It's also a very clean solution, since you can usually just remove the unnecessary output heads after training, which might decrease your model size for inference (albeit by a small amount, unless you have a lot of auxiliary information).
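
To illustrate the head-removal point, here is a toy sketch, assuming a shared trunk with one scalar weight per output head. `MultiHeadLinear`, `drop_heads`, and the head names are all hypothetical; a real model would have larger heads, but the bookkeeping is the same.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MultiHeadLinear:
    """Toy model: a shared linear trunk followed by one scalar weight per head."""

    trunk: List[float]  # shared weights, one per input feature
    heads: Dict[str, float] = field(default_factory=dict)  # head name -> weight

    def forward(self, x: List[float]) -> Dict[str, float]:
        # The shared representation is computed once and reused by every head.
        hidden = sum(w * xi for w, xi in zip(self.trunk, x))
        return {name: weight * hidden for name, weight in self.heads.items()}

    def drop_heads(self, keep: str) -> None:
        # After training, discard every head except the one needed at inference.
        self.heads = {keep: self.heads[keep]}

    def num_params(self) -> int:
        return len(self.trunk) + len(self.heads)


model = MultiHeadLinear(trunk=[0.5, -1.0], heads={"reliable": 2.0, "noisy": 1.5})
params_before = model.num_params()
out_before = model.forward([4.0, 1.0])

model.drop_heads(keep="reliable")
params_after = model.num_params()
out_after = model.forward([4.0, 1.0])
```

The kept head's prediction is unchanged after pruning, and the parameter count only drops by the size of the removed head, which matches the "small amount" caveat above.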