Submitted by kingdroopa t3_10f7dyr in MachineLearning
new_name_who_dis_ t1_j4v5bet wrote
Architecturally, some form of U-Net is probably best. It’s the architecture of choice for tasks like segmentation, so I imagine it would work well for IR (infrared) image generation too.
kingdroopa OP t1_j4v5o38 wrote
Could you recommend any SOTA models using U-Net?
Anjum48 t1_j4v8mpm wrote
+1 for UNets. Since IR is a single channel, you could use a single-class semantic segmentation-type model (i.e. a UNet with a 1-channel output passed through a sigmoid). Something like this would get you started:
import segmentation_models as sm
model = sm.Unet('resnet34', classes=1, activation='sigmoid')  # ResNet-34 encoder, 1 output channel in [0, 1]
Edit: Forgot the link for the package I'm referencing: https://github.com/qubvel/segmentation_models
Many of the most popular encoders/backbones are implemented in that package.
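To make the "1-channel output passed through a sigmoid" concrete, here is a minimal NumPy sketch of what such an output head does, assuming the decoder produces a feature map of shape (H, W, C). This is not the segmentation_models implementation; the weights are random placeholders. A 1x1 convolution is just a per-pixel dot product over channels, which is why it reduces to `np.tensordot` here.

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    """Logistic function: maps any real value into the range [0, 1]."""
    return 1.0 / (1.0 + np.exp(-x))

def single_channel_head(features: np.ndarray, weights: np.ndarray, bias: float) -> np.ndarray:
    """Hypothetical 1x1-conv output head.

    features: (H, W, C) decoder feature map
    weights:  (C,) one weight per input channel
    Returns an (H, W, 1) map with values in [0, 1].
    """
    # Per-pixel weighted sum over the channel axis -> (H, W) logits
    logits = np.tensordot(features, weights, axes=([2], [0])) + bias
    return sigmoid(logits)[..., np.newaxis]

rng = np.random.default_rng(0)
features = rng.normal(size=(64, 64, 16))   # placeholder decoder output
weights = rng.normal(size=16)              # placeholder 1x1 conv weights
out = single_channel_head(features, weights, bias=0.0)
print(out.shape)  # (64, 64, 1)
```

Because every pixel gets its own value in [0, 1], the same head serves both segmentation (probabilities) and single-channel image regression; only the training target and loss differ.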
Edit 2: Is the FOV (field of view) important? If you could resize the images so that the RGB and IR FOVs are equivalent, that would make things a lot simpler.
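One way to match the FOVs, sketched under loudly stated assumptions: both cameras share roughly the same optical axis, the RGB camera has the wider FOV in both directions, and both behave like ideal pinhole cameras with known FOV angles. Under a pinhole model the fraction of the RGB frame covering the IR view is tan(fov_ir/2) / tan(fov_rgb/2), so you centre-crop that fraction and resize to the IR resolution. The FOV angles and resolutions below are made-up placeholders, and real rigs usually also need calibration (lens distortion, parallax offset) that this ignores.

```python
import numpy as np

def crop_fraction(fov_narrow_deg: float, fov_wide_deg: float) -> float:
    """Fraction of the wide-FOV image spanned by the narrow FOV (pinhole model)."""
    return np.tan(np.radians(fov_narrow_deg) / 2) / np.tan(np.radians(fov_wide_deg) / 2)

def center_crop(img: np.ndarray, frac_h: float, frac_w: float) -> np.ndarray:
    """Keep the central frac_h x frac_w portion of an (H, W, ...) image."""
    h, w = img.shape[:2]
    ch, cw = int(round(h * frac_h)), int(round(w * frac_w))
    top, left = (h - ch) // 2, (w - cw) // 2
    return img[top:top + ch, left:left + cw]

def resize_nearest(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize using index arithmetic only (no extra libraries)."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows[:, None], cols[None, :]]

# Hypothetical rig: RGB 1920x1080 at 90x60 deg FOV, IR 640x512 at 50x40 deg FOV.
rgb = np.zeros((1080, 1920, 3), dtype=np.uint8)
fh = crop_fraction(40, 60)   # vertical FOVs
fw = crop_fraction(50, 90)   # horizontal FOVs
rgb_matched = resize_nearest(center_crop(rgb, fh, fw), 512, 640)
print(rgb_matched.shape)  # (512, 640, 3)
```

Nearest-neighbour keeps the sketch dependency-free; in practice a library resize with proper interpolation would usually be preferable for the RGB input.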
kingdroopa OP t1_j4vafrc wrote
Thanks a lot! Will look into it, but it seems like U-Net outputs are segmentation masks, whereas I want it to actually output (generate) the IR equivalent of the RGB image. Is there some idea I'm missing?
Anjum48 t1_j4vc9kp wrote
The UNet I described will output a continuous value between 0 and 1 for each pixel, which you can use as a proxy for your IR image.
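For the sigmoid output to stand in for an IR image, the training target has to live in the same [0, 1] range. A minimal sketch, assuming 8-bit IR frames (16-bit data would just use a different maximum) and a plain per-pixel L1 loss, which is one common choice for image-to-image regression rather than something specified in this thread:

```python
import numpy as np

IR_MAX = 255.0  # assumption: 8-bit IR frames; use 65535.0 for 16-bit data

def ir_to_target(ir: np.ndarray) -> np.ndarray:
    """Scale raw IR intensities into [0, 1] so they match the sigmoid's range."""
    return ir.astype(np.float32) / IR_MAX

def prediction_to_ir(pred: np.ndarray) -> np.ndarray:
    """Map a [0, 1] prediction back to integer IR intensities."""
    dtype = np.uint8 if IR_MAX <= 255 else np.uint16
    return np.clip(np.rint(pred * IR_MAX), 0, IR_MAX).astype(dtype)

def l1_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Per-pixel mean absolute error, a common image-regression loss."""
    return float(np.mean(np.abs(pred - target)))

ir = np.array([[0, 128], [200, 255]], dtype=np.uint8)          # toy IR frame
target = ir_to_target(ir)                                       # what the UNet is trained to match
pred = np.array([[0.05, 0.5], [0.8, 0.95]], dtype=np.float32)  # pretend sigmoid output
print(l1_loss(pred, target))      # about 0.0294
print(prediction_to_ir(pred))     # [[ 13 128]
                                  #  [204 242]]
```

With a Keras model such as the `sm.Unet` one, the equivalent would be to pass the normalized IR frames as labels and compile with a regression loss (e.g. Keras's built-in `'mae'`) instead of a segmentation loss like Dice or binary cross-entropy.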
People often apply a threshold to the UNet's output (e.g. 0.5) to create a mask, which might be where the confusion is coming from.
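The difference between the two uses comes down to one line: thresholding collapses the continuous map into a binary mask, which suits segmentation but throws away exactly the grey-level information you want for IR synthesis. A small illustration with a made-up prediction:

```python
import numpy as np

pred = np.array([[0.10, 0.45], [0.55, 0.90]])  # hypothetical sigmoid output

continuous_ir = pred   # keep as-is: a grey-level proxy for the IR image
mask = pred > 0.5      # segmentation-style: binary foreground mask

print(continuous_ir)
print(mask.astype(np.uint8))  # [[0 0]
                              #  [1 1]]
```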
kingdroopa OP t1_j4vh0sq wrote
Ahh, I see. Thanks! I'll write it down in my TODO list. Might have to investigate seg masks a bit more :)