
Tall_Help_1925 t1_j5xl2i4 wrote

It depends on how similar the images are and on the amount of defect data, i.e. images that actually contain cracks. You can rephrase it as an image segmentation problem and use a U-Net (without attention) as the model. Because of the limited receptive field of the individual "neurons", the dataset will effectively be much larger, since every output pixel only sees a small patch and so acts as its own training example (see the first sketch below). If the input data is already aligned, you could also try working with the difference in feature space, i.e. compute the difference between the activations of a pretrained network for non-defective images and for the current image; I'd suggest using cosine distance (see the second sketch below).
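First sketch: a minimal two-level U-Net in PyTorch for per-pixel crack segmentation. This is just an illustration of the idea, not a prescribed architecture; the names (`TinyUNet`, `n_base`) and sizes are my own assumptions, and you'd normally use more levels or an off-the-shelf implementation.

```python
# Minimal U-Net-style sketch (assumes PyTorch and binary crack/no-crack masks).
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    # Two 3x3 convolutions with ReLU, the basic U-Net building block.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    """Two-level U-Net without attention; the receptive field stays small,
    so each output pixel effectively acts as its own training sample."""
    def __init__(self, in_ch=3, n_base=16):
        super().__init__()
        self.enc1 = conv_block(in_ch, n_base)
        self.enc2 = conv_block(n_base, n_base * 2)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(n_base * 2, n_base, 2, stride=2)
        self.dec1 = conv_block(n_base * 2, n_base)
        self.head = nn.Conv2d(n_base, 1, 1)  # per-pixel crack logit

    def forward(self, x):
        e1 = self.enc1(x)                            # full-resolution features
        e2 = self.enc2(self.pool(e1))                # half-resolution features
        d1 = self.up(e2)                             # upsample back
        d1 = self.dec1(torch.cat([d1, e1], dim=1))   # skip connection
        return self.head(d1)                         # (B, 1, H, W) logits

# Train with a per-pixel loss against the crack masks, e.g.:
# loss = nn.BCEWithLogitsLoss()(TinyUNet()(images), masks)
```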
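Second sketch: the feature-space difference idea, assuming aligned images and using torchvision's pretrained ResNet-18 as the feature extractor. The layer choice (`layer2`) and the averaging over reference images are my assumptions for illustration.

```python
# Cosine-distance anomaly map from pretrained features (assumes torchvision >= 0.13).
import torch
import torch.nn.functional as F
from torchvision.models import resnet18, ResNet18_Weights
from torchvision.models.feature_extraction import create_feature_extractor

weights = ResNet18_Weights.DEFAULT
backbone = resnet18(weights=weights).eval()
extractor = create_feature_extractor(backbone, return_nodes={"layer2": "feat"})
preprocess = weights.transforms()

@torch.no_grad()
def feature_map(img):
    # img: a PIL image (or anything the weights' preprocessing transform accepts)
    x = preprocess(img).unsqueeze(0)   # (1, 3, H, W)
    return extractor(x)["feat"]        # (1, C, h, w) activations

@torch.no_grad()
def defect_map(reference_imgs, current_img):
    # Average the activations of known-good (non-defective) reference images ...
    ref = torch.cat([feature_map(im) for im in reference_imgs]).mean(dim=0, keepdim=True)
    cur = feature_map(current_img)
    # ... and compare per spatial location: 1 - cosine similarity over channels.
    dist = 1.0 - F.cosine_similarity(cur, ref, dim=1)   # (1, h, w)
    return dist.squeeze(0)  # high values = likely defect; threshold as needed
```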

1