Submitted by hundley10 t3_10dqwqb in MachineLearning

What model structure would be recommended for detecting the coordinates of all 4 corners of a rectangle (e.g. index cards)? Most object detection models like YOLO produce rectangular bounding boxes; what tweaks can be made to trace the object regardless of orientation?

For my specific problem, classical edge/corner detectors aren't a good fit - so I'm falling back on ML. Currently have a dataset of about 1500 domain-specific labeled images; hoping to train a model on TF. Thanks for the suggestions!

Edit: here are a few examples from my dataset. The green dots aren't part of the images; they just show how the corners are annotated:

https://preview.redd.it/2f8uimhn7hca1.jpg?width=1373&format=pjpg&auto=webp&v=enabled&s=5086c73084aa15014825f45e20ab1532c743078b

https://preview.redd.it/ujb8tmhn7hca1.jpg?width=3024&format=pjpg&auto=webp&v=enabled&s=9a3b5bfba3decd797546b5c77d0889568ea7b9e7

https://preview.redd.it/9lzgfmhn7hca1.jpg?width=3024&format=pjpg&auto=webp&v=enabled&s=b6186db7e679c3666056cb198269937134b86d37

12

Comments

JiraSuxx2 t1_j4muqql wrote

I’m not 100% sure how YOLO works, but I think the image is divided into a grid and detection is done per grid cell. The bounding boxes are then computed from those per-cell predictions in post-processing. I think that’s also how it gets multiple predictions per image.

In your case, even if you detect corners, how do you know they belong to the same card?

0

PredictorX1 t1_j4mv82b wrote

Assuming that you are trying to locate all corners of a rectangle in a raster image, I suggest researching corner detection in image processing.

11

hundley10 OP t1_j4mvwoj wrote

For this particular problem, the image could contain many corners - and even other full rectangles. The goal is to detect the specific type of paper card I'm interested in - easily identifiable based on color/pattern - but not easily extracted from a Sobel filter.

1

bubudumbdumb t1_j4n08pi wrote

So basically you are printing the cards? Or do you have a JPG of the cards, or can you scan them?

If so, you can apply SIFT (or ORB, which is faster) to the pictures of the cards to detect and describe the salient points. Then build a nearest-neighbor index over the keypoint descriptors.

(Optionally) You can then scale the coordinates of the keypoints to match the intended dimensions in centimeters (or inches, if that's your favorite).

Then you do the same with the images from your camera. Run the keypoints you detect in the camera image through the NN index to match each one to the most similar keypoint from the cards. You are going to get a lot of false positives, but don't worry: you can use a RANSAC approach to filter out the matches that don't result in a consistent geometry.

Since the card is flat, the RANSAC procedure will return a homography (the planar counterpart of the fundamental matrix) that you can use to project the rectangle of the card into the image space captured by the camera.

All the algorithms I mentioned are available in OpenCV (the NN index too, but I dislike that one since there are more modern alternatives). There are also tutorials on how to use and visualize this stuff.

If this is geometrical gibberish to you, check out the ORB paper. Figures 1, 9 and 12 should confirm whether this is the kind of matching you are looking for.

https://scholar.google.com/scholar?q=ORB:+An+efficient+alternative+to+SIFT+or+SURF&hl=en&as_sdt=0&as_vis=1&oi=scholart#d=gs_qabs&t=1673904812693&u=%23p%3DWG1iNbDq0boJ

6

robobub t1_j4n3gcm wrote

A couple of options off the top of my head:

  • Add orientation prediction to the bounding box
  • Add keypoints for the 4 actual corners as a prediction
  • Postprocess boxes with classical techniques, looking for the outermost corners that fit certain properties
  • Do everything classically, and deal with the difficulties you have mentioned in your comment.

The first two require extra annotations for each box (the angle or the corner coordinates), which the model then predicts directly. Note that you don't have to have these for every label, though: you can just skip training those parts of the model when certain attributes are unlabeled.

Both will require some care in modeling. For example, orientation has a discontinuity where the angle wraps around at 360 degrees, and the loss needs to handle that. Regressing keypoints can be done well or badly, so use the way box corners are modeled as a reference. And then of course you'll need to postprocess the model's outputs to align/visualize them on an image.
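
To make the two modeling concerns above concrete, here is a small sketch. It assumes the angle is regressed as a (sin, cos) pair so that 359 and 1 degrees are neighbors, and that the four corner labels are put into one consistent order before being used as an 8-value regression target. The function names and the clockwise-from-top-left convention are illustrative assumptions, not something a particular detector prescribes.

```python
import math


def encode_angle(theta_deg):
    """Map an angle to a point on the unit circle, so 359 and 1 degrees end up close."""
    t = math.radians(theta_deg)
    return (math.sin(t), math.cos(t))


def decode_angle(s, c):
    """Recover the angle in degrees (wrapped to the 0-360 range) from a (sin, cos) pair."""
    return math.degrees(math.atan2(s, c)) % 360.0


def order_corners(corners):
    """Return 4 (x, y) corners clockwise on screen, starting at the top-left-most one.

    Assumes a convex quadrilateral, so the centroid lies inside it.
    """
    cx = sum(x for x, _ in corners) / 4.0
    cy = sum(y for _, y in corners) / 4.0
    # Image y grows downward, so ascending atan2 angle runs clockwise on screen.
    ring = sorted(corners, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))
    # Start from the corner with the smallest x + y (closest to the image origin).
    start = min(range(4), key=lambda i: ring[i][0] + ring[i][1])
    return ring[start:] + ring[:start]


def keypoint_target(corners, img_w, img_h):
    """Flatten the ordered corners into 8 values scaled by the image size."""
    target = []
    for x, y in order_corners(corners):
        target += [x / img_w, y / img_h]
    return target
```

One caveat on the ordering: it is purely geometric, so it ignores which edge is the "top" of the printed card. If the pattern defines an orientation that matters, keeping the annotation order instead may be the better choice.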

1

hundley10 OP t1_j4n3tz2 wrote

Thanks for the suggestions. I edited my post to give some examples of the detection that needs to be performed... notice how sometimes corners can be obscured, and the background can make "simple" rectangle detection a poor fit. I will check out ORB though.

1

bubudumbdumb t1_j4n54nk wrote

The key here is that by detecting keypoints you don't need to detect the corners per se: you detect at least a dozen points from the pattern on the card and then, assuming the card is a rectangle on a plane, you can work out where the corners are.

In other words, this can be very robust to occlusions: you might see no more than half of the card and still be able to identify where the corners are.
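
A toy illustration of why the corners themselves never need to be visible: assuming four clean matches between points inside the card design and points in the photo, the plane-to-image mapping (a homography) can be solved directly and then applied to the card's corners. The helper names are made up for this sketch, and it uses plain Python with no outlier handling. With real matches you would use many points and RANSAC instead.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def homography_from_4(src, dst):
    """Homography (bottom-right entry fixed to 1) mapping 4 src points onto 4 dst points.

    Assumes no three of the points are collinear; otherwise the system is singular.
    """
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h1 x + h2 y + h3) / (h7 x + h8 y + 1), and likewise for v.
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def project(H, pt):
    """Apply homography H to a 2D point."""
    x, y = pt
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)
```

For a card of width `w` and height `h` in design coordinates, `[project(H, c) for c in [(0, 0), (w, 0), (w, h), (0, h)]]` gives the corner positions in the photo, even if those corners are covered or out of frame.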

5

hundley10 OP t1_j4n5p0y wrote

Edited post with some example pics. I've been leaning toward #3 if I can't find a better solution, but can you provide more info about #2? My labels are the (x,y) coordinates of each corner of the cards.

1

dandandanftw t1_j4nf5i3 wrote

Run a corner detector; then, depending on how many corners you get, you can brute-force every possible rectangle. You can also use Hough line detection to limit the number of candidate corners. You can also use a simple model like an SVM to compare the corners and patterns of the given images. You should also check out GLCM (gray-level co-occurrence matrix) for preprocessing the pattern.
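
The brute-force step could look something like the sketch below. It assumes the candidate corners are already available as (x, y) tuples and that the card is viewed close to head-on, so it really does appear as a rectangle. It uses the fact that four points form a rectangle when they can be paired into two diagonals of equal length that share a midpoint. The names, the tolerance and the minimum diagonal length are illustrative assumptions.

```python
import itertools
import math


def rectangle_score(quad, min_diag=10.0):
    """Return 0 for a perfect rectangle, larger values for worse fits.

    Tries the three ways of pairing four points into two diagonals and keeps
    the best one. Returns None if every pairing has a diagonal under min_diag.
    """
    a, b, c, d = quad
    best = None
    for (p, q), (r, s) in (((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))):
        d1, d2 = math.dist(p, q), math.dist(r, s)
        if min(d1, d2) < min_diag:
            continue
        mid1 = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
        mid2 = ((r[0] + s[0]) / 2, (r[1] + s[1]) / 2)
        # Midpoint gap plus length mismatch, relative to the diagonal length.
        score = (math.dist(mid1, mid2) + abs(d1 - d2)) / max(d1, d2)
        best = score if best is None else min(best, score)
    return best


def find_rectangles(points, tol=0.05):
    """Check every 4-point combination; return the near-rectangles, best first."""
    hits = []
    for quad in itertools.combinations(points, 4):
        score = rectangle_score(quad)
        if score is not None and score < tol:
            hits.append((score, quad))
    hits.sort(key=lambda h: h[0])
    return [quad for _, quad in hits]
```

The number of combinations grows with the fourth power of the number of candidates, which is why pruning the corner list first (for example with Hough lines) matters.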

4

TedRabbit t1_j4okgzh wrote

I mean, seems like a basic convolutional neural network would work well for this.

0

Lethandralis t1_j4p15c3 wrote

This might not work if the cards have 6 degrees of freedom. You can check out CornerNet and its variants for anchor-free corner estimation. The original paper detects two corners, but extending it to four should be possible.

Another option is to use YOLO to detect a rough bbox, and then use classical CV to refine the corner locations.

2