kakhaev t1_isjpe62 wrote on October 16, 2022 at 2:39 PM

Reply to comment by hellrail in [P] I built densify, a data augmentation and visualization tool for point clouds by jsonathan

Your first point seems reasonable but isn't obvious to me. I would be convinced if a model trained with augmented point clouds performed better than one trained without them.

And it's not like we use all the points in our model anyway. For example, for object detection from lidar you need a way to handle a variable number of points, because in each iteration you get a different number of points from the sensor. Of course you can do preprocessing, but I hope you get the point.
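For reference, one common preprocessing step for the varying point count is to resample every sweep to a fixed size so sweeps can be batched. A minimal sketch assuming (N, 3) NumPy arrays; `to_fixed_size` is a hypothetical helper, not part of densify:

```python
import numpy as np

def to_fixed_size(points: np.ndarray, n: int, rng: np.random.Generator) -> np.ndarray:
    """Return exactly n points from an (N, 3) cloud (hypothetical helper).

    Subsamples without replacement when N >= n; otherwise keeps every point
    and pads by re-drawing existing points with replacement.
    """
    num = points.shape[0]
    if num >= n:
        idx = rng.choice(num, size=n, replace=False)
    else:
        extra = rng.choice(num, size=n - num, replace=True)
        idx = np.concatenate([np.arange(num), extra])
    return points[idx]

rng = np.random.default_rng(0)
sweep_a = rng.normal(size=(1200, 3))  # stand-in for one lidar sweep
sweep_b = rng.normal(size=(800, 3))   # a later sweep with fewer returns
batch = np.stack([to_fixed_size(s, 1024, rng) for s in (sweep_a, sweep_b)])
print(batch.shape)  # (2, 1024, 3)
```

Padding by duplication adds no information; it only makes the shapes line up for batching.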

Usually, augmentation lets you sample more of your input/output space, which leads to a better mapping function for your model to learn.
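As an illustration of "sampling more of the input/output space", here is a minimal sketch of two common label-preserving point cloud transforms: a random rotation about the vertical axis plus small Gaussian jitter. It assumes a classification-style label; the function name and `jitter_std` value are illustrative choices, not taken from densify:

```python
import numpy as np

def augment(points: np.ndarray, rng: np.random.Generator, jitter_std: float = 0.01) -> np.ndarray:
    """Random yaw rotation about z, then per-coordinate Gaussian jitter."""
    theta = rng.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    rot_z = np.array([[c, -s, 0.0],
                      [s,  c, 0.0],
                      [0.0, 0.0, 1.0]])
    rotated = points @ rot_z.T
    return rotated + rng.normal(scale=jitter_std, size=points.shape)

rng = np.random.default_rng(42)
cloud = rng.uniform(-1.0, 1.0, size=(500, 3))
label = "car"  # the transforms do not change the label
views = [augment(cloud, rng) for _ in range(4)]  # 4 extra (input, label) pairs
```

Each view is a new input paired with the same output, which is the sense in which the training sample grows.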

I also have a problem with the fact that the interpolation OP uses is linear, but no one is stopping you from modifying the code yourself if necessary.
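To make the "linear" point concrete: interpolating inside a triangle produces weighted averages of its three vertices (non-negative weights summing to 1), so every new point lies exactly on the flat plane through those vertices, even where the real surface is curved. A minimal NumPy sketch; `interpolate_in_triangle` is a hypothetical name and this is not densify's code:

```python
import numpy as np

def interpolate_in_triangle(a, b, c, rng, k):
    """Sample k points inside triangle (a, b, c) by linear (barycentric) interpolation."""
    # Draw u, v in [0, 1) and fold pairs with u + v > 1 back into the triangle.
    u = rng.random(k)
    v = rng.random(k)
    outside = u + v > 1.0
    u[outside], v[outside] = 1.0 - u[outside], 1.0 - v[outside]
    w = 1.0 - u - v
    # Each new point is w*a + u*b + v*c with w + u + v = 1 and all weights >= 0.
    return w[:, None] * a + u[:, None] * b + v[:, None] * c

rng = np.random.default_rng(1)
a = np.array([0.0, 0.0, 0.0])
b = np.array([1.0, 0.0, 0.5])
c = np.array([0.0, 1.0, 0.5])
new_pts = interpolate_in_triangle(a, b, c, rng, 100)

# Every interpolated point lies on the triangle's plane: (p - a) . normal == 0.
normal = np.cross(b - a, c - a)
print(np.allclose((new_pts - a) @ normal, 0.0))  # True
```

A non-linear alternative would replace the weighted average with some curved local surface model fitted to neighboring points.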

VaporSprite t1_isk3ryn wrote on October 16, 2022 at 4:16 PM

Correct me if I'm wrong, I'm far from an expert, but couldn't training a model on more data that doesn't inherently add information potentially lead to overfitting?

hellrail t1_iskgwjz wrote on October 16, 2022 at 5:41 PM

No, why would it?

This densification can make it easier to reach a training state that generalizes, but that state probably performs worse than a well-generalized state reached without the augmentation, because it slightly changes the distribution to be learned: it artificially imposes that a portion of the points are the centers of mass of a triangulation of another portion of the points. That is generally not the case for the sensor data that will come in, so the modified distribution has low relevance to the real distribution one wants to learn.
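The constraint described above can be sketched as follows, assuming the augmentation triangulates the points (here a Delaunay triangulation of their x-y projection) and inserts each triangle's center of mass. Every added point is then an exact function of three measured points rather than a new measurement. This uses SciPy's `scipy.spatial.Delaunay`; `densify_with_centroids` is a hypothetical name and this is not densify's actual implementation:

```python
import numpy as np
from scipy.spatial import Delaunay

def densify_with_centroids(points: np.ndarray):
    """Append the centroid of every triangle of a 2-D Delaunay triangulation.

    Assumption: triangulating on the (x, y) projection, as one might for a
    ground-facing scan. Returns (densified cloud, added centroids).
    """
    tri = Delaunay(points[:, :2])
    centroids = points[tri.simplices].mean(axis=1)  # (n_triangles, 3)
    return np.vstack([points, centroids]), centroids

rng = np.random.default_rng(7)
scan = rng.uniform(0.0, 10.0, size=(200, 3))
dense, added = densify_with_centroids(scan)
```

Nothing in `added` was observed by a sensor; it is fully determined by `scan`, which is exactly the structure a model trained on `dense` would be encouraged to expect.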

hellrail t1_iskhso6 wrote on October 16, 2022 at 5:46 PM

@ Usually augmentation allow you to increase sample of your input/output space that will lead to better map function that your model will learn.

More data, better results in general, yes, but if the additional data is worthless, it's a bit of a scam. That will show up in a comparison against an equally well-trained state without that augmentation (which might be harder to reach), tested on relevant data.
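The comparison described above amounts to an A/B protocol: same model and training procedure, same seeds, same held-out real sensor data, with only the augmentation differing. A minimal sketch; `ab_compare` and its callables are hypothetical placeholders, and `train` is assumed to bring each arm to an equally well-trained state:

```python
from typing import Any, Callable, Dict, Sequence

def ab_compare(
    train: Callable[[Sequence[Any], int], Any],
    evaluate: Callable[[Any, Sequence[Any]], float],
    train_set: Sequence[Any],
    real_test_set: Sequence[Any],
    augment: Callable[[Sequence[Any]], Sequence[Any]],
    seeds: Sequence[int] = (0, 1, 2),
) -> Dict[str, float]:
    """Mean test score with and without augmentation, scored on real data only."""
    scores: Dict[str, list] = {"baseline": [], "augmented": []}
    for seed in seeds:
        # Same seed for both arms so the augmentation is the only difference.
        scores["baseline"].append(evaluate(train(train_set, seed), real_test_set))
        scores["augmented"].append(evaluate(train(augment(train_set), seed), real_test_set))
    return {arm: sum(vals) / len(vals) for arm, vals in scores.items()}
```

The key design choice is that `real_test_set` is never augmented: the question is how well each model captures the distribution the sensor actually produces.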

Put technically: the learned distribution is shifted to that of a surrogate point cloud, which is quite similar to the relevant distribution of sensor data produced by measuring the real world, but is no longer the same. That's the price of getting more training data this way, and I wouldn't pay it, because my primary goal is to capture the relevant distribution as closely as possible.
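One rough way to see such a shift (a diagnostic chosen here for illustration, not something from the thread) is to compare a simple statistic, such as nearest-neighbor spacing, between measured clouds and their densified versions. To keep the sketch self-contained it uses a stand-in linear densification (the midpoint between each point and its nearest neighbor), which is an assumption and not densify's method:

```python
import numpy as np

def _pairwise(points: np.ndarray) -> np.ndarray:
    """Pairwise Euclidean distances with the diagonal masked out (brute force, O(N^2) memory)."""
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    return dist

def nn_distances(points: np.ndarray) -> np.ndarray:
    """Distance from each point to its nearest other point."""
    return _pairwise(points).min(axis=1)

def midpoint_densify(points: np.ndarray) -> np.ndarray:
    """Stand-in linear densification: add the midpoint between each point and its nearest neighbor."""
    nearest = _pairwise(points).argmin(axis=1)
    midpoints = 0.5 * (points + points[nearest])
    # Mutual nearest-neighbor pairs produce the same midpoint twice; keep one copy.
    return np.vstack([points, np.unique(midpoints, axis=0)])

rng = np.random.default_rng(3)
measured = rng.uniform(0.0, 10.0, size=(300, 3))
augmented = midpoint_densify(measured)
# The second number is smaller: densification shrinks the point spacing.
print(np.median(nn_distances(measured)), np.median(nn_distances(augmented)))
```

A model trained on the augmented clouds sees spacing statistics that a real scan of the same scene would not produce.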