Submitted by alik31239 t3_117blae in MachineLearning
In the abstract of the NeRF paper (https://arxiv.org/abs/2003.08934), the described workflow is that NeRF enables the following: the user inputs a set of images with known camera poses, and after training the network they can render images of the same scene from new viewpoints.
However, the paper itself builds a network that takes as input 5D vectors (3 spatial coordinates + 2 viewing-direction angles) and outputs a color and a volume density for each such coordinate. I don't understand where I get those 5D coordinates from. My training data certainly doesn't contain them - I only have a collection of images. The same goes for the inference data. It seems that the paper assumes not only a collection of images but also a 3D representation of the scene, while the abstract doesn't require the latter. What am I missing here?
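For concreteness, here is roughly how I imagine such 5D inputs would have to be produced from a camera pose alone - casting a ray through each pixel and sampling points along it. This is just a NumPy sketch with my own function names, pinhole-intrinsics handling, and near/far bounds, not the paper's code. Is this ray-casting step what actually generates the 5D coordinates?

```python
import numpy as np

def get_rays(H, W, focal, c2w):
    """One ray (origin + direction) per pixel, from a known camera pose.

    H, W  : image height/width in pixels
    focal : focal length in pixels (camera intrinsics)
    c2w   : 3x4 or 4x4 camera-to-world matrix (the "known camera pose")
    """
    i, j = np.meshgrid(np.arange(W, dtype=np.float32),
                       np.arange(H, dtype=np.float32), indexing="xy")
    # Pixel coordinates -> ray directions in the camera frame (pinhole model).
    dirs = np.stack([(i - W * 0.5) / focal,
                     -(j - H * 0.5) / focal,
                     -np.ones_like(i)], axis=-1)
    # Rotate directions into world space; every ray starts at the camera center.
    rays_d = dirs @ c2w[:3, :3].T
    rays_o = np.broadcast_to(c2w[:3, 3], rays_d.shape)
    return rays_o, rays_d

def sample_5d_inputs(rays_o, rays_d, near=2.0, far=6.0, n_samples=64):
    """Sample points along each ray; each sampled 3D point paired with the
    ray's viewing direction is one 5D input (x, y, z, theta, phi)."""
    t = np.linspace(near, far, n_samples, dtype=np.float32)          # sample depths
    pts = rays_o[..., None, :] + rays_d[..., None, :] * t[:, None]   # (H, W, n_samples, 3)
    viewdirs = rays_d / np.linalg.norm(rays_d, axis=-1, keepdims=True)
    return pts, viewdirs  # 3D positions + unit viewing directions
```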
CatalyzeX_code_bot t1_j9avn4h wrote
Found relevant code at https://github.com/yenchenlin/nerf-pytorch