Comments
naccib t1_iy1009o wrote
Monocular depth estimation is very valuable for creating AR experiences in general-use devices such as smartphones. This is, in my opinion, the greatest value for such depth estimation algorithms.
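For context, "monocular depth estimation" here means predicting a dense depth map from a single RGB frame. A minimal sketch using the publicly available MiDaS model via torch.hub — purely an illustration of the technique, not the model Qualcomm is running ("frame.png" is a placeholder input):

```python
import cv2
import torch

# Load the small MiDaS model and its matching preprocessing transform.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
transforms = torch.hub.load("intel-isl/MiDaS", "transforms")

img = cv2.cvtColor(cv2.imread("frame.png"), cv2.COLOR_BGR2RGB)
batch = transforms.small_transform(img)  # resize + normalize -> (1, 3, H', W')

with torch.no_grad():
    pred = midas(batch)                  # relative inverse depth, (1, H', W')
    # Upsample back to the input resolution.
    depth = torch.nn.functional.interpolate(
        pred.unsqueeze(1), size=img.shape[:2],
        mode="bicubic", align_corners=False,
    ).squeeze()
```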
carbocation t1_iy12bhd wrote
I agree with you about the value and use-cases for monocular depth estimation. I was just making the point that, in principle, a binocular device could attempt binocular depth estimation. Or perhaps they tried it internally and it was not sufficiently better to be worth the expense.
naccib t1_iy3jayh wrote
Oh, binocular depth estimation is definitely a less technically challenging approach. I think the reasons they are pursuing monocular come down to what the other commenter said about cost and hardware constraints.
pm_me_your_pay_slips t1_ixz544s wrote
One camera is cheaper than two, though. Cheaper in every sense (compute, memory, network bandwidth, energy consumption, parts cost, etc).
mg31415 t1_iy2pg4i wrote
How is one camera cheaper computationally? If it were stereo, they wouldn't need a NN.
pm_me_your_pay_slips t1_iy2twej wrote
You need to do feature computation and find correspondences. If you’re using a learned feature extractor, that will be twice as expensive as the monocular model. But let’s say you’re using a classical feature extractor. You still need to do feature matching. For dense depth maps, both of these stages can be as expensive as, if not more expensive than, a single forward pass through a highly optimized mobile NN architecture.
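To make that concrete, here is roughly what the classical stereo path looks like with OpenCV's semi-global matcher; the focal length and baseline below are placeholder values, not real camera parameters:

```python
import cv2
import numpy as np

# Rectified left/right images (placeholders).
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Semi-global block matching: this dense correspondence search is the
# per-frame cost that can rival a forward pass of an optimized mobile NN.
matcher = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,  # search range; a wider range costs more compute
    blockSize=5,
)
# compute() returns fixed-point disparity scaled by 16.
disparity = matcher.compute(left, right).astype(np.float32) / 16.0

# Depth from disparity: depth = f * B / d (pinhole model).
focal_px = 700.0    # assumed focal length in pixels
baseline_m = 0.06   # assumed stereo baseline in meters
# Invalid matches clamp to a tiny disparity (i.e., huge depth) in this sketch.
depth = focal_px * baseline_m / np.maximum(disparity, 1e-6)
```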
soulslicer0 t1_iy2fd1x wrote
Could be doing depth estimation by fusing two monocular nets, like MVSNet.
Zeraphil t1_ixz5ntr wrote
Slight peeve: that’s not being processed onboard the glasses but on the separate compute box, a Moto phone. Still nice, but you can throw heavier-hitting compute at it in that setup while keeping the glasses lightweight.
pm_me_your_pay_slips t1_ixzf29m wrote
You can put that compute on the glasses. The real problem is heat dissipation; that's what killed Google Glass.
lennarn t1_iy0rom4 wrote
Just put the compute inside a cute little hat
Zeraphil t1_ixzf8u5 wrote
Sure, HoloLens has plenty and it’s all on the glasses as well. But at the cost of weight and comfort.
lennarn t1_iy0s0sg wrote
Weight and comfort are essential for a product like this; if it were indistinguishable from a pair of sunglasses, everyone would get one.
SpatialComputing OP t1_ixzqmh5 wrote
Yes. On the other hand: in the glasses is a Snapdragon XR1 Gen 1, and if that's a Motorola Edge+, there's an SD 865 in there... neither among the most efficient SoCs today. Hopefully QC can run this on the Snapdragon AR2 in the future.
donobinladin t1_ixzwa2l wrote
Wonder how much bandwidth is needed and whether it could be compressed enough to run over Bluetooth.
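A quick back-of-envelope on that question. The resolution, frame rate, and codec ratio below are illustrative assumptions, not the actual pipeline parameters:

```python
# Rough bandwidth check for streaming camera frames to a compute box.
width, height, fps = 640, 480, 30       # assumed RGB camera stream
raw_bps = width * height * 3 * 8 * fps  # 24-bit color, uncompressed
print(f"raw: {raw_bps / 1e6:.0f} Mbps")  # ~221 Mbps

h264_ratio = 50                          # rough video compression ratio
print(f"compressed: {raw_bps / h264_ratio / 1e6:.1f} Mbps")  # ~4.4 Mbps

# Bluetooth Classic (EDR) tops out around 2-3 Mbps in practice, BLE far less,
# so even a modestly compressed video stream is a tight fit over Bluetooth.
```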
Zeraphil t1_ixzwpm4 wrote
Can’t say too much, but it’s in the works.
Source: I was on the team that designed the original compute box design, at Lenovo.
donobinladin t1_ixzyiri wrote
This is really cool tech, great work!
Would be an interesting use case for LiFi, since conference rooms and desk spaces are always well lit.
It would require some infrastructure, but if it were only in certain areas, the overhead probably wouldn't be much to realize 1.5 Gbps+ throughput.
You can just Venmo me cash if you use the idea 😉
Deep-Station-1746 t1_iy053b2 wrote
ELI5 why strap it on your face?
extracoffeeplease t1_iy0pkp7 wrote
Yeah, they totally didn't show the application?? People have been doing 3D mesh reconstruction with deep learning for a while now.
JanFlato t1_iy2b3db wrote
Practically, help blind and visually impaired people. But commercially? Probably just post ads everywhere when VR glasses get more established
ixpu t1_ixyz005 wrote
Links to publication, press release etc?
Chuyito t1_ixz8ify wrote
It looks like the update to https://www.qualcomm.com/news/onq/2022/07/enabling-machines-to-efficiently-perceive-the-world-in-3d. In July they were doing similar depth estimation.
> Depth estimation and 3D reconstruction is the perception task of creating 3D models of scenes and objects from 2D images. Our research leverages input configurations including a single image, stereo images, and 3D point clouds. We’ve developed SOTA supervised and self-supervised learning methods for monocular and stereo images with transformer models that are not only highly efficient but also very accurate. Beyond the model architecture, our full-stack optimization includes using neural architecture search...
That press article and the DONNA page keep it mostly high-level / architectural, though.
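For anyone curious what "self-supervised learning methods for monocular" depth usually means in practice: predict depth, warp an adjacent frame into the current view, and penalize the photometric difference. A minimal sketch of that loss in the style of Monodepth2 (not necessarily Qualcomm's method; pose, intrinsics, and frames are assumed inputs):

```python
import torch
import torch.nn.functional as F

def photometric_reprojection_loss(depth, T, K, K_inv, target, source):
    """Warp `source` into the `target` view using predicted depth and the
    relative pose T (target -> source), then compare photometrically.

    depth: (B,1,H,W), T: (B,4,4), K/K_inv: (B,3,3), images: (B,3,H,W)
    """
    B, _, H, W = depth.shape
    # Homogeneous pixel grid for the target frame.
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=torch.float32),
        torch.arange(W, dtype=torch.float32),
        indexing="ij",
    )
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).reshape(1, 3, -1)
    pix = pix.expand(B, -1, -1)                         # (B, 3, H*W)

    # Backproject with predicted depth, move to source frame, reproject.
    cam = (K_inv @ pix) * depth.reshape(B, 1, -1)       # (B, 3, H*W)
    cam = torch.cat([cam, torch.ones(B, 1, H * W)], 1)  # homogeneous
    proj = K @ (T @ cam)[:, :3]                         # (B, 3, H*W)
    uv = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)

    # Sample the source image at the reprojected coordinates.
    u = uv[:, 0].reshape(B, H, W) / (W - 1) * 2 - 1     # normalize to [-1, 1]
    v = uv[:, 1].reshape(B, H, W) / (H - 1) * 2 - 1
    grid = torch.stack([u, v], dim=-1)
    warped = F.grid_sample(source, grid, align_corners=True)

    return (warped - target).abs().mean()               # L1 photometric error
```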
josefwells t1_iy0ic6y wrote
As a Qualcomm employee, I can confirm this is what our conference rooms look like.
Lonely_Tuner t1_iy24zue wrote
As a 3D modeller, I see this as AI photogrammetry. And why should I watch the loading screen for my office?
I_LOVE_SOURCES t1_ixzo13o wrote
I wonder why they weren’t walking around the room
the320x200 t1_iy3r3tw wrote
They walked all around the table... The reconstructed view showed the geometry from a fixed point, but the depth and camera views showed they were walking around.
OverLemonsRootbeer t1_ixzz5js wrote
This is amazing; it could make building gaming and AR environments so much easier.
[deleted] t1_iy2v3vm wrote
This is (one of the many reasons) why I won't buy a Meta Quest. Inside-out tracking relies on this sort of thing, constantly scanning your house and building a model of it. Outside-in tracking, which is far more accurate, doesn't use cameras at all but a swept timing laser and basic photodiodes.
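For reference, the swept-laser ("Lighthouse"-style) approach mentioned above turns photodiode pulse timings into bearing angles. A simplified sketch: the 60 Hz sweep rate matches SteamVR base stations, but everything else is an assumption:

```python
import math

SWEEP_HZ = 60.0             # one rotor revolution per sweep
PERIOD = 1.0 / SWEEP_HZ

def sweep_angle(t_sync: float, t_hit: float) -> float:
    """Angle of a photodiode relative to the base station, in radians.

    t_sync: time the omnidirectional sync flash was seen
    t_hit:  time the rotating laser plane crossed the diode
    """
    return 2.0 * math.pi * ((t_hit - t_sync) % PERIOD) / PERIOD

# Two perpendicular sweeps (horizontal + vertical) give a bearing ray from
# the base station to the diode; two base stations triangulate a 3D point.
az = sweep_angle(t_sync=0.0, t_hit=0.004)  # ~86.4 degrees
el = sweep_angle(t_sync=0.0, t_hit=0.002)  # ~43.2 degrees
print(math.degrees(az), math.degrees(el))
```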
the320x200 t1_iy3qtmw wrote
That works if you only want to do VR, but if you want to do useful AR/MR you need to map the environment one way or another.
SpatialComputing OP t1_iy4wchg wrote
And Quest Pro doesn't even do 3D reconstruction.
_damak0s_ t1_iy01m15 wrote
We tried this a decade ago and it did not work.
pennomi t1_iy11bxn wrote
That’s a terrible argument for compute-heavy technology. Our devices are far better at this today.
_damak0s_ t1_iy11gaf wrote
Not what I meant. We don't have enough awareness to check our 3D surroundings while reading a phone-like HUD.
pennomi t1_iy11r4s wrote
Well, you wouldn’t need real-time depth estimation for a HUD, would you?
This would be more of a 6-DOF AR system, which can and does have real world applications.
carbocation t1_ixyz3g4 wrote
Seems like binocular depth estimation should be possible with a binocular device.