Comments
naccib t1_iy1009o wrote
Monocular depth estimation is very valuable for creating AR experiences in general-use devices such as smartphones. This is, in my opinion, the greatest value for such depth estimation algorithms.
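For context, "monocular depth estimation" here means predicting a dense depth map from a single RGB frame. A minimal sketch using the publicly available MiDaS model via torch.hub — purely an illustration of the technique, not the model Qualcomm is running ("frame.png" is a placeholder input):

```python
import cv2
import torch

# Load the small MiDaS model and its matching preprocessing transform.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
transforms = torch.hub.load("intel-isl/MiDaS", "transforms")

img = cv2.cvtColor(cv2.imread("frame.png"), cv2.COLOR_BGR2RGB)
batch = transforms.small_transform(img)  # resize + normalize -> (1, 3, H', W')

with torch.no_grad():
    pred = midas(batch)                  # relative inverse depth, (1, H', W')
    # Upsample back to the input resolution.
    depth = torch.nn.functional.interpolate(
        pred.unsqueeze(1), size=img.shape[:2],
        mode="bicubic", align_corners=False,
    ).squeeze()
```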
carbocation t1_iy12bhd wrote
I agree with you about the value and use-cases for monocular depth estimation. I was just making the point that, in principle, a binocular device could attempt binocular depth estimation. Or perhaps they tried it internally and it was not sufficiently better to be worth the expense.
naccib t1_iy3jayh wrote
Oh, binocular depth estimation is definitely a less technically challenging approach. I think the reasons they are pursuing monocular come down to what the other commenter said about cost and hardware constraints.
pm_me_your_pay_slips t1_ixz544s wrote
One camera is cheaper than two, though. Cheaper in every sense (compute, memory, network bandwidth, energy consumption, parts cost, etc).
mg31415 t1_iy2pg4i wrote
How is one camera cheaper computationally? If it were stereo, they wouldn't need a NN.
pm_me_your_pay_slips t1_iy2twej wrote
You need to do feature computation and find correspondences. If you’re using a learned feature extractor, that will be twice as expensive as the monocular model. But let’s say you’re using a classical feature extractor. You still need to do feature matching. For dense depth maps, both of these stages can be as expensive as, if not more expensive than, a single forward pass through a highly optimized mobile NN architecture.
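To make that concrete, here is roughly what the classical stereo path looks like with OpenCV's semi-global matcher; the focal length and baseline below are placeholder values, not real camera parameters:

```python
import cv2
import numpy as np

# Rectified left/right images (placeholders).
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Semi-global block matching: this dense correspondence search is the
# per-frame cost that can rival a forward pass of an optimized mobile NN.
matcher = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,  # search range; a wider range costs more compute
    blockSize=5,
)
# compute() returns fixed-point disparity scaled by 16.
disparity = matcher.compute(left, right).astype(np.float32) / 16.0

# Depth from disparity: depth = f * B / d (pinhole model).
focal_px = 700.0    # assumed focal length in pixels
baseline_m = 0.06   # assumed stereo baseline in meters
# Invalid matches clamp to a tiny disparity (i.e., huge depth) in this sketch.
depth = focal_px * baseline_m / np.maximum(disparity, 1e-6)
```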
soulslicer0 t1_iy2fd1x wrote
Could be doing depth estimation by fusing two monocular nets, like MVSNet.
Zeraphil t1_ixz5ntr wrote
Slight peeve: that’s not being processed onboard the glasses but on the separate compute box, a Moto phone. Still nice, but you can throw heavier-hitting compute at it in that setup while keeping the glasses lightweight.
pm_me_your_pay_slips t1_ixzf29m wrote
You can put that compute on the glasses. The real problem is heat dissipation; that's what killed Google Glass.
lennarn t1_iy0rom4 wrote
Just put the compute inside a cute little hat
Zeraphil t1_ixzf8u5 wrote
Sure, HoloLens has plenty and it’s all on the glasses as well. But at the cost of weight and comfort.
lennarn t1_iy0s0sg wrote
Weight and comfort are essential for a product like this; if it were indistinguishable from a pair of sunglasses, everyone would get one.
SpatialComputing OP t1_ixzqmh5 wrote
Yes. On the other hand: in the glasses is a Snapdragon XR1 Gen 1, and if that's a Motorola Edge+, there's an SD 865 in there... neither among the most efficient SoCs today. Hopefully QC can run this on the Snapdragon AR2 in the future.
donobinladin t1_ixzwa2l wrote
Wonder how much bandwidth is needed and whether it could be compressed enough to run over Bluetooth.
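A quick back-of-envelope on that question. The resolution, frame rate, and codec ratio below are illustrative assumptions, not the actual pipeline parameters:

```python
# Rough bandwidth check for streaming camera frames to a compute box.
width, height, fps = 640, 480, 30       # assumed RGB camera stream
raw_bps = width * height * 3 * 8 * fps  # 24-bit color, uncompressed
print(f"raw: {raw_bps / 1e6:.0f} Mbps")  # ~221 Mbps

h264_ratio = 50                          # rough video compression ratio
print(f"compressed: {raw_bps / h264_ratio / 1e6:.1f} Mbps")  # ~4.4 Mbps

# Bluetooth Classic (EDR) tops out around 2-3 Mbps in practice, BLE far less,
# so even a modestly compressed video stream is a tight fit over Bluetooth.
```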
Zeraphil t1_ixzwpm4 wrote
Can’t say too much, but it’s in the works.
Source: I was on the team that designed the original compute box design, at Lenovo.
donobinladin t1_ixzyiri wrote
This is really cool tech, great work!
Would be an interesting use case for LiFi, since conference rooms and desk spaces are always well lit.
It would require some infrastructure, but if it were only in certain areas, the overhead probably wouldn't be much to realize 1.5 Gbps+ throughput.
You can just Venmo me cash if you use the idea 😉
Deep-Station-1746 t1_iy053b2 wrote
ELI5 why strap it on your face?
extracoffeeplease t1_iy0pkp7 wrote
Yeah, they totally didn't show the application?? People have been doing 3D mesh reconstruction with deep learning for a while now.
JanFlato t1_iy2b3db wrote
Practically, help blind and visually impaired people. But commercially? Probably just post ads everywhere when VR glasses get more established
ixpu t1_ixyz005 wrote
Links to publication, press release etc?
Chuyito t1_ixz8ify wrote
It looks like the update to https://www.qualcomm.com/news/onq/2022/07/enabling-machines-to-efficiently-perceive-the-world-in-3d. In July they were doing similar depth estimation.
> Depth estimation and 3D reconstruction is the perception task of creating 3D models of scenes and objects from 2D images. Our research leverages input configurations including a single image, stereo images, and 3D point clouds. We’ve developed SOTA supervised and self-supervised learning methods for monocular and stereo images with transformer models that are not only highly efficient but also very accurate. Beyond the model architecture, our full-stack optimization includes using neural architecture search...
That press article and the DONNA page keep it mostly high-level / architectural, though.
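For anyone curious what "self-supervised learning methods for monocular" depth usually means in practice: predict depth, warp an adjacent frame into the current view, and penalize the photometric difference. A minimal sketch of that loss in the style of Monodepth2 (not necessarily Qualcomm's method; pose, intrinsics, and frames are assumed inputs):

```python
import torch
import torch.nn.functional as F

def photometric_reprojection_loss(depth, T, K, K_inv, target, source):
    """Warp `source` into the `target` view using predicted depth and the
    relative pose T (target -> source), then compare photometrically.

    depth: (B,1,H,W), T: (B,4,4), K/K_inv: (B,3,3), images: (B,3,H,W)
    """
    B, _, H, W = depth.shape
    # Homogeneous pixel grid for the target frame.
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=torch.float32),
        torch.arange(W, dtype=torch.float32),
        indexing="ij",
    )
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).reshape(1, 3, -1)
    pix = pix.expand(B, -1, -1)                         # (B, 3, H*W)

    # Backproject with predicted depth, move to source frame, reproject.
    cam = (K_inv @ pix) * depth.reshape(B, 1, -1)       # (B, 3, H*W)
    cam = torch.cat([cam, torch.ones(B, 1, H * W)], 1)  # homogeneous
    proj = K @ (T @ cam)[:, :3]                         # (B, 3, H*W)
    uv = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)

    # Sample the source image at the reprojected coordinates.
    u = uv[:, 0].reshape(B, H, W) / (W - 1) * 2 - 1     # normalize to [-1, 1]
    v = uv[:, 1].reshape(B, H, W) / (H - 1) * 2 - 1
    grid = torch.stack([u, v], dim=-1)
    warped = F.grid_sample(source, grid, align_corners=True)

    return (warped - target).abs().mean()               # L1 photometric error
```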
josefwells t1_iy0ic6y wrote
As a Qualcomm employee, I can confirm this is what our conference rooms look like.
Lonely_Tuner t1_iy24zue wrote
As a 3D modeller, I see this as AI photogrammetry. And why should I watch the loading screen for my office?
I_LOVE_SOURCES t1_ixzo13o wrote
I wonder why they weren’t walking around the room
the320x200 t1_iy3r3tw wrote
They walked all around the table... The reconstructed view showed the geometry from a fixed point, but the depth and camera views showed they were walking around.
OverLemonsRootbeer t1_ixzz5js wrote
This is amazing; it could make building gaming and AR environments so much easier.
[deleted] t1_iy2v3vm wrote
This is (one of the many reasons) why I won't buy a Meta Quest. Inside-out tracking relies on this sort of thing, constantly scanning your house and building a model of it. Outside-in tracking, which is far more accurate, doesn't use cameras at all but a swept timing laser and basic photodiodes.
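For reference, the swept-laser ("Lighthouse"-style) approach mentioned above turns photodiode pulse timings into bearing angles. A simplified sketch: the 60 Hz sweep rate matches SteamVR base stations, but everything else is an assumption:

```python
import math

SWEEP_HZ = 60.0             # one rotor revolution per sweep
PERIOD = 1.0 / SWEEP_HZ

def sweep_angle(t_sync: float, t_hit: float) -> float:
    """Angle of a photodiode relative to the base station, in radians.

    t_sync: time the omnidirectional sync flash was seen
    t_hit:  time the rotating laser plane crossed the diode
    """
    return 2.0 * math.pi * ((t_hit - t_sync) % PERIOD) / PERIOD

# Two perpendicular sweeps (horizontal + vertical) give a bearing ray from
# the base station to the diode; two base stations triangulate a 3D point.
az = sweep_angle(t_sync=0.0, t_hit=0.004)  # ~86.4 degrees
el = sweep_angle(t_sync=0.0, t_hit=0.002)  # ~43.2 degrees
print(math.degrees(az), math.degrees(el))
```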
the320x200 t1_iy3qtmw wrote
That works if you only want to do VR, but if you want to do useful AR/MR you need to map the environment one way or another.
SpatialComputing OP t1_iy4wchg wrote
And Quest Pro doesn't even do 3D reconstruction.
_damak0s_ t1_iy01m15 wrote
We tried this a decade ago and it did not work.
pennomi t1_iy11bxn wrote
That’s a terrible argument for compute-heavy technology. Our devices are far better at this today.
_damak0s_ t1_iy11gaf wrote
Not what I meant. We don't have enough awareness to check our 3D surroundings while reading a phone-like HUD.
pennomi t1_iy11r4s wrote
Well, you wouldn’t need real-time depth estimation for a HUD, would you?
This would be more of a 6-DOF AR system, which can and does have real world applications.
carbocation t1_ixyz3g4 wrote
Seems like binocular depth estimation should be possible with a binocular device.