boxen t1_ja1ze12 wrote

Isn't computer vision pretty complicated too? To my knowledge, most of what we have there is face detection and moving-object detection (person, bike, car, truck) for self-driving cars. I feel like the understanding-of-what-every-object-is required for a humanoid robot to help with home chores is still also kinda far away.

0

CypherLH t1_ja1zrzu wrote

This is mostly solved already, actually. All of the large image generation tools are also image _recognition_ tools, and some of them can explicitly do image-to-text as well, where they can describe an image fed to them with high accuracy. We just haven't seen this capability impact any consumer markets yet outside of image generation, presumably because inference for these AI models needs a lot of compute.

3

Dreikesehoch t1_ja31a15 wrote

It’s not solved; modern CV is completely different from, and inferior to, how animals perceive visually. Animals don’t do geometry->labelling, they do geometry->action.

2

CypherLH t1_ja4wx0l wrote

I said _mostly_ solved. Labelling/geometry/categorization are huge prerequisite steps to get to "actions". I assume video generation/description will be the final step needed, since it gives the model an "understanding" of relations between objects over time; in other words, true scene recognition. In fact, I assume multi-modal models that combine language, imagery, AND video will end up being another leap forward, since such neural nets would have a much more robust world model.

1

Dreikesehoch t1_ja5fjub wrote

I know, I read that. But I said that what we have now isn’t just “not quite there, yet”. It’s a totally different thing from what it should be. Animals don’t do scene or object recognition (i.e. labelling). Animals simulate actions on the visual stimuli to infer what actions they can apply to their surroundings, physically or virtually, and only after that might there be some symbolic labelling. Like when you look at a door: you don’t primarily see a door, you infer a list of actions that are applicable to the geometric manifold that a door represents. You might act on the door by opening it or closing it without even consciously thinking about the door. When you focus on it, you can classify it as a door through the set of applicable actions. I am sure you can relate. There is some very interesting educational content about this on YouTube.
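The contrast being drawn here can be caricatured in a few lines of code. This is purely an illustrative toy (the geometry key, action lists, and function names are all made up, not any real CV API): a label-first pipeline maps geometry straight to a symbol, while an affordance-first pipeline maps geometry to applicable actions and only derives a label afterwards.

```python
# Toy contrast between "geometry -> label" and "geometry -> action" perception.
# All names and mappings here are invented for illustration only.

# Label-first: geometry is mapped directly to a symbolic label.
def label_first(geometry):
    labels = {"flat-panel-with-hinge-and-handle": "door"}
    return labels.get(geometry, "unknown")

# Affordance-first: geometry is mapped to the actions that can be applied
# to it; the symbolic label, if needed at all, is derived from that set.
AFFORDANCES = {
    "flat-panel-with-hinge-and-handle": ["push", "pull", "open", "close"],
}

def affordance_first(geometry):
    actions = AFFORDANCES.get(geometry, [])
    # Labelling happens last, as a summary of the applicable actions.
    label = "door" if {"open", "close"} <= set(actions) else "unknown"
    return actions, label

actions, label = affordance_first("flat-panel-with-hinge-and-handle")
print(actions, label)
```

In the affordance-first version the action set is available before (and independently of) any label, which is the ordering the comment above attributes to animal perception.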

2

CypherLH t1_ja5mgs5 wrote

Well, presumably humans and animals ARE first labelling/categorizing, but it happens at a very low level; our higher brain functions then act on that raw data. You still need that lower-level base image recognition functionality to be in place, though. Presumably AI could do something similar: have a higher-level model that takes input from a lower-level base image recognition model.
From an AI/software perspective that base image recognition functionality will be extremely useful once inference costs come down.
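The two-stage setup suggested above can be sketched as a toy pipeline. This is a hypothetical illustration, not a real vision stack: `base_recognizer` stands in for an actual detection model, and the "higher-level" stage just thresholds its label/score pairs and picks an action.

```python
# Hypothetical two-stage pipeline: a low-level recognizer emits raw
# (label, confidence) detections, and a higher-level model acts on them.

def base_recognizer(image):
    # Stand-in for a real vision model; returns object labels with scores.
    return [("person", 0.97), ("bicycle", 0.88)]

def higher_level_model(detections, threshold=0.9):
    # Keep only confident detections, then decide on an action.
    confident = [label for label, score in detections if score >= threshold]
    return "yield" if "person" in confident else "proceed"

print(higher_level_model(base_recognizer(None)))  # -> yield
```

The point of the split is the one made in the comment: the higher-level stage never touches pixels, only the symbolic output of the base recognizer, so either stage can be swapped out independently.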

2

turnip_burrito t1_ja6wfzt wrote

In a human brain I'd guess it's a mix of both things: a more reflexive response not requiring labeling, and a response to many different kinds of post-labeled signals relating to the door. Not sure how much of each, though.

2

Dreikesehoch t1_ja7cn8v wrote

This is what I used to believe, too. But psychologists have shown that it is not the case. Think of small children: they act on their environment without recognizing objects and thereby learn what things are. Like opening/closing drawers, tearing paper, putting things in their mouth to find out if they are edible, etc. And you've surely noticed that if you see or think of something that you don’t know the function of, you can’t visualize it.

1

CypherLH t1_ja8ey2a wrote

Maybe. It's also possible that AI's more explicit _recognition_ capability will end up being super-human, since it's not limited by evolutionary kludges, at least once we have proper multi-modal visual models.

To use the old cliché example: our aircraft aren't as efficient as birds... but no bird can carry hundreds of passengers or achieve supersonic speeds, etc.

1

Dreikesehoch t1_jaajuo8 wrote

We already know that brains are intelligent. We have no idea whether object recognition is a more efficient approach. We don’t even know if it will lead to anything intelligent. Better to just build a scaled-up version of the human brain and then let this AI figure out the next steps.

1

CypherLH t1_jabwb5z wrote

But we don't know how to make human brains, aside from producing people of course ;) We do know how to create AI models, though. Considering the rate of progress in just the past year, I wouldn't want to bet against image generation and recognition technology.

1

Dreikesehoch t1_jaeijpt wrote

True, but we are making progress figuring out how the brain works, and eventually we will have a working virtual model of a brain. Image generation and recognition are improving very fast, but the lower bound on their energy consumption appears to be too high in comparison with the energy consumption of the brain. There are neuromorphic chip companies developing architectures that are more similar to brains than conventional ones, with much lower power consumption. I would prefer it if we could get there using current fabs and architectures, but I am very skeptical so far.

1

CypherLH t1_jaesy8l wrote
I get what you are saying, but I'm not sure what the basis for skepticism right now is. Things have been developing INSANELY fast since early last year; it's hard to imagine things developing any faster and more impressively than they did and still are. I guess you can assume that we're close to some upper limit, but I don't see a basis for assuming that.

1