Comments


MassiveWasabi t1_je8atls wrote

This is really big. It’s basically a multimodal AI assistant that can be used for image, text, audio, etc. I’m really underselling it, so at least skim the paper.

In terms of gaming, it can even control AI teammates individually so you can give different orders to each of your teammates to carry out complex strategies, which they say will let you feel like a team leader and increase the fun factor.

Most importantly:

>All these cases have been implemented in practice and will be supported by the online system of TaskMatrix.AI, which will be released soon.

Sounds like this is something we will be able to play with sometime soon. Microsoft definitely wants to get these products into the hands of customers.

TL;DR: use ChatGPT

49

jason_bman t1_je9unzr wrote

Do you know if the example Figures are hand-typed by the researchers? For example, there is a prompt in Figure 9:

Human: I hope to eat an apple and drink a cup of milk.
Can you please pick them up from the fridge and put
them on the kitchen table?

TaskMatrix.AI: Sure, I can help you with that.
robot_go_to("fridge")
robot_pick_up("egg")
robot_go_to("kitchen table")
robot_put_down()
robot_go_to("fridge")
robot_pick_up("milk")
robot_go_to("kitchen table")
robot_put_down()

Wondering if "egg" is just a typo from the research team. Seems like an error that a large LLM would not make.
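To make the "egg" question concrete, here is a minimal Python sketch that mocks the three `robot_*` calls from Figure 9 as functions that only record actions, then checks the plan against what the human asked for. The mocks and the helpers `picked_items` and `missing_items` are hypothetical, not TaskMatrix.AI's actual robot API.

```python
# Hypothetical mocks of the robot API shown in Figure 9; they only log actions.
actions = []

def robot_go_to(place):
    actions.append(("go_to", place))

def robot_pick_up(item):
    actions.append(("pick_up", item))

def robot_put_down():
    actions.append(("put_down", None))

def picked_items(log):
    """Return every item the plan picked up, in order."""
    return [arg for op, arg in log if op == "pick_up"]

def missing_items(requested, log):
    """Return the requested items that the plan never picked up."""
    picked = set(picked_items(log))
    return [item for item in requested if item not in picked]

# The plan exactly as printed in Figure 9.
robot_go_to("fridge")
robot_pick_up("egg")
robot_go_to("kitchen table")
robot_put_down()
robot_go_to("fridge")
robot_pick_up("milk")
robot_go_to("kitchen table")
robot_put_down()

print(picked_items(actions))                      # ['egg', 'milk']
print(missing_items(["apple", "milk"], actions))  # ['apple']
```

A check like this would flag the plan as printed, whether the slip came from the model or from whoever typed up the figure.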

8

MassiveWasabi t1_jeb67h2 wrote

I looked into that just now and my conclusion is that there may be some translation issue between the researchers and the AI. The researchers are all Chinese and I can see some other simple English mistakes, so I'm not sure if they were using something for translation or if they were just typing in English. Maybe they did all of the research in Chinese and then translated for us to read the paper. I don't really know, though.

3

jason_bman t1_jecvrql wrote

I actually had the same thought given the grammar mistakes. Still an awesome paper!

2

FlyingCockAndBalls t1_je8gqju wrote

It feels like the pace is moving lightning fast, and yet also super slow.

25

acutelychronicpanic t1_je9eqky wrote

The world is mostly sleepwalking through this.

The news tonight if this gets covered: "ChatGPT can do more than just essays? New developments in a field called aye eye might put chemistry homework at risk. More at 9."

18

KRCopy t1_jeaxro1 wrote

Who else saw and heard the local news guy from Arrested Development (and real life Orange County)?

1

YaAbsolyutnoNikto t1_je8w8j3 wrote

Why is that?

1

FlyingCockAndBalls t1_je8wts0 wrote

I guess it's just because there still hasn't been any societal upheaval. But Rome wasn't built in a day. I guess it's like watching the early internet: you can't predict how much the future is gonna change, while the general population just brushes it off until it infiltrates everything.

13

DragonForg t1_je8pbf6 wrote

New AI news. Now imagine pairing up the task API with this: https://twitter.com/yoheinakajima/status/1640934493489070080?s=46&t=18rqaK_4IAoa08HpmoakCg

It will be OP. Imagine, GPT please solve world hunger, and the robot model it suggest could actually do physical work. We just need robotics to get hooked up to this so we can get autonomous task robots.

Imagine we start small and say, "Robot, build a wooden box." Combine this API with this: https://twitter.com/yoheinakajima/status/1640934493489070080?s=46&t=18rqaK_4IAoa08HpmoakCg and you could seemingly get a robot doing the task autonomously.

15

GoldenRain t1_je9w4um wrote

>It will be OP. Imagine, GPT please solve world hunger, and the robot model it suggest could actually do physical work.

That's where the alignment problem comes in. An easy way to solve world hunger is to reduce the population in one way or another, but that is not aligned with what we actually want.

3

Hbirkeland t1_je8mtxn wrote

Interesting! Is this basically what ChatGPT is doing with plugins, but with a much broader scope (connecting any foundation model)?

9

Sad_Laugh_8337 t1_je9kvox wrote

>Visual ChatGPT is just an example of applying TaskMatrix.AI to the visual domain.

So I guess that answers my initial question -- AI models for tools.

Now apply this logic to:

https://twitter.com/yoheinakajima/status/1640934493489070080?s=46&t=18rqaK_4IAoa08HpmoakCg

I believe this could get us to strong Proto-AGI (just made that up). Why?

  • AI models as agents for the specific cases mentioned in Yohei's Twitter post: task keeping, planning, etc.

Very soon we will have an AI model that could perfect every task if it uses fine-tuned tools. I believe this puts it into the category of strong Proto-AGI.

6

acutelychronicpanic t1_je9efjl wrote

Imagine when the AI can create its own tools. Use an LLM with all the tools already mentioned as a base. If the AI detects that it has low confidence or bad results in a particular domain, it could try to create a program or set up a narrow ML model to handle it.
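A rough Python sketch of that idea, assuming the base model can report a confidence score per domain. `Agent`, `CONFIDENCE_THRESHOLD`, and `build_narrow_tool` are hypothetical names, the confidence numbers are made up, and the "narrow tool" is just a stub function standing in for a generated program or a trained narrow model.

```python
from typing import Callable, Dict, Tuple

CONFIDENCE_THRESHOLD = 0.6  # hypothetical cutoff for "low confidence"

def build_narrow_tool(domain: str) -> Callable[[str], str]:
    """Stub standing in for writing a program or training a narrow ML model."""
    def tool(query: str) -> str:
        return f"[{domain} tool] handled: {query}"
    return tool

class Agent:
    def __init__(self) -> None:
        # Tools the agent has created for itself, keyed by domain.
        self.tools: Dict[str, Callable[[str], str]] = {}

    def base_answer(self, domain: str, query: str) -> Tuple[str, float]:
        # Hypothetical: pretend the base LLM is weak at arithmetic.
        confidence = 0.3 if domain == "arithmetic" else 0.9
        return f"[base model] {query}", confidence

    def handle(self, domain: str, query: str) -> str:
        # Reuse a tool if one was already created for this domain.
        if domain in self.tools:
            return self.tools[domain](query)
        answer, confidence = self.base_answer(domain, query)
        if confidence < CONFIDENCE_THRESHOLD:
            # Low confidence: create a narrow tool once, then use it.
            self.tools[domain] = build_narrow_tool(domain)
            return self.tools[domain](query)
        return answer

agent = Agent()
print(agent.handle("essays", "summarize this paper"))
print(agent.handle("arithmetic", "1234 * 5678"))
print(sorted(agent.tools))  # ['arithmetic']
```

The design choice here is to create a tool only on a confidence miss and cache it per domain, so the base model stays the default path.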

5

Sad_Laugh_8337 t1_je9j1hm wrote

So is this basically saying we can use AI models as tools similar to how they're already using plugins?

So we could see fine-tuned models for specific tasks being used as tools? Is that what this is going for?

Would this not end up being a very strong proto-AGI-like invention at this point? If there were fine-tuned models for things like web scraping or booking a flight, wouldn't it be supercharged to do all of those tasks better than any human?

Am I looking at this wrong?
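One way to picture "fine-tuned models as tools" is a registry that maps each tool name to a description and a callable, with a selector routing each request. This is a minimal Python sketch under assumptions: the tool names are hypothetical, the "models" are stubs, and simple keyword matching stands in for the LLM that would actually choose the API. It is not TaskMatrix.AI's API platform or ChatGPT's plugin interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Tool:
    name: str
    description: str
    keywords: List[str]        # crude stand-in for LLM-based API selection
    run: Callable[[str], str]  # stub for a fine-tuned model or external API

registry: Dict[str, Tool] = {}

def register(tool: Tool) -> None:
    registry[tool.name] = tool

def select_tool(request: str) -> Tool:
    """Pick the first registered tool whose keywords appear in the request."""
    text = request.lower()
    for tool in registry.values():
        if any(word in text for word in tool.keywords):
            return tool
    raise LookupError(f"no registered tool matches: {request!r}")

# Hypothetical tools matching the examples in the comment above.
register(Tool("web_scraper", "Extracts text from web pages",
              ["scrape", "website"], lambda r: f"scraped data for: {r}"))
register(Tool("flight_booker", "Books flights",
              ["flight"], lambda r: f"booking request sent: {r}"))

request = "Book me a flight to Tokyo next Friday"
tool = select_tool(request)
print(tool.name, "->", tool.run(request))
```

Adding a new capability is then just another `register` call, which is roughly the appeal of treating specialized models as plug-in tools.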

5