
MassiveWasabi t1_je8atls wrote

This is really big. It's basically a multimodal AI assistant that can work with images, text, audio, etc. I'm really underselling it, so at least skim the paper.

In terms of gaming, it can even control AI teammates individually so you can give different orders to each of your teammates to carry out complex strategies, which they say will let you feel like a team leader and increase the fun factor.

Most importantly:

"All these cases have been implemented in practice and will be supported by the online system of TaskMatrix.AI, which will be released soon."

Sounds like this is something we will be able to play with sometime soon. Microsoft definitely wants to get these products into the hands of customers.

TL;DR: use ChatGPT

49

jason_bman t1_je9unzr wrote

Do you know if the example Figures are hand-typed by the researchers? For example, there is a prompt in Figure 9:

Human: I hope to eat an apple and drink a cup of milk.
Can you please pick them up from the fridge and put
them on the kitchen table?

TaskMatrix.AI: Sure, I can help you with that.
robot_go_to("fridge")
robot_pick_up("egg")
robot_go_to("kitchen table")
robot_put_down()
robot_go_to("fridge")
robot_pick_up("milk")
robot_go_to("kitchen table")
robot_put_down()

Wondering if "egg" is just a typo from the research team. It seems like an error that an LLM would not make.
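
For comparison, here is a rough Python sketch of the plan you would expect if the output matched the request (apple and milk). The three robot_* functions are hypothetical stubs that only log their calls; the figure shows the call names but not how they are implemented, so the signatures and the fetch helper are my assumptions.

```python
# Hypothetical stubs: Figure 9 shows only the call names, not their implementation.
actions = []  # log of every call the plan makes


def robot_go_to(location):
    actions.append(("go_to", location))


def robot_pick_up(item):
    actions.append(("pick_up", item))


def robot_put_down():
    actions.append(("put_down",))


def fetch(item, source="fridge", target="kitchen table"):
    """Move one item from source to target, using the four-call pattern from the figure."""
    robot_go_to(source)
    robot_pick_up(item)
    robot_go_to(target)
    robot_put_down()


requested = ["apple", "milk"]  # items named in the human prompt
for item in requested:
    fetch(item)

# Items the plan actually picks up, to compare against the request.
picked = [a[1] for a in actions if a[0] == "pick_up"]
print(picked)
```

Comparing the picked items with the requested ones is a cheap way to catch a mismatch like "egg" vs. "apple" in a generated plan.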

8

MassiveWasabi t1_jeb67h2 wrote

I looked into that just now and my conclusion is that there may be some translation issue between the researchers and the AI. The researchers are all Chinese and I can see some other simple English mistakes, so I'm not sure if they were using something for translation or if they were just typing in English. Maybe they did all of the research in Chinese and then translated for us to read the paper. I don't really know, though.

3

jason_bman t1_jecvrql wrote

I actually had the same thought given the grammar mistakes. Still an awesome paper!

2