swegmesterflex t1_j8d3t4r wrote
Reply to [R] [N] Toolformer: Language Models Can Teach Themselves to Use Tools - paper by Meta AI Research by radi-cho
Had this idea and was planning to play around with it when I had more free time. Good to see some evidence it's a promising direction. I speculate you can actually get a LOT out of this if you're clever with it. A tool for long-term memory could be a lookup table keyed by text embeddings. A tool for vision could be an image captioning model, maybe plus some segmentation, to get a richer text description of the image. You could come up with many more tools, and I think they could work well as long as you find a clever way of turning their output into text.
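To make the memory idea a bit more concrete, here is a minimal sketch of a lookup table keyed by text embeddings. Everything in it is an assumption for illustration and not something from the Toolformer paper: `toy_embed` is a hashed bag-of-words stand-in for a real embedding model, and `MemoryTool`, `write`, and `read` are hypothetical names. The point is only that the tool takes text in and hands text back, which is what the language model needs.

```python
import hashlib
import math

DIM = 64  # arbitrary size for the toy embedding (assumption)


def toy_embed(text: str) -> list[float]:
    """Stand-in for a real text-embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        idx = int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero on empty text
    return [v / norm for v in vec]


class MemoryTool:
    """Hypothetical long-term memory: embeddings are the keys, stored text is the value."""

    def __init__(self) -> None:
        self.entries: list[tuple[list[float], str]] = []

    def write(self, text: str) -> None:
        # Store the text under its embedding key.
        self.entries.append((toy_embed(text), text))

    def read(self, query: str, k: int = 1) -> list[str]:
        # Return the k stored texts whose keys have the highest cosine
        # similarity to the query (vectors are already unit length).
        q = toy_embed(query)
        ranked = sorted(
            self.entries,
            key=lambda entry: sum(a * b for a, b in zip(q, entry[0])),
            reverse=True,
        )
        return [text for _, text in ranked[:k]]


memory = MemoryTool()
memory.write("the user's dog is named Rex")
memory.write("the meeting is on Friday at noon")
print(memory.read("what is the dog named", k=1))
```

In a real setup you would presumably swap `toy_embed` for an actual embedding model and the linear scan for a proper nearest-neighbor index, but the interface to the language model would stay the same: a text query goes in, retrieved text comes out.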