Submitted by Singularian2501 t3_11zsdwv in MachineLearning
endless_sea_of_stars t1_jde88qi wrote
Wonder how this compares to the Toolformer implementation.
https://arxiv.org/abs/2302.04761
Their technique was to use few-shot (in-context) learning to annotate a dataset with API calls. They took the annotated dataset and used it to fine-tune the model. During inference, the code would detect the API call, make the call, and then append the results to the text and keep going.
The limitation with that methodology is that you have to fine-tune the model for each new API. Wonder what OpenAI's approach is?
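Roughly, the inference side of that looks like this (a sketch from memory; the bracketed `[Tool(args) -> result]` markup and the calculator tool are illustrative, not the paper's exact format):

```python
# Toolformer-style sketch: generation pauses when the model emits an API call,
# the call is executed, and the result is spliced in before decoding resumes.
import re

CALL = re.compile(r"\[(\w+)\((.*?)\)\]")  # a call emitted without a result yet

def run_tool(name: str, arg: str) -> str:
    tools = {"Calculator": lambda a: str(round(eval(a), 2))}  # demo only; eval is unsafe
    return tools[name](arg)

def generate_with_tools(generate, prompt: str) -> str:
    """`generate(text)` is assumed to return `text` plus the model's continuation."""
    text = generate(prompt)
    while (m := CALL.search(text)) is not None:
        result = run_tool(m.group(1), m.group(2))
        # Insert the result inside the brackets, then let the model keep going from there.
        text = generate(text[:m.end() - 1] + f" -> {result}]")
    return text
```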
Edit:
I read through the documentation. Looks like it is done through in-context learning. As in, they just prepend the API's description to your call and let the model figure it out. That also means you get charged for the tokens used in the API description. Those tokens also count against the context window. Unclear if there was any fine-tuning done on the model to better support APIs or if they are just using the base model's capabilities.
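So conceptually something like this, where the description text and message layout are my own guesses rather than what OpenAI actually sends:

```python
# In-context sketch: the API description is simply prepended to the conversation,
# which is why it's billed as tokens and eats into the context window on every request.
API_DESCRIPTION = (
    "You can call this API when the user asks about weather:\n"
    "GET /weather?city=<name>  -> current weather for a city, as JSON"
)

def build_messages(api_description: str, user_message: str) -> list[dict]:
    return [
        {"role": "system", "content": api_description},  # counts against the context limit
        {"role": "user", "content": user_message},
    ]

messages = build_messages(API_DESCRIPTION, "What's the weather like in Paris?")
# The model is then expected to emit something like `GET /weather?city=Paris`,
# which client-side code detects, executes, and appends as another message.
```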
iamspro t1_jderz7f wrote
I tried fine-tuning vs. few-shot for my own implementation, and in the end few-shot was just much easier, despite the context window drawback. The huge advantage is that you can dynamically add/remove/update APIs in an instant.
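E.g. something as simple as this (entries made up) is enough to swap tools per request:

```python
# With the in-context route, adding/removing an API is just editing a dict between
# requests; there is no retraining step.
api_registry = {
    "weather": "GET /weather?city=<name> - current weather as JSON",
    "stocks":  "GET /stocks?ticker=<symbol> - latest quote as JSON",
}

api_registry["news"] = "GET /news?topic=<topic> - top headlines"  # add instantly
del api_registry["stocks"]                                        # remove just as fast

system_prompt = "You can call these APIs:\n" + "\n".join(api_registry.values())
```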
endless_sea_of_stars t1_jdezatt wrote
I suspect future versions will do both. They will "bake in" some basic APIs like a simple calculator, calendar, and fact lookups. They will use in-context learning for third-party APIs.
iamspro t1_jdf0f1o wrote
Good point, that baking in could also give the model a general sense of how to get the call syntax right.
countalabs t1_jdibk1j wrote
The "fine tuning" in OpenAI API can be few-shots. The other approach of putting the instruction or example in context should be called zero-shots.
iamspro t1_jdj4wzl wrote
Fine-tuning is distinct afaik... using OpenAI's language for it[1]:
zero-shot: no examples in the prompt, just an input (and/or instruction)
few-shot: one or more examples of input+output in the prompt, plus new input
fine-tuning: updating the model with examples (which can then be used with zero- or few-shot as you wish)
[1] https://help.openai.com/en/articles/6654000-best-practices-for-prompt-engineering-with-openai-api (part 5)
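Toy illustration of the difference (the task and examples are made up):

```python
# Zero-shot: just the instruction/input.
zero_shot = "Translate to French: good morning"

# Few-shot: a couple of worked examples in the prompt, then the new input.
few_shot = (
    "Translate to French.\n"
    "English: hello -> French: bonjour\n"
    "English: thank you -> French: merci\n"
    "English: good morning -> French:"
)

# Fine-tuning: the examples go into a training job instead of the prompt, e.g. as
# JSONL records like the one below (legacy prompt/completion format); afterwards
# you can still prompt the tuned model zero- or few-shot as you wish.
fine_tuning_record = {"prompt": "English: good morning -> French:", "completion": " bonjour"}
```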
_faizan_ t1_jdwdwsm wrote
Is there an open implementation of Toolformer? Or did you roll your own implementation for fine-tuning? They did mention in their paper that they gave a few in-context examples of tool usage and then used GPT-J to label more text, which they finally used for fine-tuning. Did you follow a similar approach? I have been looking to reproduce Toolformer but am not sure where to even start.
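The rough loop I understand from the paper is something like this (very simplified; the sampling, thresholds, and exact loss-based filter are glossed over, and the function bodies are placeholders):

```python
def annotate(lm, few_shot_prompt: str, text: str) -> list[str]:
    """Ask an LM (GPT-J in the paper) to insert candidate [Tool(args)] calls into text."""
    return lm.sample(few_shot_prompt + text, n=4)  # placeholder interface

def execute_calls(annotated: str) -> str:
    """Run the proposed API calls and splice their results back into the text."""
    ...  # placeholder

def useful(lm, original: str, with_results: str) -> bool:
    """Keep a call only if its result makes the following tokens easier to predict."""
    return lm.loss(with_results) < lm.loss(original)  # placeholder interface

def build_dataset(lm, corpus, few_shot_prompt):
    kept = []
    for text in corpus:
        for candidate in annotate(lm, few_shot_prompt, text):
            with_results = execute_calls(candidate)
            if useful(lm, text, with_results):
                kept.append(with_results)
    return kept  # then fine-tune the model on `kept`
```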
wind_dude t1_jdf5yhj wrote
Looking at their limited docs, I feel it's a little simpler than Toolformer, probably more like the BlenderBot models for search, plus prompt engineering:
- Matching intent from the prompt to a description of the plugin service
- Extracting relevant terms from the prompt to send as query params, based on the description of the endpoint
- The model incorporates the API response into its own response
"The file includes metadata about your plugin (name, logo, etc.), details about authentication required (type of auth, OAuth URLs, etc.), and an OpenAPI spec for the endpoints you want to expose.The model will see the OpenAPI description fields, which can be used to provide a natural language description for the different fields.We suggest exposing only 1-2 endpoints in the beginning with a minimum number of parameters to minimize the length of the text. The plugin description, API requests, and API responses are all inserted into the conversation with ChatGPT. This counts against the context limit of the model." - https://platform.openai.com/docs/plugins/introduction
signed7 t1_jdfcly9 wrote
It's a shame that 'Open'AI has become so closed. Would be so cool to see a proper paper with technical details on how this works...
meister2983 t1_jdgghu6 wrote
The Microsoft Research paper assessing the intelligence capabilities of GPT-4 effectively did this. If you just define APIs for the model to use under certain conditions, it will write the API call. Once you do that, it's straightforward for a layer on top to detect the API call, actually execute it, and write the result back.
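I.e. something like this, where the CALL:/RESULT: convention is just whatever format you told the model to use in the prompt:

```python
import re

# Stand-ins for real APIs; the layer on top only needs a name -> handler mapping.
HANDLERS = {"search": lambda q: f"results for {q!r}"}

def step(model_reply: str, history: list[str]) -> list[str]:
    """Detect an API call in the model's reply, execute it, and write the result back."""
    history.append(model_reply)
    m = re.search(r"CALL:\s*(\w+)\((.*)\)", model_reply)
    if m:
        result = HANDLERS[m.group(1)](m.group(2))  # actually execute the call
        history.append(f"RESULT: {result}")        # the next model turn sees this
    return history
```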
daugaard47 t1_jdkkyds wrote
Wish they had stayed open source, but I can understand why they sold out. There would have been no way they could handle the amount of traffic/demand if they had remained a non-profit. But as someone who works for a non-profit, I don't understand how they legally changed to a for-profit over a week's time. 😐
godaspeg t1_jdgih6t wrote
In the "sparks of AGI" GPT4 Paper (can totally recommend to have a look, its crazy), the authors talk about the amazing abilities of the uncensored GPT4 version to use tools. Probably this suits quite well to the simple plugin approach of OpenAi, so I have high espectations.
Soc13In t1_jdh0n2m wrote
Link/citation please
godaspeg t1_jdh18s9 wrote
https://arxiv.org/abs/2303.12712
If you don't want to read 154 pages, here is an awesome summary:
Soc13In t1_jdhtpvf wrote
thank you.
drcopus t1_jdhjddx wrote
IMO doing everything in-context seems more hacky; I would rather see a Toolformer approach, but I understand that it probably requires more engineering and compute.
I reckon the in-context approach probably makes the plugins less stable as the model has to nail the syntax. ChatGPT is good at coding but it makes basic errors often enough to notice.
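One mitigation is a thin validate-and-retry layer around whatever call the model emits, something like this (assuming you asked the model for a JSON-formatted call; `ask_model` is a placeholder for whatever client you use):

```python
import json

def get_valid_call(ask_model, prompt: str, max_attempts: int = 3) -> dict:
    """Parse the model's emitted call; if it's malformed, show it the error and retry."""
    reply = ask_model(prompt)
    for attempt in range(max_attempts):
        try:
            call = json.loads(reply)
            if "endpoint" in call and "params" in call:  # expected shape is our own convention
                return call
        except json.JSONDecodeError:
            pass
        if attempt < max_attempts - 1:
            reply = ask_model(f"{prompt}\nYour previous call was invalid:\n{reply}\nReturn valid JSON only.")
    raise ValueError("model failed to produce a valid API call")
```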