Submitted by floppy_llama t3_1266d02 in MachineLearning
saintshing t1_je9fciu wrote
Reply to comment by EquipmentStandard892 in [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention by floppy_llama
> I was curious if as our brains do, use another instance of the same LLM to generate little hypothesis about the ongoing conversation, and store those on a vector space database, then use those generated thesis during reasoning.
I just learned about LangChain recently. If I understand correctly, they have agents that integrate LLMs and external tools like internet search, sql query, vector store query, it also has a memory module to store ongoing dialog and intermediate results.
They use ReAct or MKRL framework to create subprolems, decide what tools to use and how to react to the results returned by those tools.
example: https://tsmatz.files.wordpress.com/2023/03/20230307_paper_example.jpg?w=446&zoom=2
https://python.langchain.com/en/latest/getting_started/getting_started.html
https://tsmatz.wordpress.com/2023/03/07/react-with-openai-gpt-and-langchain/
https://twitter.com/yoheinakajima/status/1640934493489070080
EquipmentStandard892 t1_je9gc9y wrote
I've already seen langchain and it's truly amazing, the issue I've encountered and was trying to overcome is more an architectural problem actually, the token context span limit. I was looking to add a layer upon the transformer architecture to bypass this limitations, I've seen MKRL is able to handle higher context lengths, even claiming unlimited context span, although need to study more. I was not thinking about prompt engineering at all.
saintshing t1_je9iw85 wrote
Jeremy Howard tweeted about this new model that is RNN but can be trained in parallel. I havent read the details but it seems people are hyped that it can bypass the context length limit.
>RWKV is an RNN with Transformer-level LLM performance, which can also be directly trained like a GPT transformer (parallelizable). And it's 100% attention-free. You only need the hidden state at position t to compute the state at position t+1. You can use the "GPT" mode to quickly compute the hidden state for the "RNN" mode.
>So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding (using the final hidden state).
https://github.com/BlinkDL/RWKV-LM#the-rwkv-language-model-and-my-tricks-for-lms
https://twitter.com/BlinkDL_AI/status/1638555109373378560
EquipmentStandard892 t1_je9kmvi wrote
This exactly what I was talking about, I'm studying the llama.cpp to understand how this whole ML LLM world works, and I've found its pretty "simple" in the meanings of the programming itself. I'm a software engineer outside the ML field, and it was pretty interesting to do this deep dive. I'll take a deeper look into this RWKV proposal and maybe make something upon to test. If I found something interesting I comment here 😊
JustOneAvailableName t1_jea2dzf wrote
Software engineer perspective on attention (self quote):
> You have to think about searching. If you search, you have a query (the search term), some way to correlate the query to the actual (size unknown/indifferent) knowledge base and the knowledge base itself. If you have to write this as a mathematical function you have to have something that matches a query, to how similar it is to some key and then return the corresponding value to that key. The transformer equation is a pretty straightforward formula from that perspective. Each layers learns what it searches for, how it can be found and which value it wants to transfer when requested.
RWKV changes this by removing the query. So data is not requested anymore, only pushed. I am frankly surprised to seems to work thus far. Pushing data (self determining how important something is for something else) is not dependant on other states, enabling it to be a RNN.
Edit: step I need to mention: in RWKV importance also fades over time, so it has a recency bias
EquipmentStandard892 t1_jeaqt6u wrote
I've already had that in mind, I've found some interesting paper talking about integrating LLMs in a specific way designed to handle autonomous task execution given an direct objective/goal. Combining this with this RNN approach seems to be the go to for increase the cognitive development of the whole system. Using the RNN as our subconscious would do and indexing this into a vector space capable of hybrid search, or something like SPLADE search engines, or even build a neural attention graph network to store the rules that aggregate the raw tokens into the vector space, could drastically improve the performance of small language models, maybe leading to further optimization beyond the token limit span.
Article about integrating memory and task/objectives using multiple LLM instances: https://yoheinakajima.com/task-driven-autonomous-agent-utilizing-gpt-4-pinecone-and-langchain-for-diverse-applications/
A_Light_Spark t1_jeaim48 wrote
The real vip is in the comments again. TIL about rwkv!
Now I just need to read up on it and see if it can do sequence classification...
saintshing t1_jeaowjz wrote
I almost missed it too. There are too many new results.
The most crazy thing is it is all done by one person when the big techs all work on transformer models.
unkz t1_je9wuzm wrote
Practically speaking, it does have a context limit — that RNN issue has not really been solved. It is a lot of fun to play with though.
MasterEpictetus t1_jecc69k wrote
Why am I only hearing about this now? It sounds amazing!
Viewing a single comment thread. View all comments