mikljohansson t1_jckedf9 wrote
Very interesting work! I've been following this project for a while now.
Can I ask a few questions?
- What's the difference between RWKV-LM and ChatRWKV? E.g., is ChatRWKV mainly RWKV-LM streamlined for inference and ease of use, or are there more differences?
- Are you planning to fine-tune on the Stanford Alpaca dataset (as was recently done for LLaMA and GPT-J to create instruct versions of them), or on a similar GPT-generated instruction dataset? I'd love to see an instruct-tuned version of RWKV-LM 14B with an 8k+ context length!
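For context, Alpaca-style instruction records are usually flattened into a single prompt/response string before fine-tuning. Here's a minimal Python sketch using the public Alpaca field names (`instruction`, `input`, `output`) and the commonly used Alpaca prompt template; whatever template an RWKV fine-tune would actually use is an assumption on my part, not something from the project.

```python
# Sketch: turning an Alpaca-style record into a training prompt.
# Field names (instruction/input/output) follow the public Stanford Alpaca JSON;
# the prompt wording is the widely used Alpaca template, shown for illustration only.
def format_alpaca_record(record: dict) -> str:
    if record.get("input"):
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{record['instruction']}\n\n"
            f"### Input:\n{record['input']}\n\n"
            f"### Response:\n{record['output']}"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{record['instruction']}\n\n"
        f"### Response:\n{record['output']}"
    )

# Hypothetical example record, just to show the shape of the data.
example = {
    "instruction": "Summarize the following paragraph.",
    "input": "RWKV is an RNN that aims for transformer-level quality...",
    "output": "RWKV combines RNN-style inference with transformer-quality results.",
}
print(format_alpaca_record(example))
```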
bo_peng OP t1_jcmajpx wrote
- RWKV-LM is now mainly for training, while ChatRWKV is for optimal inference.
- Someone in the RWKV Discord tried it using LoRA (https://github.com/Blealtan/RWKV-LM-LoRA) and the results are quite nice. Join the RWKV Discord for the latest updates :)
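For anyone wondering what the LoRA approach amounts to: the pretrained weights are frozen and only a small low-rank correction is trained. Below is a minimal PyTorch sketch of that general idea — it is not the actual code in RWKV-LM-LoRA, and the class name, rank, and alpha here are illustrative assumptions.

```python
import math
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen nn.Linear plus a trainable low-rank update (the general LoRA idea)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weights

        # Low-rank factors: W + (B @ A) * scaling, with B initialized to zero
        # so the adapted layer starts out identical to the original.
        self.lora_A = nn.Parameter(torch.zeros(r, base.in_features))
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

# Usage sketch: wrap an existing projection layer before fine-tuning.
proj = nn.Linear(1024, 1024)
adapted = LoRALinear(proj, r=8, alpha=16)
out = adapted(torch.randn(2, 1024))
```

Only `lora_A` and `lora_B` receive gradients, which is why LoRA fine-tuning of a 14B model needs far less memory than a full fine-tune.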