
mikljohansson t1_jckedf9 wrote

Very interesting work! I've been following this project for a while now

Can I ask a few questions?

  • What's the difference between RWKV-LM and ChatRWKV? E.g. is ChatRWKV mainly RWKV-LM streamlined for inference and ease of use, or are there more differences?

  • Are you planning to fine-tune on the Stanford Alpaca dataset (as was recently done for LLaMa and GPT-J to create instruct versions of them), or on a similar GPT-generated instruction dataset? I'd love to see an instruct-tuned version of RWKV-LM 14B with an 8k+ context length!

3

bo_peng OP t1_jcmajpx wrote

  • RWKV-LM is now mainly for training, while ChatRWKV is optimized for inference (a minimal inference sketch follows below).
  • Someone in the RWKV Discord tried fine-tuning it with LoRA (https://github.com/Blealtan/RWKV-LM-LoRA) and the results are quite nice (a generic LoRA sketch also follows below). Join the RWKV Discord for the latest updates :)
3
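
For anyone curious what ChatRWKV-style inference looks like in code, here is a minimal sketch assuming the `rwkv` pip package (`pip install rwkv`); the checkpoint name, tokenizer file, and sampling parameters below are illustrative placeholders, not a recommended configuration:

```python
# Minimal RWKV inference sketch. Assumes `pip install rwkv` plus a downloaded
# RWKV-4 checkpoint and tokenizer; the file names here are placeholders.
from rwkv.model import RWKV
from rwkv.utils import PIPELINE, PIPELINE_ARGS

# The strategy string controls device/precision placement (e.g. 'cpu fp32').
model = RWKV(model='RWKV-4-Pile-14B-20230313-ctx8192-test1050',
             strategy='cuda fp16')
pipeline = PIPELINE(model, '20B_tokenizer.json')

# Sampling settings for generation.
args = PIPELINE_ARGS(temperature=1.0, top_p=0.85)

prompt = "Here is a short explanation of the RWKV architecture:\n"
output = pipeline.generate(prompt, token_count=200, args=args)
print(output)
```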
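And for readers unfamiliar with LoRA: the idea is to freeze the pretrained weights and train only small low-rank update matrices added on top of them. The snippet below is a generic PyTorch illustration of that idea, not the actual RWKV-LM-LoRA code; the class name, rank, and alpha are made up for the example:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen pretrained linear layer plus a trainable low-rank update (W + B @ A)."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weight
        # Low-rank factors: A is small random, B starts at zero so training
        # begins from the unmodified pretrained behaviour.
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scale

# Example: wrap one projection; only the small A/B matrices get gradients.
layer = LoRALinear(nn.Linear(1024, 1024))
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))
```

Because only the adapter matrices are trained, the memory and storage cost of a fine-tune is a tiny fraction of a full-parameter run, which is why it is practical on a 14B model.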