Submitted by bo_peng t3_11teywc in MachineLearning
I tried the "Alpaca prompt" on RWKV 14B ctx8192, and to my surprise it works out of the box without any finetuning (RWKV is a 100% RNN, trained on 100% Pile v1 and nothing else):
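For reference, an Alpaca-style prompt looks roughly like this (this is the standard Alpaca template; the exact wording in the demo may differ, and the instruction is just an example):

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Write a poem about the ocean.

### Response: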
You are welcome to try it in the RWKV 14B Gradio demo (click the examples below the panel):
https://huggingface.co/spaces/BlinkDL/ChatRWKV-gradio
Tip: also try "Expert Response", "Expert Long Response", or "Expert Full Response".
===================
ChatRWKV v2 now uses a custom CUDA kernel to optimize INT8 inference (23 tokens/s on a 3090): https://github.com/BlinkDL/ChatRWKV
Upgrade to the latest code, run "pip install rwkv --upgrade" (to 0.5.0), and set os.environ["RWKV_CUDA_ON"] = '1' in v2/chat.py to enjoy the speedup.
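A minimal sketch of those steps in Python, assuming the rwkv pip package's RWKV class (the model filename and the "cuda fp16i8" INT8 strategy here are illustrative):

import os
os.environ["RWKV_CUDA_ON"] = '1'  # must be set before importing rwkv

from rwkv.model import RWKV  # pip install rwkv --upgrade (>= 0.5.0)

# "cuda fp16i8" keeps the weights in INT8 on the GPU; the model path
# below is an example, not a required value.
model = RWKV(model='RWKV-4-Pile-14B-20230313-ctx8192-test1050',
             strategy='cuda fp16i8')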
The inference speed (and VRAM consumption) of RWKV is independent of ctxlen because it's an RNN (note: currently, preprocessing a long prompt takes more VRAM, but that can be optimized by processing the prompt in chunks).
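A sketch of that chunked approach, assuming the rwkv package's forward(tokens, state) API, which returns (logits, state); threading the recurrent state forward means only one chunk of the prompt needs to be resident at a time:

CHUNK_LEN = 256  # chunk size is an arbitrary choice here

def process_prompt(model, tokens):
    # Feed the prompt through the RNN chunk by chunk, carrying the
    # recurrent state; the final state summarizes the whole prompt.
    state = None
    for i in range(0, len(tokens), CHUNK_LEN):
        logits, state = model.forward(tokens[i:i + CHUNK_LEN], state)
    return logits, state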
Meanwhile, I find that the latest RWKV-4-Pile-14B-20230313-ctx8192-test1050 model can utilize a long ctx:
yehiaserag t1_jcj305q wrote
How does that version compare to "RWKV-4-Pile-14B-20230228-ctx4096-test663"?