kittenkrazy t1_jcjxr0b wrote

I wonder if this could be used effectively in LLaMA. A 32K context would be a game changer

26

Nhabls t1_jck9a4c wrote

Yeah, you'd just need enough training time and data to train those 32K-context layers effectively...
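
Back-of-envelope on why that's expensive (assuming vanilla quadratic attention; the numbers are illustrative, not from the paper):

```python
# Rough sketch: with standard attention, compute grows quadratically in
# sequence length, so going from LLaMA's 2K context to 32K multiplies the
# attention FLOPs per layer by (32768 / 2048) ** 2 = 256x.
base_ctx, new_ctx = 2048, 32768
print(f"attention FLOP multiplier: {(new_ctx / base_ctx) ** 2:.0f}x")  # 256x
```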

11

mrpogiface t1_jckmi7d wrote

Definitely, but you'd need to further fine-tune the model to "teach" it to make use of the additional context.
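
Something like this, roughly (untested sketch; the weight path is a placeholder, and it assumes whatever long-context position trick the paper uses is already baked into the model):

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

# Placeholder path -- you'd point this at your own HF-format LLaMA weights.
model = LlamaForCausalLM.from_pretrained(
    "path/to/llama-hf", torch_dtype=torch.float16
).cuda()
tok = LlamaTokenizer.from_pretrained("path/to/llama-hf")

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def train_step(long_text: str, ctx_len: int = 32768) -> float:
    # Standard causal-LM fine-tuning, just on much longer sequences, so the
    # loss actually covers long-range positions and the model learns to use them.
    ids = tok(long_text, return_tensors="pt",
              truncation=True, max_length=ctx_len)["input_ids"].cuda()
    loss = model(input_ids=ids, labels=ids).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

The key point is that the fine-tuning data has to contain documents long enough that attending far back actually lowers the loss; otherwise the model never learns to use the extra positions.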

9

kreuzguy t1_jcliuwx wrote

Someone should definitely look into this!

3