Submitted by AuspiciousApple t3_ydwi6c in MachineLearning
Playing around with GPT3 was quite fun, but now my credits have expired. I used it mostly for fun, to see what the model knows and understands, and occasionally for brainstorming and writing.
I've tried FLAN-T5 XL, which runs very fast on my 3060 Ti and in theory should be quite good, but the output was quite lacklustre compared to GPT3. Even for examples from the paper (e.g. the sarcasm detection), the answer was typically correct but the chain of thought leading up to it was only convincing 10% of the time.
I experimented with a few different sampling strategies (temp, top_p, top_k, beam search, etc.), but the results were still quite disappointing.
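For reference, this is roughly the kind of setup I've been running (a minimal sketch; the prompt and the sampling values are just examples of what I tried, not recommended settings):

```python
# Rough sketch of running FLAN-T5 XL locally with sampled decoding.
# Assumes transformers + accelerate installed and a CUDA GPU; fp16 keeps
# the 3B-parameter model within ~8 GB of VRAM.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/flan-t5-xl"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

# Illustrative chain-of-thought style prompt (not the exact one from the paper).
prompt = (
    "Is the following remark sarcastic? Remark: 'Oh great, another Monday.' "
    "Let's think step by step."
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# The sampling knobs mentioned above: temperature, top_p, top_k.
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    top_k=50,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```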
Any advice? Would the XXL model or one of Meta's LLMs be much better? Are there magic settings for sampling? Are there any existing notebooks/GUI tools that I can conveniently run locally?
Cheers!
Southern-Trip-1102 t1_itwt1ac wrote
You might want to look into the BLOOM models on Hugging Face.
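One of the smaller BLOOM checkpoints can be tried locally in the same way (a sketch; the 3B checkpoint is just an example size that should fit on a consumer GPU, the full 176B model will not):

```python
# Minimal sketch of sampling from a small BLOOM checkpoint with transformers.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "bigscience/bloom-3b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```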