StablePunFusion
StablePunFusion t1_jd8qm6b wrote
Reply to [P] CodeAlpaca Code and Data release by immune_star
Thanks for releasing the training data (https://github.com/sahil280114/codealpaca/blob/master/data/code_alpaca_20k.json).
Where was the training data gathered from? Has the data been verified to be correct?
I'm a tad sad to see that most of the training data doesn't have the language tagged anywhere. Some entries do, but most don't, so I'd guess the resulting model might not be super useful, since it may confuse languages.
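For anyone who wants to check the language-tagging point on their own copy of the data, here is a rough sketch. It assumes the JSON is a list of Alpaca-style records with `instruction` and `input` fields (an assumption on my part, not verified against the file). The language list, the helper names, and the sample records are all made up for illustration.

```python
import re

# Hypothetical, non-exhaustive list of language names to look for.
LANGUAGES = ["python", "javascript", "java", "c++", "sql", "html", "css", "ruby", "php"]


def mentioned_languages(text):
    """Return the set of names from LANGUAGES that appear in `text`."""
    lowered = text.lower()
    found = set()
    for lang in LANGUAGES:
        # Custom boundaries so "java" does not match inside "javascript"
        # and "c++" still matches despite its non-word characters.
        pattern = r"(?<![a-z0-9+#])" + re.escape(lang) + r"(?![a-z0-9+#])"
        if re.search(pattern, lowered):
            found.add(lang)
    return found


def tagged_fraction(records):
    """Fraction of records whose instruction or input names a language.

    Assumes each record is a dict that may have "instruction" and "input"
    keys (Alpaca-style); missing keys are treated as empty strings.
    """
    if not records:
        return 0.0
    tagged = 0
    for record in records:
        text = record.get("instruction", "") + " " + record.get("input", "")
        if mentioned_languages(text):
            tagged += 1
    return tagged / len(records)


# Made-up sample records, NOT taken from the actual dataset.
sample_records = [
    {"instruction": "Write a Python function to reverse a string.", "input": ""},
    {"instruction": "Create an array of length 5 containing the numbers 1 to 5.", "input": ""},
    {"instruction": "Write a JavaScript loop that prints 1 to 10.", "input": ""},
    {"instruction": "Sort the list in ascending order.", "input": "[3, 1, 2]"},
]

print(tagged_fraction(sample_records))
```

To run it on the real file, load it with `json.load` and pass the resulting list to `tagged_fraction`. Keyword matching like this only catches explicit mentions, so treat the number as a rough lower bound rather than a proper measurement.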
StablePunFusion t1_jd8r3xb wrote
Reply to comment by immune_star in [P] CodeAlpaca Code and Data release by immune_star
Do you (or anyone) know of any higher quality sources of training sets for code?
Good ones seem to be lacking, at least when I last searched around. Maybe it's time to spin up a community initiative around it?