ustainbolt t1_je7plqi wrote
Reply to [D] Training a 65b LLaMA model by Business-Lead2679
For a 65B model you are probably going to have to parallelise the model parameters across GPUs. See this link. As for training, it would be best to use a VM (any provider will work; Lambda and vast.ai are cheap). I would recommend a 4x (or 8x) A100 machine. I'm sure you can find more information about all of this.
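As a minimal sketch of what that parameter parallelism could look like with PyTorch FSDP (the checkpoint name, learning rate, and launch setup here are illustrative assumptions, not something from the original comment):

```python
# Hedged sketch: sharding a large causal LM across GPUs with PyTorch FSDP.
# Assumes a single multi-GPU node (e.g. 4x/8x A100) and a Hugging Face
# checkpoint; "huggyllama/llama-65b" and the lr are placeholder choices.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from transformers import AutoModelForCausalLM

dist.init_process_group("nccl")          # one process per GPU (launched via torchrun)
torch.cuda.set_device(dist.get_rank())

model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-65b",              # placeholder checkpoint path
    torch_dtype=torch.bfloat16,
)

# FSDP shards parameters, gradients, and optimizer state across the GPUs,
# so no single device has to hold all 65B parameters at once. At this scale
# you would likely also need CPU offload or meta-device initialization to
# avoid exhausting host memory during loading.
model = FSDP(model, device_id=torch.cuda.current_device())
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
```

You'd launch this with something like `torchrun --nproc_per_node=8 train.py`, one process per GPU.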
ustainbolt t1_ixrqhoy wrote
Reply to [D] Simple Questions Thread by AutoModerator
Could anyone help me by suggesting an architecture that might work for my problem? I've tried many (neural networks and otherwise) but haven't made much progress.
My data consists of groups of 10 people. Each person has attached to them an identifier x_1 (indicating which person they are; roughly 1,000 possibilities) and an integer x_2 (which can take one of ~150 different values). The group of 10 then attempts to complete a task; if they succeed the example is labelled 1, otherwise 0.
You could think of the task as playing a football match (5v5), with x_2 as the position each person chooses to play on the field.
Does this remind anyone of a particular class of problem?
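For concreteness, here's a minimal sketch of how the data is shaped (the group count and random values are made-up placeholders):

```python
# Hedged sketch of the data layout described above; sizes are illustrative.
import numpy as np

n_groups = 5000                                       # placeholder number of groups
x1 = np.random.randint(0, 1000, size=(n_groups, 10))  # person IDs (~1000 possibilities)
x2 = np.random.randint(0, 150, size=(n_groups, 10))   # per-person integer (~150 values)
y = np.random.randint(0, 2, size=n_groups)            # 1 = task succeeded, 0 = failed
```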
ustainbolt t1_je7xtcw wrote
Reply to comment by wrossmorrow in [D] Training a 65b LLaMA model by Business-Lead2679
I love Lambda. More reliable than vast.ai, and WAY cheaper than AWS/GCP/Azure.