
Lajamerr_Mittesdine OP t1_itomfs6 wrote

Chain-of-thought (CoT) prompting simply breaks a problem down into multiple interconnected solution statements that lead to one conclusive answer.

You can prompt a CoT model to go down different reasoning paths and arrive at different answers (sometimes wrong ones), but those paths are all independent of one another.
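Here's a minimal Python sketch of that independent-sampling idea. The `generate` function is a hypothetical stand-in for whatever LLM sampling call you actually have, and the canned outputs are just there so the sketch runs:

```python
import random

def generate(prompt: str, temperature: float = 0.7) -> str:
    # Hypothetical stand-in: a real implementation would sample a
    # completion from the LLM. Canned answers keep the sketch runnable.
    return random.choice([
        "Step 1: ... Step 2: ... Therefore the answer is 42.",
        "Step 1: ... Step 2: ... Therefore the answer is 41.",
    ])

def sample_cot_paths(question: str, n_paths: int = 5) -> list[str]:
    prompt = f"Q: {question}\nA: Let's think step by step."
    # Each completion is sampled independently, so the reasoning paths
    # (and their final answers, right or wrong) never interact.
    return [generate(prompt) for _ in range(n_paths)]

paths = sample_cot_paths("What is 6 * 7?")
```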

Note that this is fine-tuning an existing LLM.

The fine-tuning is driven in part by a hypermodel that ranks candidate solutions. Those ranked solutions are then used to fine-tune the model even further, so it becomes a better reasoner using its own generated answers.

So the model uses its own understanding to generate CoT solution statements. The hypermodel ranks those statements, and the existing model is then fine-tuned on the newly generated positive and negative solutions, reinforcing what correct solution statements look like and what incorrect ones look like.
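A rough sketch of that loop, with `generate_fn`, `rank_fn`, and `fine_tune_fn` as hypothetical stand-ins for the actual training stack (these names are mine, not the paper's):

```python
def self_improve_step(generate_fn, rank_fn, fine_tune_fn,
                      questions, n_samples=8, threshold=0.5):
    positives, negatives = [], []
    for q in questions:
        # 1. The model generates CoT solutions from its own understanding.
        candidates = [generate_fn(q) for _ in range(n_samples)]
        # 2. The hypermodel scores/ranks each candidate solution.
        for cand in candidates:
            score = rank_fn(q, cand)
            # 3. High-ranked solutions become positive fine-tuning
            #    examples; low-ranked ones become negative examples.
            (positives if score >= threshold else negatives).append((q, cand))
    # 4. Fine-tune on the model's own ranked outputs, reinforcing what
    #    correct and incorrect solution statements look like.
    return fine_tune_fn(positives, negatives)
```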

Future work: So what stops the LLM from eventually reaching ~100%? The bottleneck preventing this from compounding is the hypermodel's ability to rank solutions accurately. Theoretically, if you had a perfect ranker black box, you could eventually get to ~100%. So what you would want in future work is either a more accurate ranker overall, or some way to continuously improve the ranker hypermodel in an unsupervised fashion, just as this setup improves the LLM.
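A toy simulation of why the ranker caps the ceiling: if the ranker mislabels a fraction of candidates, that noise flows straight into the fine-tuning data. The numbers below are purely illustrative, not from the paper:

```python
import random

def noisy_rank(true_label: bool, ranker_accuracy: float) -> bool:
    # The ranker returns the true label with probability ranker_accuracy.
    return true_label if random.random() < ranker_accuracy else not true_label

def label_noise_in_positives(ranker_accuracy: float, n: int = 10_000) -> float:
    # Simulate candidates that are 50% correct, then measure how much
    # of the "positive" fine-tuning set is actually wrong.
    wrong = right = 0
    for _ in range(n):
        truth = random.random() < 0.5
        if noisy_rank(truth, ranker_accuracy):  # ranker says "positive"
            if truth:
                right += 1
            else:
                wrong += 1
    return wrong / (wrong + right)

print(label_noise_in_positives(0.9))  # ~10% of "positives" are wrong
print(label_noise_in_positives(1.0))  # 0.0 -- a perfect ranker leaks no noise
```

With a perfect ranker, each round of fine-tuning only reinforces genuinely correct solutions, so the loop could in principle keep climbing; a weaker ranker leaks noise into every round.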

Personal Opinion: What this is really doing is picking low-hanging fruit. It prompts the LLM for reasoning it already understands in different contexts and more reliably promotes that reasoning to the highest-ranking solutions across a broader range of problems. It isn't learning entirely new concepts.
