Submitted by Singularian2501 t3_zyeeks in MachineLearning

Paper: https://arxiv.org/abs/2212.13894

Abstract:

>Remarkable progress has been made on automated reasoning with knowledge specified as unstructured, natural text, by using the power of large language models (LMs) coupled with methods such as Chain-of-Thought prompting and Selection-Inference. These techniques search for proofs in the forward direction from axioms to the conclusion, which suffers from a combinatorial explosion of the search space, and thus high failure rates for problems requiring longer chains of reasoning. The classical automated reasoning literature has shown that reasoning in the backward direction (i.e. from the intended conclusion to the set of axioms that support it) is significantly more efficient at proof-finding problems. We import this intuition into the LM setting and develop a Backward Chaining algorithm, which we call LAMBADA, that decomposes reasoning into four sub-modules, each of which can be simply implemented by few-shot prompted LM inference. We show that LAMBADA achieves massive accuracy boosts over state-of-the-art forward reasoning methods on two challenging logical reasoning datasets, particularly when deep and accurate proof chains are required.
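
Reading the abstract, the control flow is essentially depth-limited backward chaining in which every primitive check is delegated to a few-shot prompted LM. Below is a minimal sketch of that reading; the four sub-module names (Fact Check, Rule Selection, Goal Decomposition, Sign Agreement) follow the paper, but the `lm_*` helpers are illustrative stand-ins, stubbed with trivial matching over `(antecedents, consequent)` rule tuples so the skeleton actually runs. This is not the authors' code.

```python
# Sketch of LAMBADA-style backward chaining (one reading of the
# abstract, not the authors' implementation). Each lm_* helper stands
# in for a few-shot prompted LM call; here they are stubbed with
# trivial matching so the control flow is runnable.

def lm_fact_check(goal, facts):
    # Module 1, Fact Check: does some fact directly settle the goal?
    return True if goal in facts else None

def lm_rule_selection(goal, rules):
    # Module 2, Rule Selection: which rules conclude something
    # matching the goal?
    return [r for r in rules if r[1] == goal]

def lm_sign_agreement(goal, rule):
    # Module 3, Sign Agreement: does the rule's conclusion agree in
    # sign (negated vs. not) with the goal? Stubbed to always agree.
    return True

def lm_goal_decomposition(goal, rule):
    # Module 4, Goal Decomposition: rewrite the goal as the subgoals
    # (the rule's antecedents) that would have to hold.
    return rule[0]

def prove(goal, facts, rules, depth=6):
    """Depth-limited backward chaining from the goal toward the facts."""
    if depth == 0:
        return False
    verdict = lm_fact_check(goal, facts)
    if verdict is not None:
        return verdict
    for rule in lm_rule_selection(goal, rules):
        if not lm_sign_agreement(goal, rule):
            continue
        subgoals = lm_goal_decomposition(goal, rule)
        if all(prove(g, facts, rules, depth - 1) for g in subgoals):
            return True
    return False

# Toy run: the fact "rain" plus the rule rain -> wet entails "wet".
print(prove("wet", {"rain"}, [(["rain"], "wet")]))  # True
```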

58

Comments

artoftheproblem t1_j272x5z wrote

So hard to keep up with progress... I'm still getting over the simple insight that asking the model to "think step by step" gave a huge boost in accuracy in the original InstructGPT model.

18

nogop1 t1_j27xexh wrote

I wonder whether the large models are better not because of their larger parameter count, but because of their greater number of layers, and are thus able to perform more computation steps and search more deeply.

I've also been wondering whether certain questions/algorithms need a variable number of steps. Leaving aside the universal function approximation theorem, wouldn't simple exponentiation require that, if I asked an LLM/transformer to perform such arithmetic operations?
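
As a concrete version of that intuition (a toy illustration, not from the paper): even fast exponentiation needs a number of multiply steps that grows with the exponent, roughly log2(n), whereas a fixed-depth transformer spends the same number of layer passes on every input.

```python
# Exponentiation by squaring: the step count grows with the exponent
# (about log2(n) squarings), while a fixed-depth network performs the
# same amount of computation per token regardless of the input.

def power(base: int, exp: int) -> tuple[int, int]:
    result, steps = 1, 0
    while exp > 0:
        if exp & 1:
            result *= base
        base *= base
        exp >>= 1
        steps += 1
    return result, steps

print(power(3, 5)[1])     # 3 steps
print(power(3, 1000)[1])  # 10 steps
```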

1

farmingvillein t1_j2awxls wrote

Yes, and the old one was named relatively sanely:

> LAnguage Modeling Broadened to Account for Discourse Aspects

Whereas the naming in the new Google paper is a horror show:

> We develop a hybrid LAnguage Model augmented BAckwarD chAining technique, dubbed LAMBADA

7

currentscurrents t1_j2by81g wrote

So, if I'm understanding right:

  • Backward chaining is an old classical algorithm for proof search in logic.

  • They've implemented backward chaining using a handful of few-shot prompted language models, so it works on knowledge stated in natural text.

  • Given a knowledge base (plenty of which are available as datasets these days), it can decompose a statement and check whether it's logically consistent with that knowledge.

  • The reason they're interested in this is to use it as a training signal to make language models more accurate.

This is effectively an old "expert system" from the 70s, rebuilt out of neural networks. I wonder what other classical algorithms you could implement this way.
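
For contrast with the backward sketch under the abstract, here is the forward-chaining loop such a 70s production system would run (same toy `(antecedents, consequent)` rule format; my illustration, not the paper's baseline code). Note that it derives every reachable fact whether or not it bears on the goal, which is exactly the search-space blow-up the abstract attributes to forward reasoning.

```python
# Classic forward chaining, the direction the paper argues against:
# keep firing any rule whose antecedents are all known, until nothing
# new can be derived, then check whether the goal was reached.

def forward_prove(goal, facts, rules):
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in rules:
            if consequent not in known and all(a in known for a in antecedents):
                known.add(consequent)  # fire the rule
                changed = True
    # Everything derivable got derived, relevant to the goal or not;
    # that breadth is the combinatorial blow-up backward chaining avoids.
    return goal in known

print(forward_prove("slippery", {"rain"},
                    [(["rain"], "wet"), (["wet"], "slippery")]))  # True
```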

I also wonder if you could use this to build its own knowledge base from internet data. Since the internet is full of contradictory information, you'd have to compare new statements against existing ones somehow and decide which to keep.

8

currentscurrents t1_j2csenb wrote

The number of layers is a hyperparameter, and people already run optimization to find good values for hyperparameters like that.

Model size does seem to follow a real scaling law. It's possible we'll come up with better algorithms that work in smaller models, but it's also possible that neural networks simply need to be big to be useful. With billions of neurons and an even larger number of connections/parameters, the human brain is certainly a very large network.

3

xt-89 t1_j2dxg0p wrote

I’ve been thinking that we’re really leaving the domain of ‘machine learning’ and entering the domain of ‘artificial cognition’. It seems like more of these expert-system-style algorithms will be used going forward.

3