Overcoming classic challenges for artificial neural networks by providing incentives and practice

References

  • McCulloch, W. S. & Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5, 115–133 (1943).

  • Rosenblatt, F. The perceptron: a probabilistic model for information storage and organization in the brain. Psychol. Rev. 65, 386–408 (1958).

  • Rumelhart, D. E., McClelland, J. L. & PDP Research Group. Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Volume 1: Foundations (MIT Press, 1986).

  • McClelland, J. L., Rumelhart, D. E. & PDP Research Group. Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Volume 2: Psychological and Biological Models (MIT Press, 1986).

  • Fodor, J. A. & Pylyshyn, Z. W. Connectionism and cognitive architecture: a critical analysis. Cognition 28, 3–71 (1988).

  • Marcus, G. F. Rethinking eliminative connectionism. Cogn. Psychol. 37, 243–282 (1998).

  • Lake, B. M. & Baroni, M. Generalization without systematicity: on the compositional skills of sequence-to-sequence recurrent networks. In Proc. 35th International Conference on Machine Learning (eds Dy, J. & Krause, A.) 2873–2882 (PMLR, 2018).

  • Greff, K., van Steenkiste, S. & Schmidhuber, J. On the binding problem in artificial neural networks. Preprint at (2020).

  • McCloskey, M. & Cohen, N. J. Catastrophic interference in connectionist networks: the sequential learning problem. Psychol. Learn. Motiv. 24, 109–165 (1989).

  • Ratcliff, R. Connectionist models of recognition memory: constraints imposed by learning and forgetting functions. Psychol. Rev. 97, 285–308 (1990).

  • French, R. M. Using semi-distributed representations to overcome catastrophic forgetting in connectionist networks. In Proc. 13th Annual Conference of the Cognitive Science Society 173–178 (Cognitive Science Society, 1991).

  • French, R. M. Catastrophic forgetting in connectionist networks. Trends Cogn. Sci. 3, 128–135 (1999).

  • Geman, S., Bienenstock, E. & Doursat, R. Neural networks and the bias/variance dilemma. Neural Comput. 4, 1–58 (1992).

  • Miller, E. G., Matsakis, N. E. & Viola, P. A. Learning from one example through shared densities on transforms. In Proc. Conference on Computer Vision and Pattern Recognition 464–471 (IEEE, 2000).

  • Fei-Fei, L., Fergus, R. & Perona, P. A Bayesian approach to unsupervised one-shot learning of object categories. In Proc. 9th IEEE International Conference on Computer Vision 1134–1141 (IEEE, 2003).

  • Lake, B., Salakhutdinov, R., Gross, J. & Tenenbaum, J. One shot learning of simple visual concepts. In Proc. 33rd Annual Conference of the Cognitive Science Society (eds Carlson, L. et al.) 2568–2573 (Cognitive Science Society, 2011).

  • Lake, B. M., Salakhutdinov, R. & Tenenbaum, J. B. Human-level concept learning through probabilistic program induction. Science 350, 1332–1338 (2015).

  • Anderson, C. W. Learning and Problem-solving with Multilayer Connectionist Systems (Adaptive, Strategy Learning, Neural Networks, Reinforcement Learning). PhD thesis, Univ. Massachusetts Amherst (1986).

  • Schmidhuber, J. Towards Compositional Learning in Dynamic Networks. Technical Report FKI-129-90 (Institut für Informatik, Technische Universität München, 1990).

  • Chollet, F. On the measure of intelligence. Preprint at (2019).

  • LeCun, Y. A path towards autonomous machine intelligence. OpenReview.net (2022).

  • Anderson, J. R. Problem solving and learning. Am. Psychol. 48, 35–44 (1993).

  • LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).

  • Schmidhuber, J. Deep learning in neural networks: an overview. Neural Netw. 61, 85–117 (2015).

  • Griffiths, T. L. Understanding human intelligence through human limitations. Trends Cogn. Sci. 24, 873–883 (2020).

  • Griffiths, T. L. et al. Doing more with less: meta-reasoning and meta-learning in humans and machines. Curr. Opin. Behav. Sci. 29, 24–30 (2019).

  • Binz, M. et al. Meta-learned models of cognition. Behav. Brain Sci. 47, e147 (2023).

  • Ong, D. C., Zhi-Xuan, T., Tenenbaum, J. B. & Goodman, N. D. Probabilistic programming versus meta-learning as models of cognition. Behav. Brain Sci. 47, e158 (2024).

  • Marinescu, I., McCoy, R. T. & Griffiths, T. L. Distilling symbolic priors for concept learning into neural networks. In Proc. 46th Annual Conference of the Cognitive Science Society 5848–5855 (Cognitive Science Society, 2024).

  • Nussenbaum, K. & Hartley, C. A. Understanding the development of reward learning through the lens of meta-learning. Nat. Rev. Psychol. 3, 424–438 (2024).

  • Nussenbaum, K. & Hartley, C. A. Meta-learned models as tools to test theories of cognitive development. Behav. Brain Sci. 47, e157 (2024).

  • Russin, J., McGrath, S. W., Pavlick, E. & Frank, M. J. Is human compositionality meta-learned? Behav. Brain Sci. 47, e162 (2024).

  • Smolensky, P. The constituent structure of connectionist mental states: a reply to Fodor and Pylyshyn. South. J. Philos. 26, 137–161 (1988).

  • Fodor, J. A. & McLaughlin, B. P. Connectionism and the problem of systematicity: why Smolensky’s solution doesn’t work. Cognition 35, 183–204 (1990).

  • Chalmers, D. J. Why Fodor and Pylyshyn were wrong: the simplest refutation. In Proc. 12th Annual Conference of the Cognitive Science Society 340–347 (Cognitive Science Society, 1990).

  • Hadley, R. F. Systematicity in connectionist language learning. Mind Lang. 9, 247–272 (1994).

  • Hadley, R. F. Systematicity revisited: reply to Christiansen and Chater and Niklasson and van Gelder. Mind Lang. 9, 431–444 (1994).

  • Niklasson, L. F. & Van Gelder, T. On being systematically connectionist. Mind Lang. 9, 288–302 (1994).

  • Frank, S. L., Haselager, W. F. & van Rooij, I. Connectionist semantic systematicity. Cognition 110, 358–379 (2009).

  • Russin, J., McGrath, S. W., Williams, D. J. & Elber-Dorozko, L. From Frege to ChatGPT: compositionality in language, cognition, and deep neural networks. Preprint at (2024).

  • Marcus, G. F. The Algebraic Mind: Integrating Connectionism and Cognitive Science (MIT Press, 2001).

  • Alhama, R. G. & Zuidema, W. A review of computational models of basic rule learning: the neural-symbolic debate and beyond. Psychon. Bull. Rev. 26, 1174–1194 (2019).

  • Kurtz, K. J. Simple auto-associative networks succeed at universal generalization of the identity function and reduplication rule. Cogn. Sci. 49, e70033 (2025).

  • Liška, A., Kruszewski, G. & Baroni, M. Memorize or generalize? Searching for a compositional RNN in a haystack. In Proc. Workshop on Architectures and Evaluation for Generality, Autonomy and Progress in AI (AEGAP, 2018).

  • Bahdanau, D. et al. CLOSURE: assessing systematic generalization of CLEVR models. In Proc. Visually Grounded Interaction and Language (ViGIL), NeurIPS 2019 Workshop (NeurIPS, 2019).

  • Hupkes, D., Dankers, V., Mul, M. & Bruni, E. Compositionality decomposed: how do neural networks generalise? J. Artif. Intell. Res. 67, 757–795 (2020).

  • Kim, N. & Linzen, T. COGS: a compositional generalization challenge based on semantic interpretation. In Proc. Conference on Empirical Methods in Natural Language Processing (eds Webber, B. et al.) 9087–9105 (Association for Computational Linguistics, 2020).

  • Keysers, D. et al. Measuring compositional generalization: a comprehensive method on realistic data. In Proc. 8th International Conference on Learning Representations (ICLR, 2020).

  • Csordás, R., Irie, K. & Schmidhuber, J. The devil is in the detail: simple tricks improve systematic generalization of transformers. In Proc. Conference on Empirical Methods in Natural Language Processing (eds Moens, M.-F. et al.) 619–634 (Association for Computational Linguistics, 2021).

  • Csordás, R., Irie, K. & Schmidhuber, J. The neural data router: adaptive control flow in transformers improves systematic generalization. In Proc. 10th International Conference on Learning Representations (ICLR, 2022).

  • Csordás, R., Irie, K. & Schmidhuber, J. CTL++: evaluating generalization on never-seen compositional patterns of known functions, and compatibility of neural representations. In Proc. Conference on Empirical Methods in Natural Language Processing (eds Goldberg, Y. et al.) 9758–9767 (Association for Computational Linguistics, 2022).

  • McCoy, R. T., Yao, S., Friedman, D., Hardy, M. & Griffiths, T. L. Embers of autoregression show how large language models are shaped by the problem they are trained to solve. Proc. Natl Acad. Sci. USA 121, e2322420121 (2024).

  • Chen, X., Liang, C., Yu, A. W., Song, D. & Zhou, D. Compositional generalization via neural-symbolic stack machines. In Proc. Advances in Neural Information Processing Systems 33 (eds Larochelle, H. et al.) (NeurIPS, 2020).

  • Nye, M. I., Solar-Lezama, A., Tenenbaum, J. B. & Lake, B. M. Learning compositional rules via neural program synthesis. In Proc. Advances in Neural Information Processing Systems 33 (eds Larochelle, H. et al.) (NeurIPS, 2020).

  • Caruana, R. Multitask learning. Mach. Learn. 28, 41–75 (1997).

  • Vinyals, O., Blundell, C., Lillicrap, T., Kavukcuoglu, K. & Wierstra, D. Matching networks for one shot learning. In Proc. Advances in Neural Information Processing Systems 29 (eds Lee, D. et al.) (NeurIPS, 2016).

  • Ravi, S. & Larochelle, H. Optimization as a model for few-shot learning. In Proc. 5th International Conference on Learning Representations (ICLR, 2017).

  • Deng, J. et al. ImageNet: a large-scale hierarchical image database. In Proc. 2009 IEEE Conference on Computer Vision and Pattern Recognition 248–255 (IEEE, 2009).

  • Lake, B. M., Salakhutdinov, R. & Tenenbaum, J. B. One-shot learning by inverting a compositional causal process. In Proc. Advances in Neural Information Processing Systems 26 (eds Burges, C. J. et al.) (NeurIPS, 2013).

  • Hsu, Y.-C., Liu, Y.-C., Ramasamy, A. & Kira, Z. Re-evaluating continual learning scenarios: a categorization and case for strong baselines. In Proc. Continual Learning Workshop, 32nd Conference on Neural Information Processing Systems (NeurIPS, 2018).

  • van de Ven, G. M. & Tolias, A. S. Three scenarios for continual learning. In Proc. Continual Learning Workshop, 32nd Conference on Neural Information Processing Systems (NeurIPS, 2018).

  • McClelland, J. L., McNaughton, B. L. & O’Reilly, R. C. Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory. Psychol. Rev. 102, 419 (1995).

  • Robins, A. Catastrophic forgetting, rehearsal and pseudorehearsal. Conn. Sci. 7, 123–146 (1995).

  • Wang, L., Zhang, X., Su, H. & Zhu, J. A comprehensive survey of continual learning: theory, method and application. IEEE Trans. Pattern Anal. Mach. Intell. 46, 5362–5383 (2024).

  • Fei-Fei, L., Fergus, R. & Perona, P. One-shot learning of object categories. IEEE Trans. Pattern Anal. Mach. Intell. 28, 594–611 (2006).

  • Carey, S., Diamond, R. & Woods, B. Development of face recognition: a maturational component? Dev. Psychol. 16, 257 (1980).

  • Biederman, I. Recognition-by-components: a theory of human image understanding. Psychol. Rev. 94, 115 (1987).

  • Carey, S. in Linguistic Theory and Psychological Reality (eds Halle, M. et al.) 264–293 (MIT Press, 1978).

  • Carey, S. & Bartlett, E. Acquiring a single new word. Pap. Rep. Child Lang. Dev. 15, 17–29 (1978).

  • Bloom, P. How Children Learn the Meanings of Words (MIT Press, 2000).

  • Frank, M. C. Bridging the data gap between children and large language models. Trends Cogn. Sci. 27, 990–992 (2023).

  • Johnson-Laird, P. N. Mental Models: Towards a Cognitive Science of Language, Inference, and Consciousness (Harvard Univ. Press, 1983).

  • Fedorenko, E. & Varley, R. Language and thought are not the same thing: evidence from neuroimaging and neurological patients. Ann. N. Y. Acad. Sci. 1369, 132–153 (2016).

  • Smith, E. E., Langston, C. & Nisbett, R. E. The case for rules in reasoning. Cogn. Sci. 16, 1–40 (1992).

  • Sun, R. Robust reasoning: integrating rule-based and similarity-based reasoning. Artif. Intell. 75, 241–295 (1995).

  • Browne, A. & Sun, R. Connectionist inference models. Neural Netw. 14, 1331–1355 (2001).

  • Newell, A. Unified Theories of Cognition (Harvard Univ. Press, 1990).

  • Kirkpatrick, J. et al. Overcoming catastrophic forgetting in neural networks. Proc. Natl Acad. Sci. USA 114, 3521–3526 (2017).

  • Zenke, F., Poole, B. & Ganguli, S. Continual learning through synaptic intelligence. In Proc. 34th International Conference on Machine Learning (eds Precup, D. & Teh, Y. W.) 3987–3995 (PMLR, 2017).

  • Schmidhuber, J. Evolutionary Principles in Self-referential Learning. On Learning How to Learn: The Meta-Meta-… Hook. PhD thesis, Technische Univ. München (1987).

  • Cotter, N. E. & Conwell, P. R. Fixed-weight networks can learn. In Proc. 1990 IJCNN International Joint Conference on Neural Networks 553–559 (IEEE, 1990).

  • Cotter, N. E. & Conwell, P. R. Learning algorithms and fixed dynamics. In Proc. IJCNN-91-Seattle International Joint Conference on Neural Networks 799–801 (IEEE, 1991).

  • Younger, A. S., Conwell, P. R. & Cotter, N. E. Fixed-weight on-line learning. IEEE Trans. Neural Netw. 10, 272–283 (1999).

  • Hochreiter, S., Younger, A. S. & Conwell, P. R. Learning to learn using gradient descent. In Proc. Artificial Neural Networks – ICANN 2001 (eds Dorffner, G. et al.) 87–94 (Springer, 2001).

  • Rich, J. A. & Farrall, G. A. Vacuum arc recovery phenomena. Proc. IEEE 52, 1293–1301 (1964).

  • White, M. W., Holdaway, R. M., Guo, Y. & Paulos, J. J. New strategies for improving speech enhancement. Int. J. Biomed. Comput. 25, 101–124 (1990).

  • Bosc, T. Learning to learn neural networks. In Proc. Reasoning, Attention, Memory (RAM) Workshop, NIPS 2015 (NeurIPS, 2015).

  • Santoro, A., Bartunov, S., Botvinick, M., Wierstra, D. & Lillicrap, T. P. Meta-learning with memory-augmented neural networks. In Proc. 33rd International Conference on Machine Learning (eds Balcan, M. F. & Weinberger, K. Q.) 1842–1850 (PMLR, 2016).

  • Duan, Y. et al. RL2: fast reinforcement learning via slow reinforcement learning. Preprint at (2016).

  • Wang, J. et al. Learning to reinforcement learn. In Proc. 39th Annual Conference of the Cognitive Science Society (eds Gunzelmann, G. et al.) 1319 (Cognitive Science Society, 2017).

  • Munkhdalai, T. & Yu, H. Meta networks. In Proc. 34th International Conference on Machine Learning (eds Precup, D. & Teh, Y. W.) 2554–2563 (PMLR, 2017).

  • Mishra, N., Rohaninejad, M., Chen, X. & Abbeel, P. A simple neural attentive meta-learner. In Proc. 6th International Conference on Learning Representations (ICLR, 2018).

  • Brown, T. B. et al. Language models are few-shot learners. In Proc. Advances in Neural Information Processing Systems 33 (eds Larochelle, H. et al.) (NeurIPS, 2020).

  • Xie, S. M., Raghunathan, A., Liang, P. & Ma, T. An explanation of in-context learning as implicit Bayesian inference. In Proc. 10th International Conference on Learning Representations (ICLR, 2022).

  • Garg, S., Tsipras, D., Liang, P. & Valiant, G. What can transformers learn in-context? A case study of simple function classes. In Proc. Advances in Neural Information Processing Systems 35 (eds Koyejo, S. et al.) (NeurIPS, 2022).

  • Raventós, A., Paul, M., Chen, F. & Ganguli, S. Pretraining task diversity and the emergence of non-Bayesian in-context learning for regression. In Proc. Advances in Neural Information Processing Systems 36 (eds Oh, A. et al.) (NeurIPS, 2023).

  • Panwar, M., Ahuja, K. & Goyal, N. In-context learning through the Bayesian prism. In Proc. 12th International Conference on Learning Representations (ICLR, 2024).

  • Von Oswald, J. et al. Transformers learn in-context by gradient descent. In Proc. 40th International Conference on Machine Learning (eds Krause, A. et al.) 35151–35174 (PMLR, 2023).

  • Dai, D. et al. Why can GPT learn in-context? Language models secretly perform gradient descent as meta-optimizers. In Findings of the Association for Computational Linguistics: ACL 2023 (eds Rogers, A. et al.) 4005–4019 (Association for Computational Linguistics, 2023).

  • Akyürek, E., Schuurmans, D., Andreas, J., Ma, T. & Zhou, D. What learning algorithm is in-context learning? Investigations with linear models. In Proc. 11th International Conference on Learning Representations (ICLR, 2023).

  • Min, S. et al. Rethinking the role of demonstrations: what makes in-context learning work? In Proc. Conference on Empirical Methods in Natural Language Processing (eds Goldberg, Y. et al.) 11048–11064 (Association for Computational Linguistics, 2022).

  • Lake, B. M. & Baroni, M. Human-like systematic generalization through a meta-learning neural network. Nature 623, 115–121 (2023).

  • Irie, K., Csordás, R. & Schmidhuber, J. Metalearning continual learning algorithms. Trans. Mach. Learn. Res. (2025).

  • Vaswani, A. et al. Attention is all you need. In Proc. Advances in Neural Information Processing Systems 30 (eds Guyon, I. et al.) (NeurIPS, 2017).

  • Schmidhuber, J. Learning to control fast-weight memories: an alternative to dynamic recurrent networks. Neural Comput. 4, 131–139 (1992).

  • Katharopoulos, A., Vyas, A., Pappas, N. & Fleuret, F. Transformers are RNNs: fast autoregressive transformers with linear attention. In Proc. 37th International Conference on Machine Learning (eds Daumé, H. III & Singh, A.) 5156–5165 (PMLR, 2020).

  • Schlag, I., Irie, K. & Schmidhuber, J. Linear transformers are secretly fast weight programmers. In Proc. 38th International Conference on Machine Learning (eds Meila, M. & Zhang, T.) 9355–9366 (PMLR, 2021).

  • Irie, K., Schlag, I., Csordás, R. & Schmidhuber, J. Going beyond linear transformers with recurrent fast weight programmers. In Proc. Advances in Neural Information Processing Systems 34 (eds Ranzato, M. et al.) (NeurIPS, 2021).

  • Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).

  • Graves, A. et al. Hybrid computing using a neural network with dynamic external memory. Nature 538, 471–476 (2016).

  • Finn, C., Abbeel, P. & Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In Proc. 34th International Conference on Machine Learning (eds Precup, D. & Teh, Y. W.) 1126–1135 (PMLR, 2017).

  • Finn, C. & Levine, S. Meta-learning and universality: deep representations and gradient descent can approximate any learning algorithm. In Proc. 6th International Conference on Learning Representations (ICLR, 2018).

  • Javed, K. & White, M. Meta-learning representations for continual learning. In Proc. Advances in Neural Information Processing Systems 32 (eds Wallach, H. et al.) (NeurIPS, 2019).

  • Beaulieu, S. et al. Learning to continually learn. In Proc. 24th European Conference on Artificial Intelligence (eds De Giacomo, G. et al.) 992–1001 (IOS Press, 2020).

  • Conklin, H., Wang, B., Smith, K. & Titov, I. Meta-learning to compositionally generalize. In Proc. 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (eds Zong, C. et al.) 3322–3335 (Association for Computational Linguistics, 2021).

  • Lee, S., Son, J. & Kim, G. Recasting continual learning as sequence modeling. In Proc. Advances in Neural Information Processing Systems 36 (eds Oh, A. et al.) (NeurIPS, 2023).

  • Vettoruzzo, A., Vanschoren, J., Bouguelia, M.-R. & Rögnvaldsson, T. S. Learning to learn without forgetting using attention. In Proc. 3rd Conference on Lifelong Learning Agents (eds Lomonaco, V. et al.) 285–300 (PMLR, 2024).

  • Irie, K., Schlag, I., Csordás, R. & Schmidhuber, J. A modern self-referential weight matrix that learns to modify itself. In Proc. 39th International Conference on Machine Learning (eds Chaudhuri, K. et al.) 9660–9677 (PMLR, 2022).

  • Irie, K., Csordás, R. & Schmidhuber, J. Practical computational power of linear transformers and their recurrent and self-referential extensions. In Proc. Conference on Empirical Methods in Natural Language Processing (eds Bouamor, H. et al.) 9455–9465 (Association for Computational Linguistics, 2023).

  • Srivastava, R. K., Masci, J., Kazerounian, S., Gomez, F. & Schmidhuber, J. Compete to compute. In Proc. Advances in Neural Information Processing Systems 36 (eds Burges, C. J. et al.) (NeurIPS, 2013).

  • Snell, J., Swersky, K. & Zemel, R. Prototypical networks for few-shot learning. In Proc. Advances in Neural Information Processing Systems 30 (eds Guyon, I. et al.) (NeurIPS, 2017).

  • Lampinen, A. K. et al. Language models, like humans, show content effects on reasoning tasks. PNAS Nexus 3, pgae233 (2024).

  • Nye, M. et al. Show your work: scratchpads for intermediate computation with language models. Preprint at (2021).

  • Cobbe, K. et al. Training verifiers to solve math word problems. Preprint at (2021).

  • Kojima, T., Gu, S. S., Reid, M., Matsuo, Y. & Iwasawa, Y. Large language models are zero-shot reasoners. In Proc. Advances in Neural Information Processing Systems 35 (eds Koyejo, S. et al.) (NeurIPS, 2022).

  • Wei, J. et al. Emergent abilities of large language models. Trans. Mach. Learn. Res. (2022).

  • Wei, J. et al. Chain-of-thought prompting elicits reasoning in large language models. In Proc. Advances in Neural Information Processing Systems 35 (eds Koyejo, S. et al.) (NeurIPS, 2022).

  • Rajani, N. F., McCann, B., Xiong, C. & Socher, R. Explain yourself! Leveraging language models for commonsense reasoning. In Proc. 57th Annual Meeting of the Association for Computational Linguistics (eds Korhonen, A. et al.) 4932–4942 (Association for Computational Linguistics, 2019).

  • Lightman, H. et al. Let’s verify step by step. In Proc. 12th International Conference on Learning Representations (ICLR, 2024).

  • Kirchner, J. H. et al. Prover-verifier games improve legibility of LLM outputs. Preprint at (2024).

  • Zelikman, E., Wu, Y., Mu, J. & Goodman, N. D. STaR: bootstrapping reasoning with reasoning. In Proc. Advances in Neural Information Processing Systems 35 (eds Koyejo, S. et al.) (NeurIPS, 2022).

  • Gandhi, K. et al. Stream of search (SoS): learning to search in language. In Proc. 1st Conference on Language Modeling (COLM, 2024).

  • Shanahan, M. & Mitchell, M. Abstraction for deep reinforcement learning. In Proc. 31st International Joint Conference on Artificial Intelligence (ed. De Raedt, L.) 5588–5596 (IJCAI, 2022).

  • Hubinger, E., van Merwijk, C., Mikulik, V., Skalse, J. & Garrabrant, S. Risks from learned optimization in advanced machine learning systems. Preprint at (2019).

  • Ouyang, L. et al. Training language models to follow instructions with human feedback. In Proc. Advances in Neural Information Processing Systems 35 (eds Koyejo, S. et al.) (NeurIPS, 2022).

  • Hollmann, N. et al. Accurate predictions on small data with a tabular foundation model. Nature 637, 319–326 (2025).

  • Conwell, C. & Ullman, T. Testing relational understanding in text-guided image generation. Preprint at (2022).

  • Betker, J. et al. Improving image generation with better captions. Preprint at OpenAI (2023).

  • Berglund, L. et al. The reversal curse: LLMs trained on “A is B” fail to learn “B is A”. In Proc. 12th International Conference on Learning Representations (ICLR, 2024).

  • Wang, W., Jiang, G., Linzen, T. & Lake, B. M. Rapid word learning through meta in-context learning. Preprint at (2025).

  • Smith, L. B. & Karmazyn-Raz, H. Episodes of experience and generative intelligence. Trends Cogn. Sci. 26, 1064–1065 (2022).

  • Chan, S. C. Y. et al. Data distributional properties drive emergent in-context learning in transformers. In Proc. Advances in Neural Information Processing Systems 35 (eds Koyejo, S. et al.) (NeurIPS, 2022).

  • Bergelson, E. The comprehension boost in early word learning: older infants are better learners. Child Dev. Perspect. 14, 142–149 (2020).

  • Smith, L. B., Jones, S. S., Landau, B., Gershkoff-Stowe, L. & Samuelson, L. Object name learning provides on-the-job training for attention. Psychol. Sci. 13, 13–19 (2002).

  • Piantadosi, S. & Aslin, R. Compositional reasoning in early childhood. PLoS ONE 11, e0147734 (2016).

  • Piantadosi, S. T., Palmeri, H. & Aslin, R. Limits on composition of conceptual operations in 9-month-olds. Infancy 23, 310–324 (2018).

  • Coffman, J. L. et al. Relating children’s early elementary classroom experiences to later skilled remembering and study skills. J. Cogn. Dev. 20, 203–221 (2019).

  • Vong, W. K., Wang, W., Orhan, A. E. & Lake, B. M. Grounded language acquisition through the eyes and ears of a single child. Science 383, 504–511 (2024).

  • Orhan, A. E. & Lake, B. M. Learning high-level visual representations from a child’s perspective without strong inductive biases. Nat. Mach. Intell. 6, 271–283 (2024).

  • Shuvaev, S., Lachi, D., Koulakov, A. & Zador, A. Encoding innate ability through a genomic bottleneck. Proc. Natl Acad. Sci. USA 121, e2409160121 (2024).
