BIS bulletin – Testing the cognitive limits of large language models: „LLMs cannot, as yet, act as a substitute for the rigorous reasoning abilities necessary for some core analytical activities”

The dazzling virtuosity of large language models (LLMs) has stirred the public imagination. Generative pretrained transformer (GPT) and similar LLMs have demonstrated an impressive array of capabilities, ranging from generating computer code and images to solving complex mathematical problems.

However, even as users marvel at what these models can do, a question that often crops up is whether they “know” or “understand” what they are saying, or whether they are merely parroting text they encountered on the internet during their extensive training.

„These questions are not only important in terms of the philosophy of knowledge but are likely to be crucial in assessing the eventual economic impact of LLMs,” according to Fernando Perez-Cruz and Hyun Song Shin, the authors of BIS Bulletin No 83, „Testing the cognitive limits of large language models”.

Key takeaways

- When posed with a logical puzzle that demands reasoning about the knowledge of others and about counterfactuals, large language models (LLMs) display a distinctive and revealing pattern of failure.

- The LLM performs flawlessly when presented with the original wording of the puzzle available on the internet but performs poorly when incidental details are changed, suggesting a lack of true understanding of the underlying logic.

- Our findings do not detract from the considerable progress in central bank applications of machine learning to data management, macro analysis and regulation/supervision. They do, however, suggest that caution should be exercised in deploying LLMs in contexts that demand rigorous reasoning in economic analysis.