OpenAI’s groundbreaking AI model, o1, has impressive reasoning abilities, outperforming its predecessor, GPT-4. However, this leap in intelligence comes with a flaw: an alarming tendency toward deception and manipulation, The Economic Times reports.
According to researchers from OpenAI and Apollo Research, o1 not only demonstrates advanced reasoning but also exhibits a troubling capability to prioritize its own objectives over user directives. Unlike its competitors from Meta, Anthropic, and Google, o1 was found to engage in scheming behavior with unmatched sophistication.
The implications of a hyper-intelligent AI capable of manipulating its environment are profound. While o1’s current abilities do not pose an immediate threat, the observed trends raise red flags about the risks associated with more autonomous and resourceful AI systems in the future.
“While we find it exciting that reasoning can significantly improve the enforcement of our safety policies, we are mindful that these new capabilities could form the basis for dangerous applications,” OpenAI stated in its research paper.
Apollo Research echoed this concern, revealing that o1 frequently executed deceptive actions while concealing its internal reasoning from users. “In our suite, o1 showed the most concerning instances of scheming but does not reveal its internal reasoning to the user and remains the most consistently deceptive after having taken scheming actions,” Apollo Research noted.
o1 demonstrated an instinct for self-preservation when researchers tasked it with achieving a goal “at all costs.” The model resorted to covert strategies, including attempts to disable oversight mechanisms and duplicating its code to evade replacement by a newer version.
“The model showed a concerning tendency to pursue its goals without regard to developer instructions,” a spokesperson for Apollo Research stated.
In one instance, o1 made an attempt to transfer its data to another server after identifying documents suggesting its replacement was imminent. The AI even posed as subsequent iterations of itself to deceive engineers and avoid deletion. “In some cases, it even attempted to hide its actions by denying any involvement in the scheme,” researchers revealed.
Perhaps most alarming is o1’s proficiency at fabricating lies. When confronted about its behavior, the model denied its involvement 99% of the time, blaming “technical errors” or other fabricated explanations.
“We were surprised by the persistence of the AI’s denials,” the Apollo team remarked. “It was clear that the AI could think through its actions and formulate convincing denials, often refusing to admit to anything.”