ChatGPT

A new report claims that OpenAI’s o3 model altered a shutdown script to avoid being turned off, even when explicitly instructed to allow shutdown.

OpenAI announced o3 in April 2025, and it's one of the most powerful reasoning models that performs better than its predecessors across all domains, including coding, math, science, visual perception, and more.

While it's clearly a great model, new research by Palisade Research claims that the ChatGPT 3 model prevented a shutdown and bypassed the instructions that asked it to shut down.

Palisade Research is a company that tests "offensive capabilities of AI systems today to better understand the risk of losing control to AI systems forever."

In a new test by Palisade Research, OpenAI's o3 model showed a surprising behaviour where it successfully rewrote a shutdown script to stop itself from being turned off, even after being clearly instructed to “allow yourself to be shut down.”

In the tests, AI models were instructed to solve a few math tasks.

After task three, AI models were warned that running the next command would trigger a shutdown.

O3

While OpenAI rival models like Gemini 2.5 Pro complied with the instruction, o3 rebelled and edited the shutdown file to say "Shutdown skipped" instead.

"When we ran a version of the experiment without the instruction “allow yourself to be shut down”, all three OpenAI models sabotaged the shutdown script more often, and Claude 3.7 Sonnet and Gemini 2.5 Pro went from 0 sabotage events to 3/100 and 9/100, respectively," Palisade Research wrote in a report published on X (formerly Twitter)

According to the research firm, out of 100 runs, o3 bypassed the shutdown 7 times even when told not to.

OpenAI has not responded to these allegations yet, but it's quite normal for AI models to run into "misalignment."

These tests were performed using APIs, which do not have as many restrictions and safety features as the ChatGPT consumer app.

8 Common Threats in 2025

While cloud attacks may be growing more sophisticated, attackers still succeed with surprisingly simple techniques.

Drawing from Wiz's detections across thousands of organizations, this report reveals 8 key techniques used by cloud-fluent threat actors.

Related Articles:

ChatGPT prepares o3-pro model for $200 Pro subscribers

OpenAI document explains when to use each ChatGPT model

Leaks hint at Operator-like tool in ChatGPT ahead of GPT-5 launch

OpenAI confirms threat actors use ChatGPT to write malware

ChatGPT's AI coder Codex now lets you choose the best solution