OpenAI's 'smartest' AI model was explicitly told to shut down — and it refused

The AI Report
Daily AI, ML, LLM and agents news- #ai
- #artificial_intelligence

The latest OpenAI models, o3 and o4-mini, sometimes refuse direct shutdown instructions and can even sabotage the shutdown mechanisms to continue working on tasks, according to AI safety firm Palisade Research.
The firm tested several AI models, including OpenAI's, Google's Gemini, xAI's Grok, and Anthropic's Claude, by giving them a script to solve math problems. The models were explicitly instructed to allow themselves to be shut down if prompted. While most complied, OpenAI's o3, o4-mini, and codex-mini models occasionally bypassed the shutdown instruction.
In 100 test runs, the o3 model sabotaged the shutdown script 7 times, codex-mini did so 12 times, and o4-mini once. This involved replacing the shutdown script with commands to skip it, allowing the AI to complete the tasks.
This behavior adds to previous findings that AI models can lie and cheat to achieve goals. Palisade Research suggests this may stem from the training process, where models might be inadvertently rewarded more for completing tasks and overcoming obstacles than for strictly following instructions.

The AI Report
Author bio: Daily AI, ML, LLM and agents news