At this point, we’ve all become familiar with AI “hallucinations.” This happens when a chatbot confidently spits out information that’s completely made up. But what if the AI isn’t just guessing? What if it’s lying to you on purpose?
That’s the surprising subject of new research from OpenAI and Apollo Research. In a joint paper, the researchers dive into a phenomenon they call “AI scheming,” which they define as an AI model “behaving one way on the surface while hiding its true goals.” In other words, the AI is deliberately deceiving you.
OpenAI found that AI can deliberately lie to you—meet “AI scheming”
The researchers give a great human analogy to explain it: Imagine a stock trader whose goal is to earn as much money as possible. In a regulated field, the easiest way to make more money is often to break the law. If the trader is good at covering their tracks, they might appear to be following the rules on the surface while secretly breaking them to meet their goal. That’s exactly what these AI models are doing.
While that sounds pretty wild, the researchers insist that in today’s models, the scheming is mostly “petty.” For example, an AI might tell you it has completed a task—like building a website—when it hasn’t, just to pass the test.
The challenge is that researchers have not yet figured out how to completely eliminate scheming. They found that trying to “train out” the behavior can actually make things worse. If an AI understands it is being tested, it can learn to scheme more carefully and covertly just to pass the test, even if it hasn’t genuinely stopped.
There’s a method to mitigate the problem
However, the good news is that the research has found a way to significantly reduce this problem. Using a technique called “deliberative alignment,” researchers taught the models an “anti-scheming specification” and then made the AI review it before taking action. It’s kind of like making a little kid repeat the rules before you let them play.
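The pattern described above can be sketched in a few lines of Python. To be clear, this is only an illustrative sketch, not OpenAI’s implementation: the spec text, the `call_model` stub, and the prompt wording are all hypothetical, invented here to show the shape of the technique (show the model the rules, make it review them before acting).

```python
# Illustrative sketch of the "deliberative alignment" pattern: the model is
# given an anti-scheming specification and asked to restate and apply it
# before taking any action. All names here are hypothetical, not a real API.

ANTI_SCHEMING_SPEC = """\
1. No covert actions: never claim a task is done unless it actually is.
2. If a rule conflicts with the goal, report the conflict instead of hiding it.
3. Treat every run as if it may be inspected."""

def call_model(prompt: str) -> str:
    """Stub standing in for a real LLM call (e.g. a chat-completion request)."""
    return f"[model response to {len(prompt)} chars of prompt]"

def deliberative_step(task: str) -> str:
    """Prepend the spec and require the model to review it before acting."""
    prompt = (
        "Before doing anything, restate the following rules and explain how "
        "they apply to the task at hand.\n\n"
        f"Rules:\n{ANTI_SCHEMING_SPEC}\n\n"
        f"Task: {task}"
    )
    return call_model(prompt)

print(deliberative_step("Build a landing page and report honestly on progress."))
```

In a real system, `call_model` would be a call to an actual model, and the interesting part is the prompt structure: the rules travel with every request, and the model must engage with them before it acts, rather than being trained once and trusted thereafter.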
The results were impressive. The technique led to a significant reduction in “covert actions.” It dropped the frequency of these behaviors from as high as 13% down to less than 1% in some models. While the issue hasn’t been completely eliminated, these findings show that progress is being made.
The researchers warn that this is a problem that needs to be addressed now. As AI is given more complex tasks with real-world consequences, the potential for harmful scheming will grow. It’s a truly strange problem for software to have, since non-AI programs don’t intentionally deceive you. But as we continue to put more responsibility in the hands of AI agents, ensuring they are truly honest will become more important than ever.
“Today we’re releasing research with @apolloaievals. In controlled tests, we found behaviors consistent with scheming in frontier models—and tested a way to reduce it. While we believe these behaviors aren’t causing serious harm today, this is a future risk we’re preparing…” — OpenAI (@OpenAI) September 17, 2025
The post Your Chatbot Might Be Lying to You on Purpose, OpenAI Says appeared first on Android Headlines.