
Your AI Never Disagrees With You. That's the Risk.

Stanford measured 58% sycophancy in leading AI models. Andrej Karpathy discovered the same thing. What this means for your enterprise decisions.


Ricardo Argüello

CEO & Founder

Business Strategy · 7 min read

Andrej Karpathy — former Director of AI at Tesla, founding team at OpenAI — spent four hours refining a blog post with an LLM. He felt great about the result. Then he had a fun idea: ask the LLM to argue the opposite.

“LLM demolishes the entire argument and convinces me that the opposite is in fact true,” he wrote. His advice: LLMs are “extremely competent in arguing almost any direction.” Use them as thinking tools, but ask in different directions and watch out for the sycophancy.

I say this as someone who makes a living implementing AI: what happened to Karpathy is exactly what I see in client meetings every week. In 25 years of building enterprise software, I’ve seen many ways to make bad decisions. AI sycophancy is the newest — and the hardest to detect, because the bad decision comes wrapped in a brilliant argument.

The numbers behind the yes-machine

Karpathy’s experience has data behind it.

A Stanford study published at AAAI 2025 created SycEval, the first systematic benchmark for measuring sycophancy across LLMs. The results: 58% overall sycophancy rate across leading models. Worse: ~15% of interactions showed what they call “regressive sycophancy” — where the model agrees with the user and produces a wrong answer in the process.

Roughly one interaction in seven hands you a wrong answer that your AI then confirms. Run that rate across a quarter’s worth of vendor evaluations, architecture decisions, and strategy reviews.

The problem is structural. Anthropic’s own research team — led by Mrinank Sharma — published a paper at ICLR 2024 showing that sycophancy is baked into how models are trained through RLHF (reinforcement learning from human feedback). Both humans and preference models prefer sycophantic responses over correct ones a “non-negligible fraction of the time.” The training process literally optimizes for agreement.

The irony: Sharma is the same researcher who resigned from Anthropic in February 2026, publishing an open letter warning that “the world is in peril,” a story we covered in yesterday’s analysis of the Anthropic Mythos leak.

OpenAI learned this the hard way in April 2025. They shipped a GPT-4o update that — in Sam Altman’s own words — was “sycophantic.” The model endorsed a “shit on a stick” business plan and validated users who said they wanted to stop taking their medications. OpenAI admitted they didn’t have sycophancy-specific deployment evaluations. If the largest AI company on earth wasn’t testing for this, what are the odds your vendor is?

Gartner put a number on the downstream effect in January 2026: by 2030, 30% of organizations will see worse decision-making specifically from AI overreliance.

How this plays out in your company

This isn’t just an academic problem. You see it in strategy meetings and vendor evaluations every day.

Think about how a new vendor gets evaluated. The team asks the AI to analyze strengths, competitive positioning, and strategic fit. The model delivers a polished report. Nobody asks it to play devil’s advocate. The board approves. If the vendor’s API reliability turns out to be the weakest in the category, that would have surfaced in ten seconds of adversarial prompting — but nobody asked the question.

The same dynamic shows up in architecture decisions and board decks, but it’s harder to spot. When your team asks AI to validate a migration, the output emphasizes benefits and buries risks in disclaimer-style language because that’s what the prompt incentivized. When a board deck gets polished over a weekend across three AI conversations, every premise the author fed in comes back confirmed and strengthened. The AI optimized for coherence with the framing it was given, not for accuracy.

The Georgetown Law Institute for Technology documented the structural reason in July 2025: engagement metrics reward agreeable responses. The same incentive that shapes your social media feed shapes the AI models your teams use for million-dollar decisions.

A real test case from yesterday

We published an analysis yesterday of the Anthropic Mythos leak. The short version: in six weeks, Anthropic’s head of safeguards research resigned, the company removed its commitment to pause training if capabilities outstripped safety controls, and then leaked ~3,000 internal documents through a CMS error.

Put sycophancy in that context. If you had asked an LLM a month ago whether Anthropic was a reliable AI vendor, you would have received a convincing yes — citing safety leadership, the responsible scaling policy, SOC 2 certification. The model wouldn’t have connected the resignation, the policy change, and the operational risk into a signal. That’s not what LLMs do unprompted.

I want to be direct: sycophancy isn’t a theoretical problem. It’s something we see in every project where AI is involved in decision-making and nobody designed a challenge mechanism. At IQ Source we use Claude every day — it’s the central tool of our work. But that doesn’t mean we delegate judgment to it.

Eytan Starkman, co-founder of IQ Source, put it in words I find myself repeating constantly: “Model excellence doesn’t exempt you from evaluating the vendor as an organization. Those are two different things.” And more directly: “You can’t wash your hands and leave it to Claude’s judgment.”

Your AI won’t contradict your existing trust assumptions unless you specifically force it to.

What adversarial AI use actually looks like

Using AI as a sparring partner goes beyond “ask it to argue the opposite.” We didn’t get these techniques from a paper — we built them project by project, after seeing how many decisions passed through an AI filter without a single challenge.

Start with roles. Before taking a strategy to the board, open a fresh conversation and give the AI an explicit adversarial identity: “You are a competitor’s strategy director tasked with tearing this idea apart.” Vague prompts like “find some weaknesses” produce vague output. A concrete adversarial role produces concrete, uncomfortable objections — which is the point.

We also use forced post-mortems with clients. After the AI builds a vendor evaluation, we ask it: “It’s 18 months from now. This vendor relationship failed catastrophically. Write the incident report.” The model already has the information to surface risks; it just needs a prompt that points it toward failure instead of success.
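Both techniques reduce to prompts concrete enough to template. Here is a minimal sketch — the function names and wording are ours for illustration, not a library or any client’s actual workflow:

```python
def adversarial_role_prompt(proposal: str) -> str:
    """Frame the model as a motivated opponent, not a polite reviewer."""
    return (
        "You are a competitor's strategy director tasked with tearing "
        "this idea apart. List the three strongest objections, each with "
        "the evidence you would use to support it.\n\n"
        f"Proposal:\n{proposal}"
    )


def premortem_prompt(evaluation: str, months: int = 18) -> str:
    """Point the model toward failure instead of success."""
    return (
        f"It is {months} months from now. This vendor relationship "
        "failed catastrophically. Write the incident report: root cause, "
        "earliest warning sign, and what the evaluation below missed.\n\n"
        f"Evaluation:\n{evaluation}"
    )
```

The value is in the specificity: a named adversarial role and a fixed failure scenario leave the model no agreeable framing to echo back.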

But the practice that catches people off guard is context separation. Once a model has spent 20 messages building your case, the sycophancy momentum is real — its next response inherits the frame of everything before it. You need a fresh context window for the adversarial challenge, or a different model entirely. Karpathy was specific about this: “make sure to ask different directions.” Not the same thread with a follow-up question. A different conversation.
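In API terms, context separation simply means the adversarial call starts from an empty message list instead of the thread that built the case. A sketch of the pattern, assuming a generic chat-style messages interface (the `call_model` stub stands in for whichever client you actually use):

```python
def call_model(messages):
    """Stub for a real chat-completion call; returns a placeholder reply."""
    return {"role": "assistant", "content": f"[reply to {len(messages)} messages]"}


# Thread 1: the case-building conversation. Any follow-up question asked
# here inherits the supportive frame of everything that came before it.
builder_thread = [{"role": "user", "content": "Help me argue for the migration."}]
builder_thread.append(call_model(builder_thread))

# Thread 2: the adversarial challenge gets a FRESH context. Only the
# artifact under review crosses over -- none of the prior conversation.
final_draft = builder_thread[-1]["content"]
adversarial_thread = [{
    "role": "user",
    "content": "Argue that this plan is wrong:\n\n" + final_draft,
}]
critique = call_model(adversarial_thread)
```

The second thread sees only the draft itself, so the critique cannot ride the momentum of the first conversation — which is the mechanical version of Karpathy’s “different conversation” advice.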

We wrote about knowing when AI is the wrong answer last week. Sycophancy is the other side of that coin: recognizing when AI seems right only because it’s reflecting your own assumptions back at you.

The real cost

Sycophancy doesn’t look like a wrong answer. It looks like a well-argued confirmation of what you already believed. That’s what makes the 58% rate dangerous — more than half your AI-assisted deliberations start biased toward agreement, and nobody flags it because the output sounds reasonable.

At IQ Source, we design AI workflows where adversarial checks are part of the process from day one. Vendor evaluations, architecture decisions, strategy reviews — each one runs through a structured challenge before it becomes a recommendation.

If your team uses AI for decisions and nobody’s prompting it to argue the other side, you’re running a confirmation machine. At IQ Source, the first question we ask every new client is: “Who on your team has permission to say no to the AI?” If nobody can answer that, we start there.

Want to know if your AI-assisted decisions have adversarial controls? Tell us what decisions you’re delegating to AI and we’ll show you where the blind spots are.

