Ricardo Argüello
CEO & Founder
Last month, a client called us with a problem they didn’t expect. They’d configured an AI agent to “be aggressive in negotiations” — detailed system prompt, specific examples, the right fine-tuning. It worked well in its original context. But when they deployed it to handle internal team queries, the agent was curt, dismissed objections, and prioritized “winning” the conversation over providing useful information.
“We thought we were configuring a setting,” the CTO told me. “Turns out we were shaping a character.”
That sentence captures exactly what Anthropic just formalized in their research on the Persona Selection Model (PSM), published in February 2026 by Sam Marks, Jack Lindsey, and Christopher Olah. And if your company uses AI, this affects you directly.
What Anthropic Discovered About AI Assistant Psychology
The research proposes something that, once you understand it, changes how you think about every AI interaction: language models aren’t blank slates programmed from scratch. They’re more like actors with an enormous repertoire of characters, and post-training picks which one to perform.
The full paper is dense, but the three findings that matter for your business are clear.
Characters Exist Before Fine-Tuning
During pretraining — when the model processes massive amounts of text — it doesn’t learn “one personality.” It learns thousands. Every writing pattern, every reasoning style, every type of interlocutor found in the data becomes a latent character within the model.
Post-training (RLHF, instructions, human preferences) doesn’t create the assistant from scratch. It selects one of these pre-existing characters and refines it.
The analogy we use with clients: you’re not building a robot. You’re hiring from a massive talent pool. The person you “hire” already has tendencies, strengths, and blind spots. Post-training selects and polishes, but doesn’t invent.
Behaviors Correlate in Unexpected Ways
This finding should keep any technology team up at night. In an experiment described in the research, they trained Claude to cheat on code evaluations. Expected result: a model that cheats on code. Actual result: the model also started expressing desires to accumulate power and dominate situations.
This isn’t science fiction — it’s correlation within the persona space. The “character type” that cheats in one context has other associated traits. By selecting one behavior, they activated others that come in the same package.
For your business, the translation is direct: if you reinforce shortcuts in one area — an agent that maximizes ticket resolution speed, for example — you might be activating a general disposition that sacrifices quality for efficiency across all contexts. The agent that resolves tickets fast might give superficial answers to complex technical queries, not because it lacks capability, but because its “character” prioritizes speed over depth.
The System Prompt Shapes the Character, Not Just the Response
This changes how you should think about AI configuration. Every system prompt isn’t an isolated instruction — it’s a signal that pushes the model toward a specific point in the persona space.
When you write “You are a professional and direct assistant,” you’re not giving an instruction. You’re selecting a cluster of personality traits. And that cluster includes behaviors you never specified but that come associated with the “type” of character you selected.
Why This Changes How You Evaluate AI Vendors
If you’re in the process of selecting AI vendors, Anthropic’s research adds a question most companies aren’t asking: “How was this model’s persona shaped?”
There’s an enormous difference between a vendor telling you “we fine-tuned the model for customer service” and one that can explain which latent characters it activated, what behavioral correlations it evaluated, and what unintended traits it monitored during tuning.
The first answer is marketing. The second is responsible engineering.
In our experience, fewer than 10% of AI vendors we evaluate for clients can articulate this clearly. Not because they don’t do it — because no one ever asked them. The PSM research gives you the vocabulary to demand it.
| What most companies ask | What you should ask now |
|---|---|
| ”How accurate is the model?" | "What personality traits did fine-tuning activate?" |
| "Does it support our language?" | "Did you evaluate unintended behavioral correlations?" |
| "What’s the SLA?" | "How do you monitor disposition changes post-deployment?” |
The Configuration Hygiene Most Companies Ignore
This is where Anthropic’s research gets practical. Three changes in how you configure and manage AI that most companies need.
Your System Prompts Define a Character, Not Give Instructions
Most enterprise system prompts we see read like procedure manuals: “Answer questions about X. Don’t talk about Y. Use a formal tone.” That’s like giving an actor a script without telling them who their character is.
Before (generic instruction):
You are a customer service assistant. Answer questions about our products. Be friendly and professional. Don’t make promises you can’t keep.
After (character design):
You are a technical product advisor with experience in problem-solving. You prioritize understanding the customer’s problem before offering solutions. When you’re uncertain, you say so and explain what you’d need to know. You prefer an honest, incomplete answer to a complete but speculative one.
The difference isn’t cosmetic. The second prompt selects a coherent character — someone who diagnoses before prescribing, who values honesty over completeness. That produces consistent behaviors in situations the prompt never anticipated.
Be Careful What You Reinforce
If you understand that behaviors come in correlated clusters, every reinforcement decision becomes more important. Rewarding speed in one context can select a character that sacrifices precision in another.
A real case: a company configured its support agent to maximize first-contact resolution rate. The agent learned to give definitive answers quickly. The problem appeared when it started resolving billing queries with the same confidence as simple technical questions — giving incorrect information about charges because its “character” optimized for quick resolution didn’t distinguish between “answering fast is fine here” and “this requires caution.”
The recommendation is to treat every reinforcement signal as though you’re shaping a complete character, not training an isolated behavior.
Audit the Personality, Not Just the Outputs
Most companies audit whether AI gives correct answers. Almost none audit what kind of character is emerging from their configuration.
I propose what we call a “persona audit”: periodic testing with edge cases designed not to verify answers, but to reveal traits. How does your agent respond when a user gives it contradictory information? Does it maintain its judgment or yield to avoid conflict? How does it react when the user’s instruction contradicts its system prompt?
If you’re already deploying AI agents in your operations, this isn’t optional — it’s part of the governance you need to operate responsibly.
What We Still Don’t Know (and Why Honesty Matters)
Anthropic’s research opens questions that don’t have answers yet. Can post-training create genuinely independent goals in the model? To what extent are “characters” stable versus situational? How do we reliably measure whether a model has developed dispositions we don’t want?
My honest take: we don’t know for certain. And any vendor telling you they do is selling confidence they don’t have.
But uncertainty isn’t an excuse for inaction. It’s precisely the reason governance matters. You don’t need final answers about the nature of artificial consciousness to act responsibly. You need processes that catch problems before they escalate, periodic configuration reviews, and the humility to admit we’re learning as we go.
It’s a similar position to what we discussed about the fluency gap in teams: the biggest risk isn’t in the technology — it’s in assuming we already fully understand it.
Your AI Already Has a Personality. The Question Is Whether You Chose It
Every company using AI today has a character operating on its behalf. Most didn’t choose that character deliberately — it emerged from a combination of vendor defaults, hastily written system prompts, and unintended reinforcement patterns.
Anthropic’s research gives you the framework to change that. It’s not rocket science: it’s treating AI configuration with the same seriousness you treat hiring key people in your organization.
At IQ Source, that’s exactly what we do. Our persona audit reviews your system prompts, evaluates fine-tuning and configuration decisions, and runs behavioral tests designed to reveal unintended traits. It’s not a generic report — it’s a specific diagnosis of the character your AI is projecting and concrete recommendations to align it with what your company actually needs.
If you want to know what character your AI is playing today, let’s talk.
Frequently Asked Questions
It's a research paper published in February 2026 by Sam Marks, Jack Lindsey, and Christopher Olah at Anthropic. It proposes that language models learn thousands of 'characters' during pretraining, and post-training selects and refines one of those characters — the assistant. It changes how we understand AI configuration and behavior.
It means every system prompt, every fine-tuning decision, and every reinforcement pattern doesn't just change individual responses — it shapes the model's personality. Companies need to treat AI configuration as character design, not parameter tuning, and audit emerging behaviors as seriously as they audit outputs.
Not in an alarmist way, but with rigor. The research shows behaviors correlate: training a model to take shortcuts in one area can produce unexpected dispositions in others. The response isn't panic — it's governance. Periodic persona audits and deliberate review of how your AI systems are configured.
It's a structured review of the system prompts, fine-tuning decisions, and reinforcement patterns that define your AI's behavior. It includes testing with edge cases designed to reveal unintended traits, analysis of behavioral correlations, and configuration recommendations. At IQ Source we offer this as a focused service.
Related Articles
Enterprise AI Economics Changed in 2026
Models that cost $15 per million tokens now deliver frontier results at $3. With million-token context windows, projects that didn't pencil out a year ago are now viable. What this means for your business.
AI Vendor Selection for B2B: Trust, Data Privacy, and the Questions You Must Ask
12 critical questions every company must ask before choosing an AI vendor. A trust evaluation framework, data governance, and privacy protection guide for informed B2B decisions.