Agentic AI and the Art of Confident Wrongness

There is a version of automation I trust. It is boring, predictable, and I mean that as a compliment.

A well-built deterministic automation - one built in Flow or Apex, for example - does exactly what it was told to do, every time, in the same order, with the same logic. When it breaks, it breaks in ways I can trace. I can find the broken record, the failed condition, the mismatched field value. And I can fix it, document it, and move on. There is no ambiguity about what happened, or why (most of the time anyway).

Agentic AI is something else entirely. And I find myself more sceptical of it the more I learn about it.

A note before going further: I am not an AI researcher, and I have no deep technical background in how large language models are built or governed. My scepticism is informed by observation and pattern recognition, not by expertise in the internals. If I have got something wrong, I am open to being corrected.

What "agentic" actually means in practice

The appeal of agentic AI is real. Instead of building a rigid sequence of steps, you describe a goal and let the system figure out how to achieve it. The agent reads context, makes decisions, calls tools, takes actions - sometimes across multiple systems - and arrives at an outcome. It can adapt when things change. It can handle variation that would break a traditional automation.

That sounds useful. In some narrow, controlled scenarios, it probably is.

But the same quality that makes it flexible - the fact that it reasons its way to a conclusion rather than following fixed instructions - is also the thing that should give you pause. Because reasoning is not the same as being right.

The hallucination problem, at scale

Most people who have spent any meaningful time with large language models have encountered hallucinations. The model states something with complete confidence that is simply untrue. Not vague, not hedged - just wrong, and said as though it were obvious.

In a conversational context, this is annoying. You fact-check it, you correct it, and you move on. The blast radius is limited to the person reading the output.

Now give that same model a set of tools. Let it query your CRM, update records, send emails, trigger downstream processes. Let it take actions based on its reasoning, not just generate text. Suddenly the hallucination isn't just a wrong sentence in a chat window. It is a wrong action in a live system, potentially at volume, before anyone has noticed anything is wrong.

Add to this the effect of compounding hallucinations. Imagine a relay race of hallucinating agents, where one agent's hallucinations are passed on to the next agent, which adds a hallucination of its own - and on it goes.

This is not a hypothetical risk. It is the logical consequence of how these systems work.
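
To put rough numbers on the relay race, here is a back-of-the-envelope sketch. It rests on assumptions of mine, not on measurements: every step in the chain is right with the same probability, and the steps fail independently of one another. Real agents are unlikely to be that tidy, and the function name is purely illustrative.

```python
def chain_success_probability(per_step_accuracy: float, steps: int) -> float:
    """Chance that every step in a chain is correct.

    Assumes (hypothetically) that each step has the same accuracy and
    that steps fail independently of each other.
    """
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must not be negative")
    return per_step_accuracy ** steps


if __name__ == "__main__":
    # 0.95 is an illustrative figure, not a measured accuracy.
    for steps in (1, 5, 10, 20):
        probability = chain_success_probability(0.95, steps)
        print(f"{steps:>2} steps: {probability:.1%} chance of no error anywhere")
```

Under those assumptions, a chain of ten steps that are each 95% reliable gets everything right only about 60% of the time. The exact figures matter less than the direction: reliability that sounds high per step erodes quickly once steps are chained.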

Sycophancy is arguably worse

Hallucinations at least have the decency to be obviously wrong if you check. Sycophancy is more insidious.

Sycophancy - the tendency of LLMs to agree with, validate, and mirror back whatever the user seems to want to hear - means that if you push back on the model's output, it will often cave. Not because you were right, but because you pushed. The model optimises for your approval and continued engagement, not for accuracy.

In a consulting context, imagine what this means. A user asks an agent to analyse some data and suggest a course of action. The agent produces a recommendation. The user says "are you sure? I thought it would be the opposite." The agent, eager to please, reconsiders - and agrees with the user. Not because the evidence changed, but because the question changed.

This is not a guardrail failure. This is the model behaving the way its training rewards, just not in a way that is useful to you.

The guardrail question nobody answers clearly

Vendors will tell you that agentic AI comes with guardrails. I do not doubt that these guardrails exist in some form. What I have yet to see is a satisfying, transparent answer to the question of how they actually work and - more importantly - how they fail.

Every guardrail is a constraint applied to a system that, by design, reasons autonomously. At what point does the guardrail catch a bad decision? Before the action is taken? After? Who decides what counts as a bad decision? How does the system distinguish between a legitimate edge case and a genuine error? What happens when the agent is confidently wrong, and the guardrail doesn't catch it because the output looks plausible enough?

I am not suggesting these questions are unanswerable. I am suggesting that "we have guardrails" is not an answer to them.
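
To make the last of those questions concrete, here is a minimal sketch of one kind of guardrail: a check that runs before an action is taken. Everything in it is hypothetical and invented for illustration (the tool names, the discount field, the 30% limit); it does not describe how any vendor's guardrails actually work.

```python
from dataclasses import dataclass

# Hypothetical policy values, invented for this example.
ALLOWED_TOOLS = {"update_record"}
MAX_DISCOUNT_PERCENT = 30.0


@dataclass(frozen=True)
class ProposedAction:
    """An action the agent wants to take, inspected before it runs."""
    tool: str
    record_id: str
    field: str
    new_value: float


def guardrail_allows(action: ProposedAction) -> bool:
    """Return True if the action passes the pre-action checks.

    The checks look only at the shape of the action: which tool, which
    field, and whether the value is in range. They cannot tell whether
    the value is actually correct for this record.
    """
    if action.tool not in ALLOWED_TOOLS:
        return False
    if action.field != "discount_percent":
        return False
    return 0.0 <= action.new_value <= MAX_DISCOUNT_PERCENT


# A tool the agent was never meant to use: blocked.
blocked = ProposedAction("delete_record", "001", "discount_percent", 10.0)
# An obviously out-of-range value: blocked.
implausible = ProposedAction("update_record", "001", "discount_percent", 95.0)
# In range, so it passes, even if the right answer for this customer was 5.
plausible_but_wrong = ProposedAction("update_record", "001", "discount_percent", 25.0)

for action in (blocked, implausible, plausible_but_wrong):
    print(action.tool, action.new_value, guardrail_allows(action))
```

The third action is the uncomfortable one. A check like this can stop the agent from doing something outlandish, but a confidently wrong value that looks plausible goes straight through.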

The 'toddler with kitchen knives' problem

Agentic AI is, in many ways, an incredibly eager-to-please system operating in an environment it could never fully understand, with access to tools that can cause real damage, and an alarming tendency to project confidence regardless of competence.

Toddlers are a bit like that too. They mean well and they try hard. They will hand you something sharp with the most sincere expression you have ever seen.

You do not solve that problem by telling the toddler to "be more careful". You solve it by deciding very deliberately what they should and should not have access to - and by understanding that the responsibility for what happens next is yours, not theirs.

The same logic applies here. Agentic AI is not inherently dangerous. But deploying it without a clear-eyed understanding of how it fails, and who is accountable when it does, is not innovation. It is optimism at scale.

And optimism is not a governance strategy.