Agentic AI is suddenly everywhere. Tools that looked like helpful assistants last year are being rebranded as “autonomous agents” that do the work for you, not just suggest the next step. The problem is a growing wave of agent washing: old tech in shiny new wrappers, marketed as far more capable than it really is.
That hype-to-substance gap matters, especially in ERP. When the promises outrun the plumbing, you inherit new risks without real business value.
What we mean by “agent washing”
Vendors slap the “agent” label on chatbots, copilots, or scripted workflows and imply hands-off autonomy: it reorders parts, fixes procurement errors, closes invoices, and no human is needed. Under the hood, many of these tools are brittle, unexplainable, or poorly integrated with your core systems. In short, hype is outpacing reality.
The risks (and how they show up in ERP)
Misaligned expectations
What happens: Teams think the “agent” will just work. Real processes have edge cases, data gaps, and policy nuances.
ERP examples:
- A purchasing agent keeps flagging reorders because min/max settings are stale.
- A payables agent “auto-resolves” invoices but misses regional freight rules.
How to spot it early: Demos never use your data or exception paths; there is no written scope for which scenarios are autonomous versus human-reviewed.
Guardrails: Create a decision catalog (data required, rules, confidence thresholds, approvers, fallbacks) and a run book for exceptions before any production traffic.
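One way to make the decision catalog concrete is a small structured record per decision type. This is a minimal sketch; the field names, the example thresholds, and the `is_autonomous` gate are all illustrative assumptions, not a prescribed schema:

```python
from dataclasses import dataclass

@dataclass
class DecisionCatalogEntry:
    """One decision type the agent may take, with its autonomy conditions."""
    decision: str                 # e.g. "reorder_part" (hypothetical name)
    required_data: list[str]      # inputs that must be present and fresh
    rules: list[str]              # policy rules that gate the action
    confidence_threshold: float   # below this, route to a human
    approver_role: str            # who reviews exceptions
    fallback: str                 # behavior when data or confidence is missing

catalog = [
    DecisionCatalogEntry(
        decision="reorder_part",
        required_data=["min_max_settings", "on_hand_qty", "open_pos"],
        rules=["min_max_reviewed_within_90_days"],
        confidence_threshold=0.95,
        approver_role="purchasing_manager",
        fallback="suggest_only",
    ),
]

def is_autonomous(entry: DecisionCatalogEntry,
                  confidence: float,
                  data_present: set[str]) -> bool:
    """Allow autonomous action only when all required data exists
    and confidence clears the catalog's threshold; otherwise fall back."""
    return (set(entry.required_data) <= data_present
            and confidence >= entry.confidence_threshold)
```

The point is not the code itself but the discipline: every autonomous path has an explicit, reviewable entry before it sees production traffic.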
Opacity and risk
What happens: The agent acts, but you cannot see why or how to undo it.
ERP examples:
- Inventory moves post across locations without a traceable rationale; cycle counts drift.
- Vendor approvals occur, but there is no audit trail linking evidence to decisions.
How to spot it early: Vendors cannot show feature-complete logs with inputs, outputs, prompts, and approvals. “Proprietary model” is used to dodge explainability questions.
Guardrails: Require full audit logging (inputs, parameters, prompts, retrieved data, overrides, timestamps, system-of-record IDs) and tested rollback for every action.
Fragmentation
What happens: The agent runs outside your ERP; data and state fall out of sync.
ERP examples:
- A pricing agent maintains its own rules; order entry shows different prices.
- A separate agent queue becomes a shadow workflow with no visibility in standard dashboards.
How to spot it early: The tool needs its own data store and periodic syncs; reporting requires exporting from the agent platform.
Guardrails: Enforce ERP-first integration (read/write in real or near-real time) and a single reporting plane so agent activity is visible in your existing dashboards.
Governance gaps
What happens: Controls for segregation of duties, approvals, and compliance do not cover the agent path.
ERP examples:
- Quarterly access reviews omit the agent.
How to spot it early: Security is discussed at the end, not up front; there is no mapping of agent privileges to your roles and approval chains.
Guardrails: Enforce least privilege and create a control map showing how each control (for example, approval limits) applies to the agent and how you will test it.
What ERP leaders should do
1) Audit every “agentic” demo
Aim: Separate marketing from product before you pilot.
Do this:
- Make vendors use your data (even a scrubbed subset). Include bad data, partial records, and common exceptions.
- Watch the full flow: where the agent retrieves context, how it decides, where it writes, and how you override.
Demand artifacts:
- Decision rationales (not just “confidence scores”).
- End-to-end logs (input → tools called → output → write-back).
- Failure modes with graceful degradation (suggest instead of act, pause for review, escalate).
- Performance numbers for latency, accuracy, and error rates with your scenarios.
Red flags: “We cannot show logs,” “Audit trails later,” or “Rollback is manual.”
Exit criteria: You can trace, reproduce, override, and roll back one end-to-end decision without vendor help.
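Demanding performance numbers on your scenarios is easier when you bring your own harness. A minimal sketch, assuming the agent can be called as a function and each scenario pairs an input with its expected outcome (all names here are illustrative):

```python
import time

def run_scenarios(agent_fn, scenarios):
    """scenarios: list of (input, expected) pairs drawn from your own data,
    including bad data, partial records, and common exceptions."""
    results = []
    for inp, expected in scenarios:
        start = time.perf_counter()
        output = agent_fn(inp)
        latency = time.perf_counter() - start
        results.append({"latency_s": latency, "correct": output == expected})
    n = len(results)
    correct = sum(r["correct"] for r in results)
    return {
        "accuracy": correct / n,
        "error_rate": 1 - correct / n,
        # 95th-percentile latency via a simple sorted-index lookup
        "p95_latency_s": sorted(r["latency_s"] for r in results)[int(0.95 * (n - 1))],
    }
```

Run this during the demo, on your scrubbed subset, and compare the output against the vendor's claimed numbers.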
2) Start with semi-autonomy
Aim: Earn trust before you hand over the keys.
Do this:
- Run in suggest mode: agent proposes; humans approve or edit.
- Set graduation criteria per use case (for example, 30 days at ≥98% accuracy and ≤2% manual rework) before any straight-through processing.
- Start with narrow, high-signal tasks (duplicate invoice detection, vendor master hygiene, exception triage, forecast commentary).
- Track precision/recall, time saved, override rate, and business impact (for example, days payable outstanding, stock-outs avoided).
Exit criteria: Stable metrics over two cycles plus an internal audit review.
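The graduation criteria above (for example, 30 days at ≥98% accuracy and ≤2% manual rework) reduce to a simple gate. A sketch, assuming each processed item is tracked with a correctness flag, a rework flag, and the day it was handled:

```python
def meets_graduation_bar(outcomes: list[dict],
                         min_accuracy: float = 0.98,
                         max_rework: float = 0.02,
                         min_days: int = 30) -> bool:
    """outcomes: one dict per item with 'correct' (bool), 'reworked' (bool),
    and 'day' (int). Returns True only when the observation window is long
    enough AND both thresholds hold."""
    if not outcomes:
        return False
    days_observed = len({o["day"] for o in outcomes})
    if days_observed < min_days:
        return False
    accuracy = sum(o["correct"] for o in outcomes) / len(outcomes)
    rework = sum(o["reworked"] for o in outcomes) / len(outcomes)
    return accuracy >= min_accuracy and rework <= max_rework
```

Tune the thresholds per use case; the key is that the gate is computed from logged outcomes, not from vendor-reported summaries.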
3) Extend control surfaces into ERP
Aim: No black boxes; your ERP remains the source of truth.
Do this:
- Store agent suggestions as draft transactions with status flags; approvals convert drafts to postings.
- Add agent activity widgets to existing dashboards (work queues, exceptions, outcomes).
- Route low-confidence cases to humans with prefilled context.
- Use named service accounts bound to the same approval rules as humans; include agents in periodic access reviews.
Exit criterion: Every agent action is visible, approvable, and reversible from within your ERP, with no separate vendor console required for oversight.
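The draft-to-posting flow described above can be sketched as a small state machine. The status names, the routing threshold, and the transitions here are illustrative assumptions, not a specific ERP's API:

```python
from enum import Enum

class Status(Enum):
    DRAFT = "draft"                    # high-confidence suggestion, awaiting approval
    PENDING_REVIEW = "pending_review"  # low-confidence case, prefilled for a human
    APPROVED = "approved"              # approval converts a draft toward posting
    POSTED = "posted"                  # written to the system of record
    REJECTED = "rejected"

CONFIDENCE_THRESHOLD = 0.90  # assumed tuning knob, set per use case

def route_suggestion(confidence: float) -> Status:
    """Even high-confidence suggestions land as drafts, never direct postings;
    low-confidence cases go straight to the human review queue."""
    return Status.DRAFT if confidence >= CONFIDENCE_THRESHOLD else Status.PENDING_REVIEW

def approve(status: Status) -> Status:
    """Only an explicit approval moves a suggestion toward posting."""
    if status in (Status.DRAFT, Status.PENDING_REVIEW):
        return Status.APPROVED
    raise ValueError(f"cannot approve from {status}")
```

Because every suggestion is a draft transaction with a status flag, agent activity shows up in the same work queues and dashboards as human activity.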
4) Put vendor accountability in writing
Aim: Align incentives and define what happens when things go wrong.
Do this:
- Define correctness per use case (for example, “For invoice coding suggestions, ≥98% line-level accuracy over 10,000 lines”).
- Set service levels and drift alerts; require documented remediation when thresholds slip.
- Define rollback obligations (vendor assists with reversal within a set window and covers direct rework).
- Enforce change control for any model, prompt, or integration change in production.
- Specify data handling: what is stored, where, how long, and how it is deleted.
Exit criteria: Legal and internal audits confirm coverage for accuracy, reversibility, observability, and data handling specifically for agent behavior.
A safer adoption pattern
- Triage use cases: Pick two or three narrow, measurable processes with real value and low downside.
- Design controls before build: Decision catalog, audit plan, security roles, rollback steps.
- Pilot in suggest mode: Capture metrics; tune prompts, rules, and data quality.
- Graduate selectively: Expand autonomy only where evidence supports it.
- Operationalize: Fold agent monitoring into existing operations reviews and audits.
- Iterate, do not cascade: Add new use cases in small batches; retire anything that does not clear the bar.
A better path forward
Agentic patterns can create real value, but only when they are grounded in your operating model, data quality, and governance. Treat agents as augmenters first, automators later. Prove reliability on narrow, high-signal use cases (for example, proactive data quality checks, invoice triage) before touching core financials or supply planning.
If your team is being courted by “agentic AI” vendors, or you are feeling pressure to adopt the next big thing, Third Stage Consulting can help separate signal from noise, design safe pilots, and put the controls in place to keep value (and risk) in check.