ISO/IEC 42001 for Agentic AI: The Certification Evidence Gap That Policies Can't Close
ISO/IEC 42001:2023 is not a compliance checklist; it is a certification standard, and the difference matters. A compliance checklist asks: do your policies say the right things? A certification audit asks: show me the evidence that your controls ran. For organisations deploying agentic AI, the two most demanding ISO 42001 control areas, Annex A.6.1.6 (operational logging enabling reconstruction of AI system behaviour) and the human oversight controls, require the same thing: documented proof that the AI system operated as governed, at the moment it operated.
- What ISO/IEC 42001:2023 is and who needs it
- Where agentic AI creates control gaps
- Annex A.6.1.6 — Operational logging and reconstruction
- Clause 9.1 — Monitoring, measurement, analysis and evaluation
- Human oversight controls — authority to intervene
- The receipt chain as ISO 42001 audit evidence
- Cross-framework: one chain, three standards
What ISO/IEC 42001:2023 is — and why certification changes the evidence requirement
ISO/IEC 42001:2023 is the international standard for AI management systems, published in December 2023. It provides a structured framework, built on the same harmonised management-system structure as ISO 9001 and ISO/IEC 27001, for organisations to establish, implement, maintain, and continually improve how they develop and deploy AI systems. Organisations already certified to ISO 9001 (quality management) or ISO/IEC 27001 (information security) will find the clause structure familiar: policy, risk assessment, operational controls, performance evaluation, and improvement.
The critical distinction between ISO 42001 and advisory frameworks like NIST AI RMF is what certification requires. ISO auditors do not accept policy documents as evidence of control operation. They require documented records (logs, receipts, decision records) that demonstrate controls ran during the period under audit. For agentic AI systems, this creates an acute problem: most AI governance tooling produces policies, dashboards, and reports. Very little produces per-action evidence that a control executed before the agent acted.
ISO/IEC 42001 is not legally mandated in most jurisdictions, but it is increasingly referenced in procurement requirements and enterprise partner questionnaires, and as supporting evidence for EU AI Act Article 9 risk management obligations. Organisations pursuing ISO 42001 certification are typically those already operating under ISO management systems: financial services, healthcare, manufacturing, critical infrastructure.
Where agentic AI creates control gaps
ISO/IEC 42001 Annex A defines the normative controls. Agentic AI systems most commonly create evidence gaps during certification audits in four areas:
The audit failure pattern is consistent: organisations demonstrate strong policy controls (Clause 5, Clause 6 planning) but cannot produce per-action evidence for operational controls (A.6, A.7) and performance evaluation (Clause 9). The gap is not in governance intent — it is in the mechanism that would produce the evidence.
Annex A.6.1.6 — Operational logging and reconstruction of AI system behaviour
ISO/IEC 42001 Annex A.6.1.6 requires that organisations implement operational logging for AI systems at a level that enables reconstruction of AI system behaviour. The word "reconstruction" is precise. It is not sufficient to log that a sequence ran — the log must enable a complete picture of what the system did, in what order, at what time, and under what conditions.
For agentic AI, the failure mode that this control targets is probabilistic behaviour: a model that skips a step, infers a condition was met, or proceeds without explicit authorisation. If the only log is a model-generated output report, reconstruction is impossible — the model's report of what it did may not reflect what actually ran.
How sequence enforcement addresses it: Every gate decision — ALLOW or DENY — is written to immutable storage before the agent step executes. The receipt records: sequence ID, step identifier, function, action type, gate decision, timestamp (UTC), nonce, and a cryptographic pack ID. Receipts are chained via prev_receipt_id — each receipt links to the previous, creating a tamper-evident sequence record. To reconstruct exactly what ran, in what order, at what time: read the chain. The chain is the reconstruction. It is not derived from model output. It is written by the enforcement layer before the model acts.
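To make the ordering concrete, here is a minimal Python sketch of a gate that writes its receipt before the step runs. It assumes an in-memory append-only list as a stand-in for immutable storage, derives a hypothetical `receipt_id` from a SHA-256 hash of the canonical receipt body, and takes the gate decision as a boolean. `ReceiptLog`, `gated_step`, and the placeholder `pack_id` value are illustrative names, not AgenticRail's API; signing is left out to keep the focus on write-before-execute and `prev_receipt_id` linkage.

```python
import hashlib
import json
import secrets
from datetime import datetime, timezone
from typing import Any, Callable, Optional


class ReceiptLog:
    """Append-only, in-memory stand-in for an immutable receipt store (hypothetical)."""

    def __init__(self) -> None:
        self._receipts: list[dict] = []

    @property
    def receipts(self) -> list[dict]:
        return list(self._receipts)  # copy, so callers cannot rewrite history

    def last_id(self) -> Optional[str]:
        return self._receipts[-1]["receipt_id"] if self._receipts else None

    def append(self, body: dict) -> dict:
        # Link to the previous receipt, then derive an ID from the canonical body.
        linked = {**body, "prev_receipt_id": self.last_id()}
        canonical = json.dumps(linked, sort_keys=True, separators=(",", ":"))
        receipt = {**linked, "receipt_id": hashlib.sha256(canonical.encode("utf-8")).hexdigest()}
        self._receipts.append(receipt)
        return receipt


def gated_step(
    log: ReceiptLog,
    sequence_id: str,
    step: int,
    function: str,
    action_type: str,
    allowed: bool,  # outcome of the gate's policy check, assumed computed elsewhere
    run: Callable[[], Any],
) -> tuple[dict, Any]:
    """Persist the gate decision first; execute the step only if it was ALLOWed."""
    decision = "ALLOW" if allowed else "DENY"
    receipt = log.append({
        "sequence_id": sequence_id,
        "step": step,
        "function": function,
        "action_type": action_type,
        "decision": decision,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "nonce": secrets.token_hex(16),
        "pack_id": "pack-demo",  # hypothetical placeholder
    })
    result = run() if decision == "ALLOW" else None
    return receipt, result
```

A denied step still produces a receipt, so in this model the chain records attempts as well as executions.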
Audit evidence produced: The compliance report endpoint generates a full chain proof comprising a per-step enforcement log, HMAC signature verification for every receipt, chain linkage from step 0 → N, and an AI-generated compliance narrative. This constitutes the documented evidence Annex A.6.1.6 requires for a certification audit.
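A chain proof of this kind reduces to two checks per receipt: does the signature verify, and does `prev_receipt_id` point at the receipt before it? The following is a minimal sketch, assuming HMAC-SHA256 over sorted-key JSON, a hypothetical `key_version` field that selects the verification key, and a `signature` field excluded from the signed body. It is an illustration, not the report endpoint's actual verifier.

```python
import hashlib
import hmac
import json


def canonical(body: dict) -> bytes:
    # Alphabetically sorted keys, no insignificant whitespace.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(body: dict, key: bytes) -> str:
    return hmac.new(key, canonical(body), hashlib.sha256).hexdigest()


def verify_chain(receipts: list[dict], keys: dict[str, bytes]) -> list[str]:
    """Walk the chain from step 0 to N; an empty result means it verified."""
    problems = []
    prev_id = None
    for i, receipt in enumerate(receipts):
        body = {k: v for k, v in receipt.items() if k != "signature"}
        key = keys.get(receipt.get("key_version"))
        if key is None:
            problems.append(f"step {i}: unknown key version")
        elif not hmac.compare_digest(sign(body, key), receipt.get("signature", "")):
            problems.append(f"step {i}: signature mismatch")
        # Linkage: each receipt must name its predecessor (None for step 0).
        if receipt.get("prev_receipt_id") != prev_id:
            problems.append(f"step {i}: broken link")
        prev_id = receipt.get("receipt_id")
    return problems
```

Because `prev_receipt_id` is inside the signed body, reordering or dropping a receipt shows up as a broken link, and editing any field shows up as a signature mismatch.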
Clause 9.1 — Monitoring, measurement, analysis and evaluation
ISO/IEC 42001 Clause 9.1 requires that organisations determine what needs to be monitored and measured, what methods are appropriate, and when analysis and evaluation should occur. For agentic AI systems making consequential decisions, the appropriate interval is per action: not per session, not daily dashboards, not weekly reports.
The common implementation gap is treating Clause 9.1 as a reporting requirement. Aggregate metrics satisfy the letter of the clause for low-risk systems. For agentic AI in regulated contexts — underwriting, hiring, clinical triage, access control — an auditor will ask: how do you know the system behaved correctly on this specific decision, at this specific time? Aggregate metrics cannot answer that question.
How sequence enforcement addresses it: Gate statistics are recorded per decision: total evaluations, ALLOW/DENY/HALT ratios, step distribution, sequence completion rates, and denial reason codes. These are written to a key-value (KV) store on every gate call and surfaced in the dashboard in real time. For Clause 9.1, this provides monitoring that is contemporaneous with execution at the granularity required, every action rather than every session. The receipt chain enables full retrospective analysis: which steps were denied, at what timestamps, for what reason, across how many sequences.
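As a rough illustration of how per-decision records answer both the aggregate question and the specific one an auditor asks, the sketch below derives ratios, reason-code counts, and step distribution from a list of receipts, then drills into a single sequence. The field names `reason_code` and `timestamp`, and the reason-code values used in any example, are assumptions for illustration.

```python
from collections import Counter


def gate_stats(receipts: list[dict]) -> dict:
    """Aggregate per-decision receipts into monitoring figures."""
    decisions = Counter(r["decision"] for r in receipts)
    total = len(receipts)
    ratios = {d: (decisions[d] / total if total else 0.0) for d in ("ALLOW", "DENY", "HALT")}
    # Reason codes are only meaningful for blocked steps (DENY or HALT).
    reasons = Counter(
        r["reason_code"] for r in receipts if r["decision"] != "ALLOW" and "reason_code" in r
    )
    steps = Counter(r["step"] for r in receipts)
    return {
        "total": total,
        "ratios": ratios,
        "denial_reasons": dict(reasons),
        "step_distribution": dict(steps),
    }


def blocked_steps(receipts: list[dict], sequence_id: str) -> list[tuple]:
    """Answer the per-decision question: which steps of one sequence were blocked, when, and why."""
    return [
        (r["step"], r["timestamp"], r.get("reason_code"))
        for r in receipts
        if r["sequence_id"] == sequence_id and r["decision"] in ("DENY", "HALT")
    ]
```

The aggregate view and the drill-down come from the same records, which is the point of recording per action: the dashboard figures can always be traced back to individual decisions.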
Human oversight controls — structural authority to intervene
ISO/IEC 42001 Annex A.7 defines human oversight controls — the mechanisms by which humans can review AI system behaviour and intervene when needed. For agentic AI, the critical test is not whether a human could intervene, but whether the system architecture requires human-controlled authorisation to proceed.
The audit failure here is structural: if the AI system can run a complete sequence without any human-controlled checkpoint, then human oversight is an option, not a control. A certification auditor will ask: show me the mechanism. A policy that says "humans review outputs" is not a mechanism — it is a procedure. A mechanism is something the system cannot bypass.
How sequence enforcement addresses it: The gate key is the human-controlled mechanism. Gate access is issued to designated personnel via API keys. Those personnel control which sequences are permitted to run, which step orders are enforced, and when a sequence should be halted. No agent step can execute without clearing the gate — the model cannot self-authorise, self-report around the gate, or replay a previously issued ALLOW. When a sequence is sealed, no further steps are accepted. Human authority is structural: it is embedded in the enforcement path, not described in a governance document.
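The properties described here (operator-controlled registration and halt, enforced step order, single-use nonces, and sealing) can be sketched as a small state machine. This is a hypothetical in-memory model, not the product's implementation: `Gate`, `evaluate`, the reason codes, and treating an operator API key as a simple set-membership check are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Sequence:
    allowed_steps: list[str]  # step order set by designated personnel
    next_index: int = 0
    sealed: bool = False
    halted: bool = False
    used_nonces: set[str] = field(default_factory=set)


class Gate:
    def __init__(self, operator_keys: set[str]) -> None:
        self._operator_keys = operator_keys  # keys issued to designated personnel
        self._sequences: dict[str, Sequence] = {}

    def _require_operator(self, api_key: str) -> None:
        if api_key not in self._operator_keys:
            raise PermissionError("not a designated operator key")

    def register(self, api_key: str, sequence_id: str, steps: list[str]) -> None:
        self._require_operator(api_key)
        if not steps:
            raise ValueError("a sequence needs at least one step")
        self._sequences[sequence_id] = Sequence(allowed_steps=list(steps))

    def halt(self, api_key: str, sequence_id: str) -> None:
        self._require_operator(api_key)
        self._sequences[sequence_id].halted = True

    def evaluate(self, sequence_id: str, step: str, nonce: str) -> tuple[str, str]:
        """Called before every agent step; returns (decision, reason_code)."""
        seq = self._sequences.get(sequence_id)
        if seq is None:
            return "DENY", "UNKNOWN_SEQUENCE"
        if seq.halted:
            return "HALT", "OPERATOR_HALT"
        if seq.sealed:
            return "DENY", "SEQUENCE_SEALED"
        if nonce in seq.used_nonces:
            return "DENY", "NONCE_REPLAY"  # a previously issued ALLOW cannot be replayed
        seq.used_nonces.add(nonce)
        if step != seq.allowed_steps[seq.next_index]:
            return "DENY", "OUT_OF_ORDER"
        seq.next_index += 1
        if seq.next_index == len(seq.allowed_steps):
            seq.sealed = True  # completed sequences accept no further steps
        return "ALLOW", "OK"
```

In this model the agent only ever calls `evaluate`; registering and halting sequences require an operator key, which is what makes the oversight structural rather than procedural.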
Audit evidence produced: Key issuance records. Sequence halt events (HALT decisions with timestamps and reason codes). Denial logs showing the gate blocked non-compliant steps before execution. Every HALT or DENY is proof the human oversight mechanism fired.
The receipt chain as ISO 42001 audit evidence
ISO 42001 certification audits require documented evidence. The receipt chain AgenticRail produces is designed to be that documentation — not a report generated after the fact, but a record written at the moment of each enforcement decision.
Each receipt is HMAC-signed using a versioned key. The signature covers a canonical JSON serialisation: keys sorted alphabetically, output deterministic. Every receipt records: sequence ID, step identifier, function, action type, gate decision, timestamp, nonce, and pack ID. Receipts are linked via prev_receipt_id. The chain is self-evidencing: if it verifies, the sequence record has not been altered since the enforcement layer signed it. If it does not verify, tampering is detectable. Optional attestation fields (KYC results, risk scores, approval IDs, document hashes) are signed into the receipt at the step they apply to.
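Canonical serialisation is what makes the signature reproducible: two receipts with the same fields must serialise to the same bytes regardless of key order. Here is a minimal Python sketch, assuming HMAC-SHA256, a hypothetical in-process key ring indexed by version, and a hypothetical nested `attestations` field carrying values such as a KYC result and a risk score.

```python
import hashlib
import hmac
import json

# Hypothetical key ring: versioning lets receipts signed before a rotation still verify.
SIGNING_KEYS = {"v1": b"retired-key", "v2": b"current-key"}
CURRENT_VERSION = "v2"


def canonical_json(body: dict) -> bytes:
    # sort_keys applies at every nesting level; compact separators; UTF-8 bytes.
    return json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def sign_receipt(body: dict, version: str = CURRENT_VERSION) -> dict:
    # The key version is itself part of the signed body.
    unsigned = {**body, "key_version": version}
    sig = hmac.new(SIGNING_KEYS[version], canonical_json(unsigned), hashlib.sha256).hexdigest()
    return {**unsigned, "signature": sig}


def check_receipt(receipt: dict) -> bool:
    unsigned = {k: v for k, v in receipt.items() if k != "signature"}
    key = SIGNING_KEYS.get(unsigned.get("key_version"))
    if key is None:
        return False
    expected = hmac.new(key, canonical_json(unsigned), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, receipt.get("signature", ""))
```

In practice keys would live in a secrets manager rather than in source, and a verifier written in another language has to reproduce exactly the same canonical bytes; number formatting and Unicode escaping are the usual sources of mismatch.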
For an ISO/IEC 42001 certification audit, the receipt chain provides documented evidence against three control areas:
- →A.6.1.6 evidence: Per-step enforcement log with timestamps. Full receipt chain enabling reconstruction of every sequence. Chain linkage proof — tamper-evident from step 0 to seal.
- →Clause 9.1 evidence: Per-action decision records. ALLOW/DENY/HALT statistics at sequence and step level. Denial reason codes enabling root-cause analysis of blocked steps.
- →A.7 evidence: Key issuance log — designated personnel and when access was granted or revoked. HALT events — timestamped proof that the human oversight mechanism fired. Sequence seal records — demonstrating that completed sequences cannot be re-entered.
To generate a compliance report for any sequence:
POST https://report.agenticrail.nz
{"sequence_id": "your-seq-id", "format": "html"}
The report includes: per-step enforcement log, chain linkage proof, cryptographic verification results, and a compliance narrative. It is suitable for inclusion in ISO/IEC 42001 management system documentation, certification audit evidence packages, and management review records.
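The same request can be scripted, for example to attach reports to an audit evidence package. This sketch uses only Python's standard library; the URL and body fields come from the request above, while the `Content-Type` header, the UTF-8 decoding of the response, and the helper names `build_report_request` and `fetch_report` are assumptions.

```python
import json
import urllib.request

REPORT_URL = "https://report.agenticrail.nz"  # endpoint as given in the text


def build_report_request(sequence_id: str, fmt: str = "html") -> urllib.request.Request:
    payload = json.dumps({"sequence_id": sequence_id, "format": fmt}).encode("utf-8")
    return urllib.request.Request(
        REPORT_URL,
        data=payload,
        headers={"Content-Type": "application/json"},  # assumed; not stated in the text
        method="POST",
    )


def fetch_report(sequence_id: str, fmt: str = "html", timeout: float = 30.0) -> str:
    # Performs the network call; assumes the response body is UTF-8 text.
    with urllib.request.urlopen(build_report_request(sequence_id, fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `fetch_report("your-seq-id")` returns the report body, which can then be saved alongside management review records.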
Cross-framework: one chain, three standards
ISO/IEC 42001, EU AI Act, and NIST AI RMF all converge on the same operational evidence requirement: proof that oversight mechanisms functioned during deployment. The clause numbering differs; the underlying gap is the same.
- →ISO/IEC 42001 Annex A.6.1.6: operational logging enabling reconstruction of AI system behaviour. The receipt chain is the reconstruction record.
- →EU AI Act Article 12: record-keeping through automatic logging of events, supporting traceability and post-market monitoring. The same receipt chain addresses the logging and traceability requirement.
- →NIST AI RMF Measure 2.4: the functionality and behaviour of the AI system are monitored in production. The per-action receipt record is the monitoring evidence.
You build the enforcement layer once. The receipt chain it produces maps to all three frameworks simultaneously — because all three are asking the same question: did your controls actually run?
One enforcement layer. Evidence for ISO 42001, the EU AI Act, and NIST AI RMF.
Run a demo sequence and generate a compliance report in 60 seconds — no signup required. The report shows the receipt chain, chain proof, and per-step enforcement log suitable for ISO 42001 audit evidence.