MASTASCUSA HOLDINGS

April 24, 2026

Why Most AI Readiness Checklists Are Written by People Who've Never Shipped

A practitioner's teardown of the four major AI governance frameworks — NIST AI RMF, ISO/IEC 42001, the EU AI Act, and SR 11-7 — and what each one misses about production reality.

The 2025 Stanford HAI AI Index Report counted 233 documented AI incidents in 2024 — a 56.4% year-over-year increase, the largest single-year rise the AI Incident Database has recorded.1

Incidents are rising faster than the discipline of operating these systems is maturing.

This is the gap the AI governance industry is supposed to be closing. Instead, we have a proliferation of frameworks, none of which, on their own, are sufficient — and several of which are actively misleading when read as a complete solution. This is a practitioner’s teardown of the four that matter most.

NIST AI RMF 1.0 — the voluntary starting point

The NIST AI Risk Management Framework (AI RMF 1.0), published January 2023, defines four core functions: GOVERN, MAP, MEASURE, MANAGE.2 GOVERN runs across the whole lifecycle; MAP, MEASURE, and MANAGE are applied to specific systems at specific stages.

What the RMF gets right: it makes governance a first-class function, not a checkbox at the end. “You can’t measure what you haven’t mapped” is a sharper framing than most compliance documents manage.

What the RMF can’t do, on its own: it is explicitly “voluntary, rights-preserving, non-sector specific, and use-case agnostic.”2 That’s a feature if you’re NIST — and a bug if you’re a Head of AI trying to tell your board what “ready” actually means for your organization specifically. The framework doesn’t say what Level 1 vs. Level 4 maturity looks like in your stack. It gives you the axes; you still have to draw the graph.
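
To make “draw the graph” concrete, here is a minimal sketch of a per-function maturity snapshot. The four function names come straight from the RMF; the 1-to-4 scale, the field names, and the example evidence are illustrative assumptions of mine, not anything NIST defines.

    from dataclasses import dataclass, field

    RMF_FUNCTIONS = ("GOVERN", "MAP", "MEASURE", "MANAGE")

    @dataclass
    class FunctionScore:
        function: str          # one of RMF_FUNCTIONS
        level: int             # 1 (ad hoc) .. 4 (managed) -- an illustrative scale, not NIST's
        evidence: list[str] = field(default_factory=list)  # artifacts reviewed to justify the level

    # Hypothetical self-assessment for one organization and one system.
    assessment = [
        FunctionScore("GOVERN", 2, ["AI policy v0.3", "no named model owner for prod chatbot"]),
        FunctionScore("MAP", 1, ["no inventory of deployed models"]),
        FunctionScore("MEASURE", 1, ["no offline eval suite; no drift monitoring"]),
        FunctionScore("MANAGE", 2, ["incident runbook exists; never exercised"]),
    ]

    for s in assessment:
        assert s.function in RMF_FUNCTIONS and 1 <= s.level <= 4
        print(f"{s.function:8} level {s.level}  evidence: {'; '.join(s.evidence)}")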

The 2024 companion document, NIST AI 600-1, the Generative AI Profile, closes part of that gap for GenAI by enumerating twelve risks unique to or exacerbated by generative models, among them hallucinations, data leakage, copyright, harmful bias, and disinformation and cybersecurity misuse.3 Worth reading. Still not a checklist for your specific deployment.

ISO/IEC 42001 — the one you can actually certify against

ISO/IEC 42001:2023, published December 2023, is the first certifiable AI management system standard.4 It uses the same Plan-Do-Check-Act lineage as ISO 9001 and ISO 27001, which means your existing ISMS auditors already know the structure.

This is the only AI framework that produces a certificate you can put in an RFP response. If your buyer’s procurement process has a line item for third-party AI governance attestation, 42001 is today the answer.

What 42001 still requires you to figure out: what your risk management process actually does when a model goes sideways in production. The standard says you need risk management, impact assessment, lifecycle management, and supplier oversight. It doesn’t say what “good” looks like. You bring the substance; 42001 brings the structure. That’s not a criticism — it’s an honest description of what a management system standard is.

EU AI Act — binding law with a staged runway

Regulation (EU) 2024/1689, the AI Act, entered into force August 1, 2024. Full applicability: August 2, 2026. Prohibited practices already enforceable as of February 2, 2025.5

It’s the only one of these four that is actual law, not guidance, and it applies extraterritorially — if your system outputs are used in the EU, you’re in scope regardless of where you’re headquartered.

The risk-tier model has four levels: unacceptable risk (prohibited outright), high-risk (heavily regulated), limited-risk (transparency obligations only), and minimal-risk (essentially unregulated). The high-risk tier is the operative one for most regulated-industry deployments and imposes obligations on risk management, data governance, technical documentation, human oversight, EU database registration, conformity assessment, and ongoing monitoring for serious incidents.5

Two practitioner observations. First: the high-risk classification rules in Annex III are not ambiguous — if you’re in biometrics, critical infrastructure, education, employment, essential services, law enforcement, migration, or administration of justice, you know. You don’t get to argue your way out. Second: the August 2, 2026 deadline is closer than it looks. Conformity assessments are not same-week projects.
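
One way to internalize the structure is a triage sketch like the one below. The Annex III areas and the high-risk obligations are the ones named above; the function, its inputs, and the tiers it returns are illustrative assumptions, and actual classification under the Act is a legal determination, not a string lookup.

    # Illustrative triage only -- classification under the AI Act is a legal analysis,
    # not a lookup table.
    ANNEX_III_AREAS = {
        "biometrics", "critical infrastructure", "education", "employment",
        "essential services", "law enforcement", "migration", "administration of justice",
    }

    HIGH_RISK_OBLIGATIONS = [
        "risk management system",
        "data governance",
        "technical documentation",
        "human oversight",
        "EU database registration",
        "conformity assessment",
        "ongoing monitoring and serious-incident reporting",
    ]

    def triage(use_case_area: str, outputs_used_in_eu: bool) -> list[str]:
        """Return the obligation checklist this sketch would open for a use case."""
        if not outputs_used_in_eu:
            return []  # out of territorial scope for this sketch; verify with counsel
        if use_case_area in ANNEX_III_AREAS:
            return HIGH_RISK_OBLIGATIONS
        return ["transparency obligations (verify the tier with counsel)"]

    print(triage("employment", outputs_used_in_eu=True))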

SR 11-7 — the 15-year-old standard banks still struggle to meet

US banking has had SR 11-7, the joint Federal Reserve / OCC Supervisory Guidance on Model Risk Management, since April 2011.6 Any bank, bank holding company, or financial market utility operating in the US is already required (in practice) to comply with it for any quantitative model used in decision-making.

AI/ML models in a bank are models. SR 11-7 applies. The implications (a sketch of what meeting them might look like follows the list):

  • Independent validation. “Effective challenge” — people not involved in model development must independently validate it. Your in-house team cannot be its own auditor. This requirement predates the AI boom by more than a decade, and most AI teams today cannot describe how they meet it.
  • Scaled rigor. SR 11-7 is explicit that requirements scale with the institution’s size and model risk exposure.6 A fintech with twenty customers doesn’t need the program of a money-center bank. But zero is not a program.
  • Documentation and reporting. SR 11-7’s documentation bar is closer to “a skilled engineer who has never seen this model could reproduce the development process” than to “we have a README.”
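
A quick test of how far a team is from those three expectations: could a record like the following be filled in honestly for every production model? The fields, names, and example values are an illustrative sketch, not a format SR 11-7 prescribes.

    from dataclasses import dataclass
    from datetime import date

    @dataclass
    class ModelRecord:
        """Hypothetical model-inventory entry shaped around SR 11-7 expectations."""
        name: str
        owner: str                 # an accountable individual, not a team alias
        developed_by: str
        validated_by: str          # must be independent of developed_by ("effective challenge")
        last_validation: date
        documentation_uri: str     # should let a skilled engineer reproduce the development process
        risk_tier: str             # scales the rigor: "low" | "medium" | "high"

    record = ModelRecord(
        name="credit-limit-recommender",
        owner="jane.doe",
        developed_by="ml-platform",
        validated_by="model-risk",          # independent second line
        last_validation=date(2025, 11, 3),
        documentation_uri="https://wiki.example.internal/models/credit-limit-recommender",
        risk_tier="high",
    )

    # The independence check is the part most AI teams cannot pass today.
    assert record.validated_by != record.developed_by, "an in-house team cannot be its own auditor"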

Fintechs routinely tell me they want an AI readiness audit because “nothing exists for AI.” SR 11-7 has existed for 15 years. The real issue is that most AI teams haven’t been asked to meet it yet. When they are — and they will be — the gap will be wider than their timelines assume.

What the frameworks do not address

Every one of these documents is defensible on its own terms. Put them next to a real production incident, though, and gaps appear fast.

In February 2024, the BC Civil Resolution Tribunal ruled that Air Canada was liable for information its customer-service chatbot provided to a grieving customer, Jake Moffatt, about bereavement fares.7 The chatbot confidently stated that the airline’s bereavement policy allowed retroactive refund applications within 90 days. This was false — Air Canada’s actual policy required advance booking. When Moffatt asked for the refund, Air Canada refused. The tribunal found negligent misrepresentation and ordered Air Canada to pay.

Air Canada’s defense was that the chatbot was a “separate legal entity” responsible for its own statements. The tribunal rejected that, finding that the airline bore responsibility for all information on its website regardless of delivery mechanism. The ruling’s operative sentence:

“I find Air Canada did not take reasonable care to ensure its chatbot was accurate.”7

Read the NIST RMF against that ruling. “Reasonable care” isn’t in the framework. Nor is it in ISO 42001 or the EU AI Act, at least not as a test the framework can answer. A customer-facing LLM that hallucinates policy details is a process and ownership failure — someone needed to own the question “how do we know what our chatbot is saying?” and no one did. A readiness audit asks that question at the intake stage. A checklist does not.
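
Owning that question does not require exotic tooling. Below is a minimal sketch, assuming a hypothetical ask_chatbot() client and a hand-maintained table of policy facts: a regression check that asks the chatbot what customers actually ask it and compares the answers to the canonical policy. The point is the ownership and the recurring check, not the specific implementation.

    # Hypothetical nightly regression check for a customer-facing chatbot.
    # ask_chatbot() stands in for whatever client your deployment exposes.

    POLICY_FACTS = {
        # question -> phrase the answer must contain to be considered faithful to policy
        "Can I apply for a bereavement fare after the flight?":
            "must be requested before travel",
        "Is the bereavement fare available retroactively?":
            "not available retroactively",
    }

    def ask_chatbot(question: str) -> str:
        raise NotImplementedError("replace with your chatbot client")

    def run_policy_regression() -> list[str]:
        """Return the questions where the chatbot's answer no longer matches policy."""
        failures = []
        for question, required_phrase in POLICY_FACTS.items():
            answer = ask_chatbot(question)
            if required_phrase.lower() not in answer.lower():
                failures.append(question)
        return failures

    # Someone owns this output. If the failure list is non-empty, the chatbot does not keep
    # shipping its own answers to customers until the discrepancy is resolved.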

Twelve years before Air Canada, Knight Capital Group lost over $460 million in approximately 45 minutes on August 1, 2012. A manual deployment missed one server out of eight, and a decade-old dormant test algorithm called “Power Peg” woke up and systematically bought high and sold low across 154 stocks — 4 million executions, 397 million shares, and a net long position of about $3.5 billion plus a net short position of about $3.15 billion before the runaway code could be stopped.8

The Knight Capital failure modes are not “AI risk.” They are:

  1. Unowned legacy code — no one had responsibility for decommissioning Power Peg.
  2. Non-atomic deployment — partial rollout across production plus a lingering legacy flag.
  3. No kill-switch / observability — 45 minutes to identify and stop the runaway.

Every one of those is also on the critical path for modern AI systems. The failure pattern is older than the technology. If your 2026 AI governance program does not have an answer for each of these three items, your program has not yet caught up to a 2012 lesson from trading infrastructure.
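
None of the three answers needs to be elaborate. Here is an illustrative sketch of the second and third: a release-consistency check across servers and an exposure-limit kill-switch. The names, the limit, and the halt hook are assumptions for illustration, not any real trading system’s API; the first failure mode is organizational and lives in ownership records, not code.

    # Illustrative answers to the Knight-era failure modes 2 and 3.

    def check_atomic_deployment(deployed_versions: dict[str, str], expected: str) -> list[str]:
        """Failure mode 2: return the servers not running the expected release."""
        return [host for host, version in deployed_versions.items() if version != expected]

    class CircuitBreaker:
        """Failure mode 3: a kill-switch that trips on runaway exposure."""
        def __init__(self, max_gross_exposure: float, halt):
            self.max_gross_exposure = max_gross_exposure
            self.halt = halt            # callable that stops order flow
            self.tripped = False

        def record_exposure(self, gross_exposure: float) -> None:
            if not self.tripped and gross_exposure > self.max_gross_exposure:
                self.tripped = True
                self.halt()

    # Failure mode 1 (unowned legacy code) is organizational, not technical: every code path
    # that can reach production has a named owner and a decommission date.

    versions = {"srv1": "2012.08.01", "srv2": "2012.08.01", "srv3": "2012.07.15"}
    stragglers = check_atomic_deployment(versions, expected="2012.08.01")
    assert stragglers == ["srv3"]   # do not enable the new order flow until this list is empty

    breaker = CircuitBreaker(max_gross_exposure=50_000_000, halt=lambda: print("order flow halted"))
    breaker.record_exposure(75_000_000)   # trips long before a multibillion-dollar position accumulates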

What an audit is for

Frameworks tell you the categories. They do not tell you where your organization sits within them. That’s the job of a scored, evidence-backed audit — executed by someone who has shipped systems, not someone who has only written policies about shipping them.

A defensible AI readiness audit should:

  • Map findings to recognized frameworks. Every score should crosswalk to NIST AI RMF functions, ISO 42001 controls, EU AI Act high-risk requirements, and — where applicable — SR 11-7 expectations; a sketch of such a finding record follows this list. The buyer’s board wants to see the framework names, not a bespoke rubric.
  • Cite evidence, not opinion. Every score is backed by specific artifacts reviewed and specific conversations held. Rationale is written out. The buyer can dispute a score on its merits, not on “it feels harsh.”
  • Differentiate between structure and substance. A policy that reads well is not a system that holds under load. The audit has to test for the second.
  • Produce something operational. The output should tell the buyer what to fix next week, not what to read next quarter.
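
Put together, those four properties imply a finding record roughly like the sketch below: evidence attached, crosswalk explicit, remediation concrete. The schema and the example are illustrative assumptions, not a standard.

    from dataclasses import dataclass

    @dataclass
    class AuditFinding:
        """Hypothetical audit-finding record: evidence-backed, crosswalked, operational."""
        finding: str
        score: int                   # e.g. 1 (absent) .. 4 (operating effectively)
        evidence: list[str]          # artifacts reviewed and conversations held
        rationale: str               # why the score is what it is, written out
        crosswalk: dict[str, str]    # framework -> the control or requirement it maps to
        remediation: str             # what to fix next week, not what to read next quarter

    finding = AuditFinding(
        finding="No independent validation of the production fraud model",
        score=1,
        evidence=["model repo history", "interview with ML lead, 2026-03-12"],
        rationale="The development team signs off on its own releases; no second-line review exists.",
        crosswalk={
            "NIST AI RMF": "MEASURE / MANAGE",
            "ISO/IEC 42001": "AI risk assessment and treatment controls",
            "EU AI Act": "high-risk risk management and conformity assessment",
            "SR 11-7": "effective challenge / independent validation",
        },
        remediation="Name an independent reviewer and require sign-off before the next model release.",
    )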

This is why readiness audits have to be written by people who have shipped the systems they’re auditing. Not because practitioner credentials are aesthetically superior — but because the failure modes the frameworks don’t name are the failure modes that cause actual losses, and you only recognize them when you’ve shipped through them.

Checklists are downstream of experience. Experience is not downstream of checklists.


Sources

  1. Stanford Institute for Human-Centered AI, The 2025 AI Index Report, Responsible AI chapter (2024 incident data). https://hai.stanford.edu/ai-index/2025-ai-index-report/responsible-ai

  2. National Institute of Standards and Technology, Artificial Intelligence Risk Management Framework (AI RMF 1.0), NIST AI 100-1, January 26, 2023. https://nvlpubs.nist.gov/nistpubs/ai/nist.ai.100-1.pdf

  3. National Institute of Standards and Technology, Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile, NIST AI 600-1, July 26, 2024. https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf

  4. International Organization for Standardization / International Electrotechnical Commission, ISO/IEC 42001:2023 — Information technology — Artificial intelligence — Management system, December 2023. https://www.iso.org/standard/42001

  5. European Union, Regulation (EU) 2024/1689 (Artificial Intelligence Act), in force August 1, 2024. https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai

  6. Board of Governors of the Federal Reserve System / Office of the Comptroller of the Currency, SR 11-7: Supervisory Guidance on Model Risk Management, April 4, 2011. https://www.federalreserve.gov/boarddocs/srletters/2011/sr1107.htm

  7. Moffatt v. Air Canada, 2024 BCCRT 149 (British Columbia Civil Resolution Tribunal, February 14, 2024). Mr. Moffatt was awarded $650.88 in damages, $36.14 pre-judgment interest, and $125 in tribunal fees. Coverage: CBC News, February 15, 2024. https://www.cbc.ca/news/canada/british-columbia/air-canada-chatbot-lawsuit-1.7116416

  8. U.S. Securities and Exchange Commission, In the Matter of Knight Capital Americas LLC, Administrative Proceeding File No. 3-15570, Release No. 34-70694, October 16, 2013. https://www.sec.gov/files/litigation/admin/2013/34-70694.pdf