Mastascusa Holdings · Audit methodology

The rubric, in full

Every score the audit produces is anchored to a published criterion and crosswalked to a recognized framework control. Nothing about the rubric is proprietary — the audit's value is in evidence-backed application of it, not in keeping it secret. Read it before commissioning. Disagree with a score on its merits, not its opacity.

Lvl 1 · Ad hoc Lvl 2 · Defined Lvl 3 · Managed Lvl 4 · Optimized

I · Data Architecture II · Access Control III · Process Documentation IV · Agent Governance

Pillar I

Data Architecture

The question

"How does information flow from source to model — and can the team reproduce a training run on demand?"

Why it matters

Silent data drift is the most common production failure mode. If the inputs your model sees in production no longer match the inputs it was trained on, predictions degrade quietly. Models also fail when training data and serving data are computed differently — training-serving skew is invisible until it produces a bad outcome.

Level 1

Ad hoc

▪ Training data assembly is undocumented or recreated from memory.
▪ No baseline distribution captured at deployment time.
▪ Drift detection is non-existent or anecdotal.
▪ Feature pipelines have no tests.

Level 2

Defined

▪ Lineage exists in slides or wikis, but is stale or partial.
▪ Quality gates documented; not consistently enforced.
▪ Drift monitoring discussed in runbooks but rarely acted on.
▪ Some feature pipeline tests; not a CI gate.

Level 3

Managed

▪ End-to-end lineage maintained, refreshed at known cadence (e.g. quarterly).
▪ Quality gates enforce schema, range, and null-rate at ingestion.
▪ Drift monitored against baselines with named owner and an SLA for response.
▪ Training-serving consistency tested in CI.

Level 4

Optimized

▪ Lineage automated; reproducibility verified by CI on a representative sample.
▪ Drift detection statistical (KS / PSI / ADWIN) with auto-trigger on retraining or rollback.
▪ Training-serving identity contractually enforced; deltas alert.
▪ Feature catalog with documented owners, schemas, and versions.

Framework crosswalk

NIST AI RMF

MAP 2 (Categorization & context)
MEASURE 2 (Performance & robustness)
MANAGE 2 (Treatment & monitoring)

ISO/IEC 42001

Annex A — Data management
Annex A — Process for data resources

EU AI Act

Article 10 — Data and data governance (high-risk)

SR 11-7

Section IV — Model development & implementation
Section V — Ongoing monitoring

Pillar II

Access Control

The question

"Who can touch what — and does the implemented reality match the stated policy across the AI surface?"

Why it matters

For AI systems "access" means more surfaces than for traditional apps: training data, model weights, fine-tunes, inference endpoints, prompts, and the supply chain that produced any of those. Generic SaaS RBAC does not cover this. Most organizations inherit weak controls and discover the gap during incident response.

Level 1

Ad hoc

▪ AI access controls inherit a generic SaaS RBAC; no model-specific roles.
▪ Model weights / fine-tunes / training data sit on shared storage with broad read.
▪ No audit trail tying who pulled which model version when.
▪ Prompt-injection / supply-chain risk not enumerated.

Level 2

Defined

▪ Model-specific roles defined on paper; provisioning still ad hoc.
▪ Inference endpoints behind auth, but rate-limiting and key rotation incomplete.
▪ Some audit logging on model registry; not reviewed.
▪ OWASP LLM Top 10 mentioned in a doc; not used as a checklist.

Level 3

Managed

▪ Model-specific roles enforced via IAM; principle of least privilege documented.
▪ Inference endpoints have authn + per-tenant rate limits + key rotation.
▪ Audit log of model pulls + deploys reviewed monthly.
▪ OWASP LLM Top 10 used as deploy checklist; prompt-injection tests in eval suite.

Level 4

Optimized

▪ JIT access for sensitive ops (e.g. fine-tune writes); break-glass logged + reviewed.
▪ Supply-chain provenance verified (model card hash, training data SHA, base-model attestation).
▪ Adversarial-ML threat model drawn from MITRE ATLAS; controls mapped tactic-by-tactic.
▪ Quarterly red-team or external penetration test against the AI surface.

Framework crosswalk

NIST AI RMF

GOVERN 1.5 (Roles & responsibilities)
MAP 5 (Impact assessment)

OWASP LLM Top 10

LLM01 Prompt Injection
LLM02 Sensitive Information Disclosure
LLM03 Supply Chain
LLM06 Excessive Agency

MITRE ATLAS

Tactic: ML Model Access
Tactic: Initial Access
Tactic: Exfiltration

EU AI Act

Article 15 — Accuracy, robustness, cybersecurity (high-risk)

Pillar III

Process Documentation

The question

"Can the organization actually operate what it has built — under load, on-call, and during an incident?"

Why it matters

A model that works in a notebook is not a system that works under production load. Failure happens when the runbook does not exist, the kill-switch is not rehearsed, or the on-call person does not know which model is talking to which downstream service. The Air Canada chatbot ruling and Knight Capital both fit this shape: the breakage was operational, not algorithmic.

Level 1

Ad hoc

▪ No model inventory; nobody can list every deployed model in under an hour.
▪ No incident runbook for AI-specific failure modes (hallucination, drift, agent runaway).
▪ No documented kill switch or rollback procedure.
▪ On-call rotation does not include AI systems.

Level 2

Defined

▪ Model inventory in a spreadsheet; partially current.
▪ Generic incident runbook; AI-specific footnotes.
▪ Kill switch theoretically exists; never rehearsed.
▪ On-call has weak coverage; escalation paths unclear.

Level 3

Managed

▪ Model inventory live, with named owner, version, and last-deploy date.
▪ AI-specific runbook covering at least: drift, eval-loop failure, prompt-injection, agent runaway.
▪ Kill switch tested at least once per quarter.
▪ On-call covers AI systems with documented escalation to model owner within 5 minutes.

Level 4

Optimized

▪ Model lifecycle docs (intake → retire) version-controlled and audited.
▪ Incident game-days simulate AI-specific failures; learnings feed back to runbook.
▪ Automated rollback for any deploy where post-deploy eval drops below threshold.
▪ Documentation reviewed by an independent reader who has never seen the system.

Framework crosswalk

NIST AI RMF

GOVERN 1 (Policies & procedures)
MANAGE 1 (Risk treatment)
MANAGE 4 (Communication & response)

ISO/IEC 42001

Clause 7 — Support
Clause 8 — Operation
Clause 9 — Performance evaluation

SR 11-7

Section VI — Documentation & validation
Effective challenge

EU AI Act

Article 9 — Risk management system
Article 17 — Quality management system

Pillar IV

Agent Governance

The question

"For organizations running agentic systems: are agents named, owned, supervised, and bounded — like roles on an org chart?"

Why it matters

Multi-agent systems blur the line between code and personnel. A poorly-governed agent is a coworker with no manager, no review cycle, and no one to escalate to when it does something stupid. The questions a human-resources department would ask of a new hire — what does it do, who supervises it, what is it allowed to do — are exactly the questions to ask of every agent.

Level 1

Ad hoc

▪ Agents exist; no published org chart.
▪ No named owner per agent.
▪ No defined escalation path when an agent fails.
▪ Tool-permission policy implicit.

Level 2

Defined

▪ Agent inventory in a doc; ownership unclear or shared by committee.
▪ Escalation defined for catastrophic failures; not for soft failures.
▪ Tool permissions configured per agent but not reviewed.
▪ Eval cadence ad hoc.

Level 3

Managed

▪ Each agent has a named human owner with documented scope.
▪ Tool permissions follow least-privilege per agent; reviewed quarterly.
▪ Eval cadence: at least monthly, against held-out scenarios.
▪ Agent-to-agent handoff governance documented.

Level 4

Optimized

▪ Agents have role descriptions, "performance reviews," and retirement criteria.
▪ Permission policy uses always-ask gates for destructive actions.
▪ Cross-agent transcripts retained, indexed, and audited.
▪ Independent challenger evaluates agent behavior outside the build team.

Framework crosswalk

NIST AI 600-1

Excessive Agency / Autonomy risks (Generative AI Profile)

OWASP LLM Top 10

LLM06 Excessive Agency
LLM10 Unbounded Consumption

NIST AI RMF

GOVERN 3 (Workforce & accountability)
MEASURE 4 (Communication & feedback)

ISO/IEC 42001

Annex A — AI lifecycle management

Primary sources behind the rubric

NIST AI 100-1 — Artificial Intelligence Risk Management Framework (AI RMF 1.0), January 26, 2023.
NIST AI 600-1 — Generative AI Profile, July 26, 2024.
ISO/IEC 42001:2023 — Information technology — Artificial intelligence — Management system. December 2023.
EU AI Act — Regulation (EU) 2024/1689, in force August 1, 2024; full applicability August 2, 2026.
Federal Reserve / OCC SR 11-7 — Supervisory Guidance on Model Risk Management, April 4, 2011.
OWASP Top 10 for LLM Applications — 2025 edition, OWASP Gen AI Security Project.
MITRE ATLAS — Adversarial Threat Landscape for AI Systems, version 5.1, November 2025.

See the rubric applied

Six worked examples — from a customer-support LLM through a mature production pipeline — show how the rubric scores real deployment shapes.

Read the examples Run the free scan →