Mastascusa Holdings · Audit methodology
The rubric, in full
Every score the audit produces is anchored to a published criterion and crosswalked to a recognized framework control. Nothing about the rubric is proprietary — the audit's value is in evidence-backed application of it, not in keeping it secret. Read it before commissioning. Disagree with a score on its merits, not its opacity.
Pillar I
Data Architecture
The question
"How does information flow from source to model — and can the team reproduce a training run on demand?"
Why it matters
Silent data drift is the most common production failure mode. If the inputs your model sees in production no longer match the inputs it was trained on, predictions degrade quietly. Models also fail when training data and serving data are computed differently — training-serving skew is invisible until it produces a bad outcome.
Level 1
Ad hoc
- ▪ Training data assembly is undocumented or recreated from memory.
- ▪ No baseline distribution captured at deployment time.
- ▪ Drift detection is non-existent or anecdotal.
- ▪ Feature pipelines have no tests.
Level 2
Defined
- ▪ Lineage exists in slides or wikis, but is stale or partial.
- ▪ Quality gates documented; not consistently enforced.
- ▪ Drift monitoring discussed in runbooks but rarely acted on.
- ▪ Some feature pipeline tests; not a CI gate.
Level 3
Managed
- ▪ End-to-end lineage maintained, refreshed at known cadence (e.g. quarterly).
- ▪ Quality gates enforce schema, range, and null-rate at ingestion.
- ▪ Drift monitored against baselines with named owner and an SLA for response.
- ▪ Training-serving consistency tested in CI.
Level 4
Optimized
- ▪ Lineage automated; reproducibility verified by CI on a representative sample.
- ▪ Drift detection statistical (KS / PSI / ADWIN) with auto-trigger on retraining or rollback.
- ▪ Training-serving identity contractually enforced; deltas alert.
- ▪ Feature catalog with documented owners, schemas, and versions.
Framework crosswalk
NIST AI RMF
- MAP 2 (Categorization & context)
- MEASURE 2 (Performance & robustness)
- MANAGE 2 (Treatment & monitoring)
ISO/IEC 42001
- Annex A — Data management
- Annex A — Process for data resources
EU AI Act
- Article 10 — Data and data governance (high-risk)
SR 11-7
- Section IV — Model development & implementation
- Section V — Ongoing monitoring
Pillar II
Access Control
The question
"Who can touch what — and does the implemented reality match the stated policy across the AI surface?"
Why it matters
For AI systems "access" means more surfaces than for traditional apps: training data, model weights, fine-tunes, inference endpoints, prompts, and the supply chain that produced any of those. Generic SaaS RBAC does not cover this. Most organizations inherit weak controls and discover the gap during incident response.
Level 1
Ad hoc
- ▪ AI access controls inherit a generic SaaS RBAC; no model-specific roles.
- ▪ Model weights / fine-tunes / training data sit on shared storage with broad read.
- ▪ No audit trail tying who pulled which model version when.
- ▪ Prompt-injection / supply-chain risk not enumerated.
Level 2
Defined
- ▪ Model-specific roles defined on paper; provisioning still ad hoc.
- ▪ Inference endpoints behind auth, but rate-limiting and key rotation incomplete.
- ▪ Some audit logging on model registry; not reviewed.
- ▪ OWASP LLM Top 10 mentioned in a doc; not used as a checklist.
Level 3
Managed
- ▪ Model-specific roles enforced via IAM; principle of least privilege documented.
- ▪ Inference endpoints have authn + per-tenant rate limits + key rotation.
- ▪ Audit log of model pulls + deploys reviewed monthly.
- ▪ OWASP LLM Top 10 used as deploy checklist; prompt-injection tests in eval suite.
Level 4
Optimized
- ▪ JIT access for sensitive ops (e.g. fine-tune writes); break-glass logged + reviewed.
- ▪ Supply-chain provenance verified (model card hash, training data SHA, base-model attestation).
- ▪ Adversarial-ML threat model drawn from MITRE ATLAS; controls mapped tactic-by-tactic.
- ▪ Quarterly red-team or external penetration test against the AI surface.
Framework crosswalk
NIST AI RMF
- GOVERN 1.5 (Roles & responsibilities)
- MAP 5 (Impact assessment)
OWASP LLM Top 10
- LLM01 Prompt Injection
- LLM02 Sensitive Information Disclosure
- LLM03 Supply Chain
- LLM06 Excessive Agency
MITRE ATLAS
- Tactic: ML Model Access
- Tactic: Initial Access
- Tactic: Exfiltration
EU AI Act
- Article 15 — Accuracy, robustness, cybersecurity (high-risk)
Pillar III
Process Documentation
The question
"Can the organization actually operate what it has built — under load, on-call, and during an incident?"
Why it matters
A model that works in a notebook is not a system that works under production load. Failure happens when the runbook does not exist, the kill-switch is not rehearsed, or the on-call person does not know which model is talking to which downstream service. The Air Canada chatbot ruling and Knight Capital both fit this shape: the breakage was operational, not algorithmic.
Level 1
Ad hoc
- ▪ No model inventory; nobody can list every deployed model in under an hour.
- ▪ No incident runbook for AI-specific failure modes (hallucination, drift, agent runaway).
- ▪ No documented kill switch or rollback procedure.
- ▪ On-call rotation does not include AI systems.
Level 2
Defined
- ▪ Model inventory in a spreadsheet; partially current.
- ▪ Generic incident runbook; AI-specific footnotes.
- ▪ Kill switch theoretically exists; never rehearsed.
- ▪ On-call has weak coverage; escalation paths unclear.
Level 3
Managed
- ▪ Model inventory live, with named owner, version, and last-deploy date.
- ▪ AI-specific runbook covering at least: drift, eval-loop failure, prompt-injection, agent runaway.
- ▪ Kill switch tested at least once per quarter.
- ▪ On-call covers AI systems with documented escalation to model owner within 5 minutes.
Level 4
Optimized
- ▪ Model lifecycle docs (intake → retire) version-controlled and audited.
- ▪ Incident game-days simulate AI-specific failures; learnings feed back to runbook.
- ▪ Automated rollback for any deploy where post-deploy eval drops below threshold.
- ▪ Documentation reviewed by an independent reader who has never seen the system.
Framework crosswalk
NIST AI RMF
- GOVERN 1 (Policies & procedures)
- MANAGE 1 (Risk treatment)
- MANAGE 4 (Communication & response)
ISO/IEC 42001
- Clause 7 — Support
- Clause 8 — Operation
- Clause 9 — Performance evaluation
SR 11-7
- Section VI — Documentation & validation
- Effective challenge
EU AI Act
- Article 9 — Risk management system
- Article 17 — Quality management system
Pillar IV
Agent Governance
The question
"For organizations running agentic systems: are agents named, owned, supervised, and bounded — like roles on an org chart?"
Why it matters
Multi-agent systems blur the line between code and personnel. A poorly-governed agent is a coworker with no manager, no review cycle, and no one to escalate to when it does something stupid. The questions a human-resources department would ask of a new hire — what does it do, who supervises it, what is it allowed to do — are exactly the questions to ask of every agent.
Level 1
Ad hoc
- ▪ Agents exist; no published org chart.
- ▪ No named owner per agent.
- ▪ No defined escalation path when an agent fails.
- ▪ Tool-permission policy implicit.
Level 2
Defined
- ▪ Agent inventory in a doc; ownership unclear or shared by committee.
- ▪ Escalation defined for catastrophic failures; not for soft failures.
- ▪ Tool permissions configured per agent but not reviewed.
- ▪ Eval cadence ad hoc.
Level 3
Managed
- ▪ Each agent has a named human owner with documented scope.
- ▪ Tool permissions follow least-privilege per agent; reviewed quarterly.
- ▪ Eval cadence: at least monthly, against held-out scenarios.
- ▪ Agent-to-agent handoff governance documented.
Level 4
Optimized
- ▪ Agents have role descriptions, "performance reviews," and retirement criteria.
- ▪ Permission policy uses always-ask gates for destructive actions.
- ▪ Cross-agent transcripts retained, indexed, and audited.
- ▪ Independent challenger evaluates agent behavior outside the build team.
Framework crosswalk
NIST AI 600-1
- Excessive Agency / Autonomy risks (Generative AI Profile)
OWASP LLM Top 10
- LLM06 Excessive Agency
- LLM10 Unbounded Consumption
NIST AI RMF
- GOVERN 3 (Workforce & accountability)
- MEASURE 4 (Communication & feedback)
ISO/IEC 42001
- Annex A — AI lifecycle management
Primary sources behind the rubric
- NIST AI 100-1 — Artificial Intelligence Risk Management Framework (AI RMF 1.0), January 26, 2023.
- NIST AI 600-1 — Generative AI Profile, July 26, 2024.
- ISO/IEC 42001:2023 — Information technology — Artificial intelligence — Management system. December 2023.
- EU AI Act — Regulation (EU) 2024/1689, in force August 1, 2024; full applicability August 2, 2026.
- Federal Reserve / OCC SR 11-7 — Supervisory Guidance on Model Risk Management, April 4, 2011.
- OWASP Top 10 for LLM Applications — 2025 edition, OWASP Gen AI Security Project.
- MITRE ATLAS — Adversarial Threat Landscape for AI Systems, version 5.1, November 2025.
See the rubric applied
Six worked examples — from a customer-support LLM through a mature production pipeline — show how the rubric scores real deployment shapes.