Practitioner Reference

Auditing AI

AI systems introduce governance gaps that traditional audit frameworks were not designed to close. This reference covers how AI actually works in enterprise settings, what the governance landscape requires, and what practitioners need to produce credible assurance work in AI-adjacent engagements.

Are you an AI-ready auditor?

The AI Readiness Learning Module combines a skills assessment with structured learning across the concepts on this page. Practitioners receive a personalized radar chart and a shareable development summary.

How AI systems work in enterprise settings

An effective audit of AI governance starts with a working model of how these systems are built, trained, deployed, and updated. The audit findings that matter most are rarely about the model -- they are about the decisions surrounding it.

The enterprise AI stack

Most enterprise AI deployments are not custom-built models. They are vendor-supplied systems, cloud-hosted APIs, or embedded tools within existing platforms -- ERP add-ons, scheduling optimizers, pricing engines, demand forecasting modules. The audit scope is usually a business decision layer wrapped around a model the organization did not build and may not fully understand.

The relevant components for audit purposes:

  • Training data: The historical records used to develop the model's predictive behavior. Quality, representativeness, and documentation of training data are core governance questions.
  • Model: The algorithm or neural network architecture. For most enterprise deployments this is vendor-controlled. Audit's access is typically limited to outputs, not model internals.
  • Feature engineering: How raw data is transformed into the inputs the model uses. This is often where bias and instability enter -- and where audit can usually get traction.
  • Deployment infrastructure: How the model is served to users or systems. Includes version control, rollback capability, access controls, and environment separation.
  • Monitoring layer: What the organization has in place to detect model drift, output anomalies, or performance degradation over time.
  • Human decision interface: How model outputs reach human decision-makers and what override or escalation mechanisms exist.
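
To make the decision-layer framing concrete, here is a minimal sketch of a wrapper around a vendor-hosted model, in Python. Everything in it is illustrative -- vendor_score stands in for an API whose internals the organization cannot inspect, and the threshold and field names are assumptions, not any vendor's interface. The point is where the governance hooks live: the monitoring layer retains inputs and outputs, and the decision interface routes low-confidence outputs to a human.

    import json
    import logging
    from datetime import datetime, timezone

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("model_monitoring")

    def vendor_score(features: dict) -> float:
        """Stand-in for a vendor-hosted model API audit cannot see inside."""
        return 0.42  # placeholder output

    REVIEW_THRESHOLD = 0.5  # illustrative cut-off for routing to human review

    def credit_decision(features: dict) -> dict:
        """The decision layer the organization owns -- and audit can test."""
        score = vendor_score(features)
        record = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "inputs": features,  # retained for the monitoring layer / drift analysis
            "score": score,
            "needs_review": score < REVIEW_THRESHOLD,  # human decision interface
        }
        log.info(json.dumps(record))  # audit trail for every model-driven decision
        return record

Every control question in this reference -- logging completeness, threshold approval, whether review actually happens -- targets this wrapper layer, not the vendor model inside it.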

Model types relevant to electrical distribution

Electrical distribution enterprises encounter AI in several recurring forms:

  • Demand forecasting: Predicting inventory requirements and delivery timing. Bias or calibration errors affect purchasing decisions at scale.
  • Pricing optimization: Dynamic or contract pricing recommendations. Audit implications include fair pricing consistency and margin governance.
  • Credit decisioning: Trade credit extension, credit limit management. Regulatory fair lending considerations may apply depending on customer type.
  • Workflow automation: Approval routing, exception flagging, invoice matching. These systems often make consequential decisions with minimal human review.
  • Predictive maintenance: Failure prediction for fleet or facility assets. Relevant where equipment failure has safety or continuity implications.

The lifecycle audit gap

The most common governance failure is not deploying a bad model -- it is failing to govern the model after deployment. Models drift as operating conditions change. Training data becomes unrepresentative. Business rules evolve but model inputs do not. Audit programs that cover initial deployment but not ongoing performance leave the highest-risk period unexamined.

A credible AI audit program has coverage at three lifecycle stages: pre-deployment validation, post-deployment monitoring, and periodic revalidation.

Key concept

Most enterprise AI systems are black boxes from the audit perspective -- the model internals are vendor-controlled. The audit scope is the governance and decision layer around the model, not the model itself. That is where control deficiencies actually live.

The AI governance landscape

Regulatory direction on AI governance is converging around a set of common principles even as jurisdiction-specific requirements diverge. Audit needs a working model of both layers.

Regulatory frameworks in force or development

The governance landscape includes both binding requirements and voluntary frameworks that are increasingly referenced in regulatory examinations and third-party assessments:

  • EU AI Act (2024): The most comprehensive binding AI regulation currently in force. Risk-tiered requirements; high-risk systems require conformity assessments, transparency documentation, human oversight, and technical documentation. Relevant for organizations with EU exposure.
  • NIST AI RMF (2023): Voluntary but widely adopted. Provides a risk-based framework organized around four functions: Govern, Map, Measure, Manage. Increasingly referenced by U.S. federal agencies and used as an audit framework basis.
  • SEC AI-related disclosures: Guidance on disclosure obligations when AI-related risks are material. Relevant for public companies; enforcement is active on AI-related disclosure gaps.
  • FTC guidance: Focus on automated decision-making fairness and deceptive practices involving AI. Relevant for pricing and credit systems.
  • IIA Standards (2024): The 2024 IIA Standards introduce explicit expectations for audit to understand and assess technology-enabled processes, including AI. Methodology standards require auditors to apply appropriate techniques for technology-dependent environments.

The convergence principles

Across frameworks, the governance requirements that appear consistently are:

  • Documented purpose and scope for each system
  • Risk classification or tiering at deployment
  • Training data documentation and quality standards
  • Human oversight mechanisms with defined escalation paths
  • Performance monitoring and drift detection
  • Incident response procedures for model failures
  • Bias and fairness assessment where consequential decisions are involved
  • Vendor management requirements for third-party AI systems

An AI governance program that addresses these eight requirements will be substantially aligned with all major current frameworks, regardless of which specific regulation applies.
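
In fieldwork, this list translates directly into a gap assessment. A minimal sketch of one way to structure it -- the requirement labels and function names are illustrative, not a prescribed methodology:

    # The eight convergence requirements, phrased as evidence questions.
    REQUIREMENTS = [
        "documented purpose and scope",
        "risk classification at deployment",
        "training data documentation and quality standards",
        "human oversight with defined escalation paths",
        "performance monitoring and drift detection",
        "incident response for model failures",
        "bias and fairness assessment",
        "vendor management for third-party AI",
    ]

    def gap_report(evidence_on_file: dict[str, bool]) -> list[str]:
        """Requirements with no supporting evidence -- one row per potential finding."""
        return [r for r in REQUIREMENTS if not evidence_on_file.get(r, False)]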

Governance gap pattern

Most organizations deployed AI systems before their governance frameworks existed. The common audit finding is not "no governance" -- it is governance that exists on paper but was not implemented alongside the system being governed.

Terminology and glossary

Shared vocabulary is the starting point for productive AI audit conversations with technical stakeholders and management. These definitions reflect common enterprise usage, not academic precision.

Algorithm

A defined set of rules or procedures that a system follows to produce an output. Not all algorithms are AI -- rule-based systems (if/then logic) are algorithms but not machine learning. The distinction matters for audit scope.
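
A minimal illustration of the distinction, with hypothetical names throughout:

    # Rule-based: the logic is explicit -- it can be audited by reading it.
    def credit_limit_rule(annual_revenue: float) -> float:
        return 0.10 * annual_revenue if annual_revenue < 1_000_000 else 150_000.0

    # Machine learning: the logic is learned from data, so audit evidence comes
    # from training data, validation results, and observed outputs instead.
    # limit = trained_model.predict(customer_features)  # hypothetical model object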

Machine learning (ML)

A category of AI where systems learn patterns from data rather than following explicitly programmed rules. The model's behavior is shaped by training data, which is why data quality governance is foundational.

Large language model (LLM)

A type of ML model trained on large volumes of text. Powers tools like ChatGPT, Copilot, and embedded AI assistants. Audit-relevant concerns include hallucination, data leakage in user prompts, and output reliability for consequential tasks.

Model drift

Degradation in model accuracy over time as real-world conditions diverge from training data. A model trained on pre-pandemic distribution patterns, for example, may produce increasingly unreliable outputs as supply chain conditions change. Monitoring for drift is a core control.
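
Drift monitoring usually reduces to a statistical comparison between the training-era distribution and recent production data. Below is a minimal sketch of one widely used indicator, the population stability index (PSI), assuming NumPy and a continuous input feature; the thresholds in the docstring are a common rule of thumb, not a standard.

    import numpy as np

    def population_stability_index(expected: np.ndarray, actual: np.ndarray,
                                   bins: int = 10) -> float:
        """PSI between a training-era (expected) and production (actual) sample.
        Rule of thumb: < 0.1 stable, 0.1-0.25 monitor, > 0.25 investigate."""
        # Bin edges come from the training-era distribution
        edges = np.percentile(expected, np.linspace(0, 100, bins + 1))
        # Fold out-of-range production values into the end bins
        actual = np.clip(actual, edges[0], edges[-1])
        exp_pct = np.histogram(expected, bins=edges)[0] / len(expected)
        act_pct = np.histogram(actual, bins=edges)[0] / len(actual)
        # Avoid log(0) in sparse bins
        exp_pct = np.clip(exp_pct, 1e-6, None)
        act_pct = np.clip(act_pct, 1e-6, None)
        return float(np.sum((act_pct - exp_pct) * np.log(act_pct / exp_pct)))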

Hallucination

When an LLM generates plausible-sounding but factually incorrect output. The term is specific to generative AI. Relevant audit risk: LLM outputs used in reports, customer communications, or decision support without human verification.

Training data

The historical dataset used to develop a model. Quality, completeness, representativeness, and governance of training data determine much of what the model can and cannot do reliably. Audit should assess whether training data was curated, documented, and reviewed for bias.

Explainability

The degree to which a model's output can be explained in terms a human can evaluate. High explainability means a decision-maker can understand why the model produced a result. Low explainability is a risk factor where consequential decisions are involved.

Human-in-the-loop

A design pattern where humans review or approve AI outputs before they take effect. The critical audit question is not whether a human is "in the loop" in name but whether the human intervention is substantive -- whether the reviewer has the time, information, and authority to actually override the model.

Bias (algorithmic)

Systematic errors in model outputs that disadvantage specific groups or produce consistently skewed results. Originates in training data, feature selection, or model design. Most relevant for credit, pricing, and employee-related AI systems.

Shadow AI

AI tools adopted by employees without IT or compliance review -- typically consumer AI products used for work tasks. A significant and often invisible governance gap. Audit should assess whether the organization has a policy, detection mechanisms, and enforcement.

Practitioner note

When interviewing technical stakeholders, surface which terms they use for their systems. "AI" in management vocabulary often means something more specific in the engineering team's language. Closing that gap in your planning phase prevents scope misalignment in fieldwork.

Policy and governance framework requirements

AI governance is most commonly deficient not because organizations lack policies but because the policies exist without the infrastructure to enforce them. Audit should assess both layers.

Policy elements that should exist

A complete AI governance policy structure includes:

  • AI use policy: Defines permitted and prohibited uses of AI, including employee-use guidelines for external AI tools. Should distinguish between internal development, vendor-supplied systems, and consumer AI.
  • Risk classification framework: A tiered structure for categorizing AI systems by consequence severity. At minimum: low-risk (informational), medium-risk (advisory), high-risk (consequential or regulated).
  • AI inventory: A documented register of all AI systems in production, including vendor-supplied systems embedded in other platforms. The most common governance gap is an incomplete inventory -- systems the governance team does not know exist. A minimal register-entry sketch, including a risk tier, follows this list.
  • Third-party AI policy: Requirements for vendor AI systems, including contractual documentation, performance standards, and audit rights.
  • Acceptable use by function: Function-specific guidance for high-risk business areas such as credit, pricing, HR, and customer-facing communications.
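
A minimal sketch of what a single register entry might capture, tied to the tiers defined in the risk classification framework above. The field names are illustrative, not a prescribed schema.

    from dataclasses import dataclass
    from datetime import date
    from enum import Enum

    class RiskTier(Enum):
        LOW = "informational"     # outputs inform; no direct decision impact
        MEDIUM = "advisory"       # outputs recommend; a human decides
        HIGH = "consequential"    # outputs drive regulated or material decisions

    @dataclass
    class AISystemRecord:
        """One row in the AI inventory."""
        name: str
        business_owner: str           # accountable process owner, not just IT
        vendor: str | None            # None for internally developed models
        embedded_in: str | None       # host platform, e.g. an ERP module
        risk_tier: RiskTier
        purpose: str                  # documented purpose and scope
        deployed: date
        last_validated: date | None   # None = never independently validated
        monitoring_in_place: bool

The embedded_in field is there because vendor AI inside ERP or CRM platforms is the most common inventory omission.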

Governance structure

Effective AI governance requires a defined accountability structure. Audit should assess:

  • Is there a designated AI governance owner (CISO, CTO, CDO, or dedicated role)?
  • Is there a cross-functional AI review or approval body for new system deployments?
  • Are business process owners accountable for the AI systems they use, even when those systems are vendor-supplied?
  • Does the governance structure have teeth -- approval authority that can actually block deployment?

Common audit findings in this area

  • AI inventory exists but is materially incomplete -- excludes embedded vendor AI in ERP or CRM platforms
  • Risk classification framework exists but new systems are not routinely classified before deployment
  • Employee AI use policy prohibits unapproved tools but no detection or enforcement mechanism exists
  • Third-party vendor contracts predate AI governance requirements and do not include AI-specific provisions
  • Governance owner is defined but lacks the cross-functional authority to enforce policy

Evidence to request

  • AI inventory / system register
  • Risk classification decisions
  • Governance committee meeting records
  • Vendor AI contract provisions
  • Employee acknowledgment records

The AI control environment

AI systems require a distinct set of controls overlaid on -- not replacing -- existing IT general controls. Understanding where AI controls layer onto traditional ITGC is essential for scoping engagements and avoiding duplication of effort.

Controls that traditional ITGC frameworks cover

Standard ITGC testing covers access controls, change management, operations controls, and availability. These apply to AI systems the same way they apply to any software system. Audit should confirm ITGC coverage extends to AI system infrastructure -- but should not re-perform ITGC testing under an AI audit label.

Controls that are AI-specific

  • Data lineage and training data controls: Documentation of where training data originated, how it was cleaned and labeled, and whether it was reviewed for quality and representativeness before use. Most organizations lack adequate documentation here.
  • Model validation: Independent validation of model performance before deployment. In higher-risk applications (credit, fraud detection, pricing at scale), validation should be performed by someone other than the model developer.
  • Performance monitoring: Ongoing tracking of model accuracy metrics, drift indicators, and output distributions. Monitoring without defined alert thresholds is surveillance, not control. A threshold sketch follows this list.
  • Human override controls: Defined process for when and how humans override model outputs. Audit should verify overrides are logged, reviewed, and analyzed as a signal of model performance.
  • Explainability requirements: For consequential decisions (credit denials, pricing exceptions, employee flags), the control is whether an explanation can be produced. Testing: request an explanation for a recent adverse output and assess whether it is credible and complete.
  • Retraining controls: Triggers, frequency, and approval process for model retraining. Ad hoc retraining without approval is a change management control failure.
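
The sketch below shows the difference between surveillance and control for the monitoring bullet above: metrics become a control only once thresholds are defined, approved, and wired to action. The metric names and cut-offs here are illustrative assumptions, not a standard.

    # Illustrative thresholds an auditor would expect to see formally approved.
    ALERT_THRESHOLDS = {
        "psi": 0.25,            # drift indicator (see the PSI sketch in the glossary)
        "accuracy_drop": 0.05,  # absolute decline vs. the validation baseline
        "override_rate": 0.15,  # share of outputs overridden by human reviewers
    }

    def evaluate_monitoring(latest_metrics: dict[str, float]) -> list[str]:
        """Return the alerts a defined-threshold regime would raise.
        Audit test: confirm thresholds like these exist, are approved, and
        that each breach is routed to a named owner for follow-up."""
        return [
            f"{name} breached: {latest_metrics[name]:.3f} > {limit}"
            for name, limit in ALERT_THRESHOLDS.items()
            if latest_metrics.get(name, 0.0) > limit
        ]

The override_rate entry doubles as a human-override control signal: a rising override rate is evidence of model degradation, which is why overrides should be logged and analyzed rather than merely permitted.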

Vendor AI controls

For vendor-supplied AI systems, audit should assess the organization's reliance management controls:

  • Does the organization understand what the vendor's system is doing?
  • Does the contract include performance standards and incident notification requirements?
  • Does the organization have independent means to detect when the vendor system is performing poorly?
  • What is the exit strategy if the vendor relationship ends?

Scoping principle

AI audit scope should be layered, not duplicated. If ITGC testing is in scope separately, the AI audit covers the AI-specific controls -- data, validation, monitoring, human oversight, explainability, retraining. Merging the two produces a scope that is too broad to complete effectively.

AI as an audit tool

Internal audit functions are beginning to use AI tools in their own work -- for document analysis, risk identification, anomaly detection, and reporting. This use raises governance questions that are distinct from auditing AI systems.

Current practical applications

The AI tools that audit functions are actually using in practice, ranked approximately by adoption:

  • LLM-assisted document review: Summarizing policies, contracts, and large document sets. Reduces time spent on initial review. Key risk: hallucination in output summaries used as evidence without verification.
  • Anomaly detection in transaction data: Pattern identification across large populations. Most effective where audit already has structured data access and defined control objectives. A minimal outlier-flagging sketch follows this list.
  • Risk assessment assistance: Using LLMs to generate initial risk factor lists, identify precedents, or draft risk narratives. Effective as a starting point; requires practitioner review and editing.
  • Workpaper drafting: Generating initial workpaper structures, procedure templates, and observation narratives from field notes. Reduces documentation time. Same hallucination risk applies.
  • Issue and recommendation benchmarking: Searching precedent findings across prior workpapers or industry databases. Useful for consistency review.
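
For the anomaly-detection use above, many audit teams start with a simple statistical outlier flag before anything model-based. A minimal sketch using the modified z-score, assuming NumPy; the 0.6745 constant and 3.5 cut-off are the common convention for this statistic.

    import numpy as np

    def flag_outliers(amounts: np.ndarray, cutoff: float = 3.5) -> np.ndarray:
        """Flag transactions whose amounts deviate strongly from the population.
        Uses the modified z-score, 0.6745 * (x - median) / MAD, which is robust:
        a few extreme values cannot mask each other the way they inflate a
        mean/std z-score."""
        median = np.median(amounts)
        mad = np.median(np.abs(amounts - median))
        if mad == 0:  # degenerate case: more than half the values are identical
            return np.zeros(len(amounts), dtype=bool)
        modified_z = 0.6745 * (amounts - median) / mad
        return np.abs(modified_z) > cutoff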

Governance requirements for audit's AI use

Audit functions using AI tools in their own work need the same governance structure they audit in others. At minimum:

  • Approved tool list: which AI tools are authorized for use in audit work
  • Data handling policy: what audit data can be entered into external AI systems (most organizations should prohibit entering confidential audit evidence into consumer LLMs)
  • Verification requirement: all AI-generated content used in workpapers or reports must be verified by the practitioner producing it
  • Disclosure standard: when and how AI assistance is noted in audit documentation

What AI does not replace

AI tools reduce time on information processing tasks. They do not replace:

  • Professional judgment about risk significance and finding severity
  • Relationship-based intelligence about organizational context
  • The credibility that comes from practitioner expertise in presenting findings to management
  • Accountability for the accuracy of audit conclusions

Audit leadership should be explicit about this boundary to prevent over-reliance on AI outputs as a substitute for practitioner engagement.

Data handling rule

The default rule for most organizations should be: do not enter confidential audit evidence, personnel data, or client information into consumer-facing AI products. Verify your organization has a written position on this before using any external AI tool with audit data.

Learning and staying current

AI governance is a fast-moving field. Practitioner competency requires both structured learning and an ongoing practice of staying informed. The resources below are platform-independent -- the same concepts apply regardless of which tools or vendors your organization uses.

Structured learning and certifications

Several professional bodies have introduced AI-focused credentials and learning programs relevant to internal audit practitioners:

  • IIA AI Auditing Certificate: Developed specifically for internal auditors. Covers AI governance, audit program design, evidence standards, and reporting. The most audit-specific structured credential currently available.
  • ISACA CDPSE / CRISC: Broader data privacy and risk credentials that include AI governance components. Relevant for practitioners whose scope includes IT governance and technology risk.
  • NIST AI RMF training: NIST publishes free training materials and workshops on applying the AI Risk Management Framework. Practical and directly applicable to governance audit work.
  • MIT / Stanford executive programs: Short-format programs on AI governance, AI ethics, and responsible AI. More relevant for audit leaders developing program strategy than for fieldwork practitioners.
  • Vendor-sponsored training: Many enterprise AI vendors offer training on their specific systems. Useful for product knowledge; less useful for governance framework literacy.

These credentials vary significantly in content, rigor, and recognition. Evaluate based on what gap you are trying to close -- technical understanding, governance framework literacy, or fieldwork methodology. No single credential covers all three.

Staying informed

AI governance is evolving faster than any credential can track. A sustainable practice for staying current:

  • Primary sources: Read NIST, SEC, FTC, and IIA publications directly. Regulatory guidance and professional standards are the authoritative layer; commentary and interpretation come after.
  • Enforcement actions: SEC and FTC AI-related enforcement actions are publicly available and often more instructive than guidance documents about what actually triggers examination scrutiny.
  • Practitioner communities: IIA chapter AI working groups, ISACA SIGs, and similar communities provide practitioner-level intelligence about what audit functions are encountering in fieldwork.
  • Academic research: SSRN, MIT CSAIL, and Stanford HAI publish relevant research. Most useful for understanding where the field is heading rather than what to do today.
  • AI incident databases: The AI Incident Database (incidentdatabase.ai) and similar repositories document real-world AI failures. Reviewing recent entries is one of the fastest ways to develop practical risk intuition.

Learning Module

The AI Readiness Learning Module on this platform structures the concepts on this page into a guided learning path with knowledge checks. The practitioner path includes a skills assessment across five dimensions.
