

International Journal of Multidisciplinary Research and Growth Evaluation

ISSN: (Print) | 2582-7138 (Online) | Impact Factor: 9.54 | Open Access

Auditing Data Governance for AI/ML in Financial Institutions: Verifying the Integrity, Traceability, and Lineage of Training and Production Data under Regulatory Mandates


Abstract

The increasing integration of artificial intelligence (AI) and machine learning (ML) into financial institutions has shifted the primary locus of model risk from algorithmic implementation toward the quality, provenance, and governance of data. While traditional model risk management frameworks emphasize conceptual soundness, performance validation, and outcome monitoring, they often treat data governance as a supporting operational function rather than as a core object of independent assurance. This creates a structural gap: institutions may possess formal data governance policies and advanced data architectures, yet lack verifiable mechanisms to demonstrate to auditors and regulators that training and production data are complete, traceable, reproducible, and appropriately controlled throughout the model lifecycle.
This paper proposes a structured, audit-oriented framework for data governance in AI/ML systems within financial institutions. The central conceptual shift is to treat datasets not merely as operational inputs, but as controlled model artifacts subject to explicit versioning, immutability, provenance tracking, and independent verification. By reframing data governance as an assurance and verification problem rather than solely a management or architectural problem, the framework translates high-level regulatory expectations into concrete control objectives, audit tests, and evidence standards.
The paper introduces the Transparent Extract Transform Load (T-ETL) architecture, an extension of conventional data pipelines that embeds lineage capture, policy enforcement, and integrity verification directly into data ingestion, transformation, and deployment processes. T-ETL integrates graph-based lineage representations, cryptographic hash commitments, and bi-temporal data reconstruction to support reproducibility and forensic auditability. This architecture enables auditors and supervisors to reconstruct the precise data state, transformation logic, and governance controls in effect at the time of any model decision.
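The combination of lineage capture and cryptographic hash commitments described above can be illustrated with a minimal sketch. The names here (`dataset_digest`, `LineageLog`, `drop_negative_amounts`) are hypothetical and do not come from the paper; the point is only the pattern: each dataset version receives a deterministic content digest, and each transformation step appends an input-digest/step/output-digest record, so an auditor can re-run a step and verify that the reproduced output matches the committed state.

```python
import hashlib
import json

def dataset_digest(records):
    """Deterministic SHA-256 commitment over a dataset version.

    Records are canonicalized (record-level sort, sorted keys) so the
    same logical dataset always yields the same digest, independent of
    row order.
    """
    canonical = json.dumps(sorted(records, key=json.dumps), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

class LineageLog:
    """Append-only log of transformation steps.

    Each entry commits to the input state, the named transformation,
    and the output state, forming one edge of a lineage graph.
    """
    def __init__(self):
        self.edges = []

    def record(self, inputs, step, outputs):
        entry = {
            "inputs": dataset_digest(inputs),
            "step": step,
            "outputs": dataset_digest(outputs),
        }
        self.edges.append(entry)
        return entry

# Example: a cleaning step under audit.
raw = [{"id": 1, "amount": 120.0}, {"id": 2, "amount": -5.0}]
cleaned = [r for r in raw if r["amount"] >= 0]

log = LineageLog()
entry = log.record(raw, "drop_negative_amounts", cleaned)

# Forensic check: re-running the step and re-hashing must reproduce
# the committed output digest.
assert entry["outputs"] == dataset_digest(cleaned)
```

Because digests are order-independent and the log is append-only, any silent mutation of a dataset between ingestion and model training surfaces as a digest mismatch rather than requiring a manual reconciliation.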
Mathematically, the framework formalizes data pipelines as transformation functions over versioned datasets, models lineage as directed acyclic graphs over data artifacts, and defines integrity, drift, and bias as computable properties subject to continuous monitoring. These formalizations support the transition from qualitative governance assertions to quantitative, testable assurance mechanisms.
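As one concrete instance of drift as a computable, monitorable property, a widely used industry metric is the Population Stability Index (PSI) between a training-time and a production-time feature distribution. The sketch below is illustrative and not taken from the paper; the 0.25 alert threshold is a common rule of thumb, not a regulatory constant.

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a training ("expected") and a
    production ("actual") sample of a numeric feature.

    Both samples are binned over their joint range; PSI sums
    (p_actual - p_expected) * ln(p_actual / p_expected) over bins.
    """
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # degenerate case: all values equal

    def frac(sample, i):
        n = sum(
            1 for x in sample
            if lo + i * width <= x < lo + (i + 1) * width
            or (i == bins - 1 and x == hi)  # include the upper edge
        )
        return max(n / len(sample), 1e-6)  # floor to avoid log(0)

    return sum(
        (frac(actual, i) - frac(expected, i))
        * math.log(frac(actual, i) / frac(expected, i))
        for i in range(bins)
    )

train = [0.1 * i for i in range(100)]        # training distribution
prod = [0.1 * i for i in range(100)]         # unchanged production data
assert psi(train, prod) < 0.01               # no drift: PSI near zero
```

Computed continuously per feature, such a statistic turns the qualitative assertion "production data still resembles training data" into a testable control with an explicit escalation threshold.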
By mapping these technical controls to regulatory mandates including SR 11-7, BCBS 239, the EU Artificial Intelligence Act, and the Digital Operational Resilience Act (DORA), the paper provides a coherent audit framework that aligns operational data practices with supervisory expectations. The contribution is not the introduction of new regulatory principles, but the operationalization of existing ones into a unified, model-centric assurance structure. This approach supports regulatory defensibility, enhances institutional accountability, and strengthens trust in AI-driven financial decision-making.

How to Cite This Article

Puneet Redu (2026). Auditing Data Governance for AI/ML in Financial Institutions: Verifying the Integrity, Traceability, and Lineage of Training and Production Data under Regulatory Mandates. International Journal of Multidisciplinary Research and Growth Evaluation (IJMRGE), 7(3), 42-56. DOI: https://doi.org/10.54660/.IJMRGE.2026.7.3.42-56
