Reproducibility

Reproducibility in AI refers to the ability to consistently recreate a system’s behaviour or outputs under the same conditions: the same code, data, configurations, and parameters. It is a scientific and operational cornerstone of AI assurance, supporting transparency, auditability, and trust.

A reproducible AI system allows:

  • Internal teams to verify results across environments and between teams

  • External auditors to confirm system claims during evaluations

  • Regulators to validate compliance and investigate failures

  • Researchers and partners to replicate experiments or deployments

Reproducibility challenges often stem from:

  • Lack of documentation or version control

  • Randomness in model initialisation or data sampling (see the seeding sketch after this list)

  • Environmental dependencies (e.g., hardware, libraries, or APIs)

  • Changes to data pipelines, preprocessing, or model updates
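
Seeding is the usual first step in taming the randomness listed above. Below is a minimal sketch, assuming a PyTorch and NumPy training stack, of pinning the common random number generators before a run; the exact calls needed vary by framework and hardware.

    import os
    import random

    import numpy as np
    import torch

    def set_global_seed(seed: int = 42) -> None:
        """Pin the main sources of randomness for a training run."""
        random.seed(seed)                 # Python's built-in RNG
        np.random.seed(seed)              # NumPy: shuffling, sampling, augmentation
        torch.manual_seed(seed)           # PyTorch CPU and CUDA initialisation
        torch.cuda.manual_seed_all(seed)  # every visible GPU
        # Trade speed for determinism in cuDNN kernels
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
        # Prefer deterministic ops, warning where none exist
        torch.use_deterministic_algorithms(True, warn_only=True)

Note that even a fully seeded run can differ across GPU models or driver versions, which is why environment pinning (below) matters as much as seeding.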

Assurance practices promote reproducibility by enforcing:

  • Code and data versioning with tools like Git or DVC

  • Use of containers or environment specifications (e.g., Docker, Conda)

  • Logging of hyperparameters, seeds, and training configurations (see the run-manifest sketch after this list)

  • Standardised workflows and repeatable evaluation procedures
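
A lightweight way to combine the versioning and logging practices above is to write a run manifest alongside every training job. The Python sketch below is illustrative rather than a standard: the helper name and field names are assumptions, and it presumes the job runs inside a git checkout.

    import hashlib
    import json
    import platform
    import subprocess
    import sys
    from datetime import datetime, timezone

    def write_run_manifest(hyperparams: dict, seed: int, data_path: str,
                           out_path: str = "run_manifest.json") -> None:
        """Record enough context to re-run this training job exactly."""
        # Pin the code version (assumes a git working copy)
        commit = subprocess.run(["git", "rev-parse", "HEAD"],
                                capture_output=True, text=True,
                                check=True).stdout.strip()
        # Fingerprint the training data without loading it all at once
        digest = hashlib.sha256()
        with open(data_path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        manifest = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "git_commit": commit,
            "python": sys.version,
            "platform": platform.platform(),
            "seed": seed,
            "hyperparameters": hyperparams,
            "data_sha256": digest.hexdigest(),
        }
        with open(out_path, "w") as f:
            json.dump(manifest, f, indent=2)

An auditor can then diff two manifests to see exactly what changed between runs; tools such as DVC or MLflow implement the same idea with more machinery.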

In mission-critical systems, reproducibility ensures that performance claims can be backed up by evidence and revalidated if the system fails or is challenged. It also supports post-incident analysis, regulatory reporting, and continuous improvement.
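
Revalidation can be made a routine test rather than a one-off exercise. The sketch below uses a hypothetical stand-in training function (a seeded least-squares fit, not a real model) to show the shape of such a check: two runs under identical conditions must produce bit-identical artefacts.

    import numpy as np

    def train_tiny_model(seed: int) -> np.ndarray:
        """Hypothetical stand-in for a real training job: a seeded
        least-squares fit whose result depends only on the seed."""
        rng = np.random.default_rng(seed)
        X = rng.normal(size=(100, 5))
        y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=100)
        weights, *_ = np.linalg.lstsq(X, y, rcond=None)
        return weights

    def test_training_is_reproducible():
        # Identical code, data, and seed must yield identical weights
        assert np.array_equal(train_tiny_model(seed=42),
                              train_tiny_model(seed=42))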

Regulatory frameworks and technical standards (e.g., ISO/IEC, NIST) increasingly recognise reproducibility as a criterion for trustworthy AI. Public sector and defence applications often require that deployed systems can be independently validated before use.