Interpretability in AI refers to the degree to which a human can understand the internal mechanics or decision logic of a model. It is a key concept in AI assurance, particularly in situations where transparency, auditability, or regulatory compliance is required.
Unlike explainability, which focuses on producing post hoc rationales for decisions, interpretability emphasises the inherent comprehensibility of the model itself. For example, a linear regression or decision tree is generally considered interpretable, whereas a deep neural network is not.
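As a minimal sketch of this contrast (assuming scikit-learn and its bundled Iris dataset, both illustrative choices rather than any assurance standard), the complete decision logic of a shallow tree can be printed and read directly:

```python
# A minimal sketch of inherent interpretability using scikit-learn's
# bundled Iris dataset (illustrative choices, not an assurance standard).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree: every prediction follows a short, human-readable path.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(iris.data, iris.target)

# The complete decision logic can be printed and audited line by line;
# no comparable read-out exists for a deep neural network.
print(export_text(tree, feature_names=iris.feature_names))
```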
Interpretability is essential when:
Decisions must be reviewed, contested, or explained to non-expert stakeholders
Models are used in high-stakes settings such as criminal sentencing, military targeting, or health diagnostics
Legal frameworks require reasoning to be documented (e.g., under GDPR’s right to explanation)
In AI assurance, interpretability is assessed by:
Reviewing the model architecture and complexity
Evaluating whether key features and decision paths can be understood
Testing the consistency and stability of model behaviour under different inputs (see the sketch after this list)
Confirming that humans can validate or contest decisions made by the model
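The stability check above can be illustrated with a simple perturbation probe. The model, noise scale, and repetition count below are assumptions made for the sketch, not prescribed assurance parameters:

```python
# Illustrative stability probe: do small input perturbations flip the
# model's predictions? (Model, noise scale, and repeat count are
# assumptions for this sketch, not assurance requirements.)
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

rng = np.random.default_rng(0)
baseline = model.predict(X)

# Add small Gaussian noise repeatedly and measure how often predictions change.
flip_rates = [
    np.mean(model.predict(X + rng.normal(scale=0.05, size=X.shape)) != baseline)
    for _ in range(20)
]
print(f"Mean flip rate under small perturbations: {np.mean(flip_rates):.3f}")
```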
Interpretability supports:
Transparency, by making system logic accessible
Accountability, by making it possible to trace responsibility for decisions
Debugging and validation, by allowing developers to identify flaws
Assurance teams may select inherently interpretable models or use interpretability tools (e.g., feature importance rankings, partial dependence plots) to enhance understanding. Trade-offs between performance and interpretability are often necessary, especially in time-critical or risk-sensitive domains.
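For example, both tools named above can be applied to a less transparent model. This is a sketch assuming scikit-learn; the gradient boosting model and the bundled diabetes dataset are illustrative assumptions, not recommendations:

```python
# Sketch of the two post hoc tools named above: permutation feature
# importance and a partial dependence plot (scikit-learn; the gradient
# boosting model and diabetes dataset are illustrative assumptions).
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay, permutation_importance

data = load_diabetes()
model = GradientBoostingRegressor(random_state=0).fit(data.data, data.target)

# Feature importance ranking: the drop in score when each feature is shuffled.
result = permutation_importance(model, data.data, data.target,
                                n_repeats=10, random_state=0)
for idx in np.argsort(result.importances_mean)[::-1]:
    print(f"{data.feature_names[idx]}: {result.importances_mean[idx]:.3f}")

# Partial dependence plot for the highest-ranked feature.
top = int(np.argmax(result.importances_mean))
PartialDependenceDisplay.from_estimator(model, data.data, [top],
                                        feature_names=data.feature_names)
plt.show()
```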
Interpretability is particularly important in safety and security applications, where operators must trust — and if necessary, override — system decisions. In such cases, interpretability supports human judgement and aligns with ethical and legal expectations.
Ultimately, interpretability contributes to a broader assurance goal: ensuring that AI systems behave in ways that are not only correct but also understandable, predictable, and aligned with human reasoning.