AI Solutions Quality Index (ASQI): A Solid Basis for Procurement and Deployment Decisions

Thinkpiece contributed by Sebastian Hallensleben, Chief Trust Officer at Resaro and Chair of CEN-CENELEC JTC21. This contribution forms part of the Temasek X Resaro AI Assurance Forum Report (2025).

Imagine you have decided to buy a car. You go to various car dealerships to look at your options, compare models and collect enough information to make a choice that will serve your needs. But imagine your frustration if every conversation with a salesperson goes like this:

 

You: How fast is this car?

Dealer: It is really a great car. In fact, it is probably the fastest. All our customers are happy with how fast it is.

You: Can you give me a few numbers?

Dealer: Trust me, the car gets you to your destination in no time at all. It is that awesome!

You: Hmm - I also care about safety. What does the car offer in terms of safety?

Dealer: It is a fantastic car. It is very very safe. It fulfils all the regulations. I can show you the certifications to prove it meets all the regulations. So it is completely safe.

You: What about the cost of running the car? How much gas does it need? What are the insurance rates?

Dealer: It is really cheap to run. You can trust me that it is very very cheap. I can’t give you any figures but you will be amazed how cheap it is.

 

Seems far-fetched? For buying a car it certainly is far-fetched. We fully expect to make our choice based on clear, understandable indicators of quality such as fuel consumption, top speed, acceleration, noise level, space for passengers, space for luggage, NCAP safety rating, stopping distance, theft protection, charging speed etc. Note that this is a mix of performance, safety, security and convenience indicators.

We also fully expect that these indicators of quality can be, and have been, independently measured – be it by ourselves, by regulators, or by product testing organisations. We also fully expect that these indicators of quality don’t just reiterate compliance with regulations (after all, we assume there won’t be an illegal car on offer at the dealership), but rather measure characteristics that go beyond the scope of what is addressed by regulation.

Contrast this with procuring an AI solution – let’s say, a customer service chatbot handling insurance claims, or an automated landing software for a professional drone. In these cases, metrics are much harder to come by. Where they exist, they usually refer to low-level technical capabilities of the underlying AI model rather than the characteristics of the whole system or solution. In many cases, they do not exist at all. Procurement departments are faced with weighing little more than marketing prose when comparing multiple solutions. Even before that, they struggle to write clear quality requirements into requests for quotation because there are no widely agreed ways of measuring quality for AI solutions.

There is a closely related challenge: Suppose you are in the lucky position of having an in-house AI engineering team. When should this team stop testing and refining an AI system? When is the system “good enough” to be deployed? In fact, what does “good” even mean, and how can you measure it?

Let’s consider how to construct a useful set of quality indicators that tells us when an AI system or solution is good enough and/or which of two AI solutions is preferable. Let’s call a set of quality indicators a “quality index”. The following principles need to guide the construction of such quality indices:

1. Specific to the use case

Describing the quality of a customer service chatbot is different from describing the quality of a drone landing system and again different from describing the quality of a deepfake detection solution. A quality index needs to be specific to a class of use cases with just the right breadth. It helps to build a certain degree of flexibility into the quality index so that the same index can cover a wider range of use cases.

2. A shared language

A quality index needs to be meaningful to a diverse set of stakeholders, including those from business functions, governance functions and technical functions in an organisation. It is essentially the shared language for quality across roles and needs to be carefully and precisely worded, with conscious choices of terminology.

3. Non-binary

Quality is not a yes/no characteristic. At the very least, a handful of quality levels need to be clearly distinguishable for each indicator, ranging from the best that is (or soon will be) achievable to a very low level that stakeholders would select if they don’t really worry about this indicator.

4. Mapped to automatable technical tests

To have any validity, a quality index needs to be supported by technical tests. These tests should be automatable, and their results translated back into the “shared language” (cf. point 2 above).
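As an illustrative sketch of how such a mapping might work – the indicator, thresholds and level labels below are invented for illustration and are not part of any published ASQI specification – an automated test score can be translated back into non-binary, shared-language quality levels:

```python
# Hypothetical sketch: translating an automated test result into a
# shared-language quality level. The thresholds and level labels are
# invented for illustration only.

QUALITY_LEVELS = [
    # (minimum score, shared-language label), ordered best first
    (0.98, "excellent - best currently achievable"),
    (0.90, "good - suitable for most deployments"),
    (0.75, "basic - acceptable only for low-stakes use"),
    (0.00, "insufficient - not recommended"),
]

def to_quality_level(score: float) -> str:
    """Map a raw test score in [0, 1] to a non-binary quality level."""
    for threshold, label in QUALITY_LEVELS:
        if score >= threshold:
            return label
    return QUALITY_LEVELS[-1][1]

# Example: a hypothetical automated accuracy test on a chatbot returns 0.93.
print(to_quality_level(0.93))  # good - suitable for most deployments
```

The point of the translation step is that a procurement or governance stakeholder never needs to see the raw score; the automated test and the shared language stay connected through the mapping.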


5. The right level of detail

A quality index that consists of just a single number is hardly helpful because it would lump together too many different aspects of quality. On the other hand, a hundred individual indicators are more confusing than useful. Constructing an index with about two dozen indicators spanning key aspects of performance and risk handling is the most sensible middle ground.

6. Compatible with established AI governance frameworks

A quality index should not be a standalone construct but needs to be connected to established regulations and standards such as the EU AI Act (plus harmonised standards), ISO/IEC 42001, AI Verify or company-internal responsible AI standards. Many indicators of quality will help to support compliance with such established governance frameworks, and this synergy should not be ignored.

7. Compatible with standardised task catalogues

While a quality index is designed to be at the system or solution level, it will inevitably refer to the two or three core tasks that a solution is designed to address. Such references should be to broadly recognised task catalogues, e.g. the catalogue currently in preparation by the AIQI consortium.
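Taken together, the principles above suggest a simple shape for a quality index. The following sketch is purely illustrative: the indicator names, groupings and task-catalogue placeholders are hypothetical and do not reflect the actual ASQI or the AIQI task catalogue.

```python
# Hypothetical sketch of a use-case-specific quality index for a customer
# service chatbot. All names and references are illustrative inventions.
from dataclasses import dataclass, field

@dataclass
class Indicator:
    name: str        # shared-language name, meaningful across roles
    group: str       # e.g. "performance" or "risk handling"
    levels: list     # ordered quality levels, best first (non-binary)
    test_id: str     # automatable technical test backing the indicator

@dataclass
class QualityIndex:
    use_case: str    # the class of use cases the index covers
    core_tasks: list # references into a standardised task catalogue
    indicators: list = field(default_factory=list)

chatbot_index = QualityIndex(
    use_case="customer service chatbot (insurance claims)",
    core_tasks=["<task-catalogue-ref-1>", "<task-catalogue-ref-2>"],
    indicators=[
        Indicator("answer accuracy", "performance",
                  ["excellent", "good", "basic", "insufficient"],
                  test_id="auto-test-accuracy"),
        Indicator("robustness to adversarial prompts", "risk handling",
                  ["excellent", "good", "basic", "insufficient"],
                  test_id="auto-test-robustness"),
    ],
)

print(len(chatbot_index.indicators))
```

In practice such an index would hold roughly two dozen indicators (cf. point 5), each backed by an automatable test and each readable by business, governance and technical stakeholders alike.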

 

Without use-case-specific quality indices as outlined above, it will remain difficult, if not impossible, to compare different AI solutions or to decide when an AI system is “good enough” to go live. Moreover, the market for AI solutions will remain opaque, which in turn impedes the rapid feedback loops that drive innovation.

As a first step, and in parallel to conducting the AI Assurance Forum, Resaro has therefore begun creating the “AI Solutions Quality Index (ASQI)”, a series of quality indices for use cases including customer service chatbots.

Industry is invited to pilot and co-create this approach and thus gain confidence in deploying the right AI solution with well-understood quality characteristics.