The methods of machine learning have conquered everyday life. AI outperforms humans in board games, drives cars, predicts complicated protein folding structures, and can even translate entire books. This may all be of great benefit to us, but it isn’t the full story. Algorithms might also ‘decide’ whether I can secure a bank loan or whether I will be invited to a job interview. And they are employed in numerous other situations that involve value judgments. Such situations not only raise technical issues – which algorithm works better? – but also point to societal questions regarding their use. Which algorithm do we want to use in what circumstances and for what purpose?
Indeed, the baffling efficiency of machine learning raises a number of moral and philosophical issues as well. What do algorithmic predictions actually mean, and what are the reasons for these predictions? Are they fair, unbiased, and close to the truth? When my bank manager tells me: “You can’t have a loan”, I feel justified in asking: “Why not?” I expect the bank to justify its decision and to explain how it reached its conclusion so that I can improve my application or try my luck somewhere else in the future. Can we expect a similar response from algorithms? Should they have to explain themselves to us?
These kinds of demands feed into the new area of research known as “explainable AI”, an attempt to make the decisions of complicated algorithms more comprehensible for humans by also having them generate explanations. What can we really expect from this demand for explanations? And can explainable AI meet our expectations?
In the European Union, the question has become even more pressing after the European Commission recently published the draft of its “Artificial Intelligence Act”. This law would regulate the use of AI within the entire European Union. What role can and should explainable AI play in these regulations? In our recently published article for the FAccT Conference, we approached this question from three different perspectives: machine learning, law, and philosophy.
The relevance of adversarial situations for society
When it comes to applications of machine learning, we need to distinguish between two fundamentally different situations. In a cooperative situation, such as in scientific applications and language translations, the interests of the provider and the user are roughly the same. A programmer, doctor, and patient all want an algorithm that recognizes cancer at an early stage and provides plausible reasons for the diagnosis. In adversarial situations, however, the interests of the AI provider might be different from those of the person subject to the decision. A person applying for a bank loan, for example, wants as much capital as possible at the lowest possible interest rate. The bank, on the other hand, tries to maximize its own profits, and will be less concerned with the transparency of its loan-issuing process. Such adversarial situations are socially relevant because we must weigh diverging interests against each other and try to create a balance between them.
A second relevant point is that the current legal framework for explainable AI is relatively vague. There is no basis at present for us to say: “We expect x, y, and z from explainable AI, and if the algorithm doesn’t meet these expectations, it shouldn’t be applied in a given situation s.” As a society, we have yet to find such conceptual clarity. In adversarial situations, the explanation is supposed to help its recipient do something, for example to object to a decision or to learn to do better in the future. But this presupposes that a given explanation is accurate; wrong explanations can be misleading.
Explanations should not become a smokescreen for deception
In our article, we argue that current explanation algorithms cannot meet our expectations in adversarial situations. The main reason is that there are usually many plausible explanations for a decision made by a complex AI system rather than a single, definitive one. In certain cases, it might be possible to explain the decisions of an AI system in simple ways. For the most part, however, the decisions of a complex system remain above all one thing: complex. Our hunger for truth, for accurate and conclusive answers, will never be satisfied by simplifying explanations.
In cooperative situations, explanations may offer useful insights into the functioning of an AI system, whereas in adversarial situations, such explanations may also have undesirable consequences. Indeed, we show that explanations generated by current AI methods depend on many specific details of the AI system: the training data, the precise form of the decision surface, the choice of this or that explanatory algorithm, etc. Programmers and developers of AI are free to choose these parameters, so there is a danger that while a given explanation might sound plausible, it may ultimately serve to make the AI’s decision incontestable and its developer untouchable. At this point at least, developers of AI are simply not interested in generating explanations that raise doubts about certain aspects of their AI systems. For these reasons, in adversarial situations algorithmic explanations are of no help and may even lead us astray. In the article, we also show that it doesn’t make much sense to test algorithmic explanations for correctness. At best, we can test internal consistency between explanation and prediction, a kind of weak “sincerity”. But more sophisticated deceptions will remain undetected.
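How much an explanation can depend on such choices can be seen in a toy example. The following Python sketch is purely illustrative and not taken from the article: the scoring function `score`, the feature names, the applicant, and both baselines are hypothetical assumptions. It uses a simple occlusion-style attribution (how much the output drops when one feature is replaced by a reference value), which is only one of many possible explanation methods.

```python
# Toy illustration (all names and numbers are hypothetical assumptions):
# the same prediction can be "explained" differently depending on a
# parameter that the explanation's author is free to choose.

FEATURES = ("income", "savings")


def score(income, savings):
    # Hypothetical toy model with an interaction between both features.
    return income * savings


def occlusion_attribution(model, x, baseline):
    """Attribute to each feature the drop in the model output when that
    feature alone is replaced by its baseline (reference) value."""
    full = model(*x)
    attributions = []
    for i in range(len(x)):
        perturbed = list(x)
        perturbed[i] = baseline[i]
        attributions.append(full - model(*perturbed))
    return attributions


def top_feature(attributions):
    # Name of the feature with the largest attribution.
    best = max(range(len(attributions)), key=lambda i: attributions[i])
    return FEATURES[best]


applicant = (2.0, 3.0)

# Two reference points, both freely chosen by whoever builds the explanation.
explanation_a = occlusion_attribution(score, applicant, baseline=(0.0, 2.0))
explanation_b = occlusion_attribution(score, applicant, baseline=(1.5, 0.0))

print(score(*applicant))                          # 6.0
print(explanation_a, top_feature(explanation_a))  # [6.0, 2.0] income
print(explanation_b, top_feature(explanation_b))  # [1.5, 6.0] savings
```

Both explanations refer to the same model and the same prediction, yet the first presents income as the decisive feature and the second presents savings. Neither is "incorrect" in a way that a simple consistency test would reveal.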
Despite these problems with algorithmic explanations, we should not feel entirely helpless. In our article, we discuss various scenarios in which AI systems can be tested. Before implementing such tests, however, we need to reach consensus as a society about what we actually expect from AI. Only then can we decide which applications of an algorithm should be allowed and which ones forbidden. Instead of relying on explanations as an all-encompassing fix, we should be looking for other solutions that could give us what we were originally looking for in our quest for explanations. One possibility might be to return to algorithms that are intrinsically interpretable. But that is another story…
Bordt, S., Finck, M., Raidl, E. & von Luxburg, U. Post-Hoc Explanations Fail to Achieve their Purpose in Adversarial Contexts. 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’22), pp. 891–905.