Explainable AI (XAI)

Understand the background and key concepts of Explainable AI.

A Case Study of Black-Box Criminal Risk Prediction

COMPAS, a risk assessment tool used to estimate the likelihood that a defendant will reoffend, has been shown to falsely flag black defendants as future criminals at almost twice the rate of white defendants, and to mislabel white defendants as low risk more often than black defendants (Angwin et al., 2016). This tool has raised questions about what data informs risk assessment scores, how the tool determines those scores, and how it should be used in the criminal justice system.

These questions require AI to be explained to humans in a way that is usable, understandable, and practical.
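The disparity described in this case study is a difference in error rates between groups: the false positive rate (people flagged as high risk who did not reoffend) and the false negative rate (people labeled low risk who did reoffend). The sketch below shows one way such rates could be computed per group. The function name, the record layout, and the toy records are assumptions made for illustration; the records are synthetic and are not COMPAS data.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Return {group: (false_positive_rate, false_negative_rate)}.

    Each record is a tuple (group, predicted_high_risk, reoffended).
    This layout is hypothetical and chosen only for the example.
    """
    counts = defaultdict(lambda: {"fp": 0, "tn": 0, "fn": 0, "tp": 0})
    for group, predicted_high_risk, reoffended in records:
        if reoffended:
            key = "tp" if predicted_high_risk else "fn"
        else:
            key = "fp" if predicted_high_risk else "tn"
        counts[group][key] += 1

    rates = {}
    for group, c in counts.items():
        negatives = c["fp"] + c["tn"]  # people who did not reoffend
        positives = c["fn"] + c["tp"]  # people who did reoffend
        fpr = c["fp"] / negatives if negatives else float("nan")
        fnr = c["fn"] / positives if positives else float("nan")
        rates[group] = (fpr, fnr)
    return rates


# Synthetic toy records for two hypothetical groups, NOT real data.
toy_records = (
    [("A", True, False)] * 4 + [("A", False, False)] * 6
    + [("A", True, True)] * 7 + [("A", False, True)] * 3
    + [("B", True, False)] * 2 + [("B", False, False)] * 8
    + [("B", True, True)] * 5 + [("B", False, True)] * 5
)

rates = error_rates_by_group(toy_records)
for group, (fpr, fnr) in sorted(rates.items()):
    print(f"group {group}: FPR={fpr:.2f}, FNR={fnr:.2f}")
```

In this toy data, group A is falsely flagged at twice the rate of group B (0.40 versus 0.20), while group B is more often mislabeled as low risk (0.50 versus 0.30). An audit of this kind shows that a disparity exists, but it does not explain why the tool produced it, which is the gap explanations are meant to fill.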

Why Explainability is Important


The problem of explainability is not new.

Explainable AI (XAI) is a research field that studies how AI decisions, and the data driving those decisions, can be explained to people in order to provide transparency, enable assessment of accountability, demonstrate fairness, or facilitate understanding. While the term “XAI” was first coined in 2004 by Van Lent et al., the problem of explainability goes back decades: to expert systems in the mid-1970s, Bayesian networks and artificial neural networks in the 1980s, and recommender systems in the 2000s.

There has been a recent surge of interest in XAI, driven by the popularity of complex black-box algorithms, which are often unintelligible even to technical experts, and by the increasing use of AI to make decisions in high-stakes scenarios. Explanations have the potential to shed light on the black box of ML, helping people understand these systems and develop better partnerships with them.

AI-Assisted Decision Making

AI-assisted decision making refers to scenarios wherein the individual strengths of a person and the AI are complementary and come together to optimize the joint decision outcome. These scenarios are often high-stakes, such as medical diagnosis, law enforcement, and financial investment. 

  • Medical Diagnosis
  • Law Enforcement
  • Financial Investment
Examples of high-stakes scenarios using AI to assist decision making.

While AI systems can perform impressively, full delegation is not desirable in these high-stakes situations because the probabilistic nature of AI means there is never a guarantee of correctness for any particular decision. There is also concern about “adversarial attacks”, which are deliberate manipulations of the input that change the behavior of an AI system. For example, by changing a few pixels on a lung scan, someone can fool an AI system into seeing an illness that is not really there, or into missing one that is.

As a result, the key to success in these human-AI partnerships is calibrating trust on a case-by-case basis: the person needs to know when to trust the AI prediction and when to rely on their own judgment, particularly in cases where the model is likely to perform poorly.
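The pixel-level manipulation mentioned earlier in this section can be illustrated with a deliberately tiny model. The sketch below assumes a hypothetical four-pixel “scan” and a toy linear classifier with made-up weights; it is not a real diagnostic model. Each pixel is nudged by a small amount in the direction that raises the score. Attacks on real neural networks typically rely on gradient information, and in a linear model the sign of each weight plays that role.

```python
def predict(weights, bias, pixels):
    """Toy linear classifier: True means 'illness detected'."""
    score = sum(w * p for w, p in zip(weights, pixels)) + bias
    return score > 0


def sign(value):
    """Return -1, 0, or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)


def perturb(weights, pixels, epsilon):
    """Shift every pixel by at most epsilon so that the score increases."""
    return [p + epsilon * sign(w) for w, p in zip(weights, pixels)]


# Hypothetical weights, bias, and input chosen only for illustration.
weights = [0.5, -0.25, 0.75, -0.5]
bias = -0.1
scan = [0.2, 0.4, 0.1, 0.3]

adversarial_scan = perturb(weights, scan, epsilon=0.1)

print("original prediction: ", predict(weights, bias, scan))
print("perturbed prediction:", predict(weights, bias, adversarial_scan))
```

Here no pixel changes by more than 0.1, yet the prediction flips from “no illness” to “illness”. A person reviewing the two scans might see no meaningful difference, which is one reason a human check on individual decisions remains valuable.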

Trust Calibration

In AI-assisted decision making, the human moves from being the primary decision maker to being an active teammate. These AI decision aids are often modeled as partners rather than as tools. The key to success in these partnerships is trust calibration.

Trust is the attitude that one’s vulnerabilities will not be exploited in a situation of uncertainty and risk. Trust calibration is the correspondence between a person’s trust in an AI system and that system’s actual capabilities. When the two are mismatched, the result is misuse or disuse. Misuse occurs when people rely uncritically on AI, either because they perceive it as performing perfectly all the time or because they fail to monitor and evaluate system performance. Disuse occurs when people reject the capabilities of AI, for example after a violation of trust.

  • Overtrust, when trust exceeds the system’s capabilities, leads to misuse. Misuse refers to failures that occur when people inadvertently violate critical assumptions and trust AI when they shouldn’t.
  • Distrust, when trust is less than the system’s capabilities, leads to disuse. Disuse refers to failures that occur when people reject the capabilities of AI, failing to use it when they should.

Explanations can facilitate people’s understanding of AI systems and help calibrate trust, providing a more effective human-in-the-loop workflow. The effectiveness of explanations is contingent on the user and context, requiring a closer look at when and how to explain.
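The misuse and disuse categories above can be made concrete by looking at a log of individual decisions. The sketch below assumes a hypothetical log in which each entry records whether the person followed the AI’s recommendation and whether the AI turned out to be correct. Treating “followed an incorrect recommendation” as misuse and “rejected a correct recommendation” as disuse is a simplifying assumption made for this example, not a standard measurement procedure.

```python
def reliance_summary(decisions):
    """Tally reliance outcomes from (followed_ai, ai_was_correct) pairs."""
    summary = {
        "appropriate_reliance": 0,   # followed the AI, and it was right
        "misuse": 0,                 # followed the AI, but it was wrong
        "disuse": 0,                 # rejected the AI, but it was right
        "appropriate_rejection": 0,  # rejected the AI, and it was wrong
    }
    for followed_ai, ai_was_correct in decisions:
        if followed_ai and ai_was_correct:
            summary["appropriate_reliance"] += 1
        elif followed_ai:
            summary["misuse"] += 1
        elif ai_was_correct:
            summary["disuse"] += 1
        else:
            summary["appropriate_rejection"] += 1
    return summary


# Hypothetical decision log with ten cases, for illustration only.
decision_log = (
    [(True, True)] * 6
    + [(True, False)] * 2
    + [(False, True)] * 1
    + [(False, False)] * 1
)

summary = reliance_summary(decision_log)
for outcome, count in summary.items():
    print(f"{outcome}: {count}")
```

In this toy log, two of the ten decisions count as misuse and one as disuse. Well-calibrated trust would move cases out of those two rows and into the appropriate reliance and appropriate rejection rows.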

Relevant Industry Resources

Explainability is emphasized, explicitly or implicitly, in AI design guidelines and resources from companies such as Google, IBM, and Microsoft. We recommend reviewing these guidelines to learn more about how companies are designing for AI and incorporating explainability into their products and services.

  • Google’s People+AI Research (PAIR) Guidebook features a section on Explainability and Trust that discusses how to explain the AI system and if, when, and how to show model confidence. 

  • IBM’s Design for AI is a comprehensive collection of ethics, guidelines, and resources that provides recommended actions, considerations, questions, and an example for explainability.

  • Microsoft’s Guidelines for Human-AI Interaction recommends best practices for how AI systems should behave upon initial interaction, during regular interaction, when they’re inevitably wrong, and over time.


References

  • Alexander, V., Blinder, C., & Zak, P. J. (2018). Why trust an algorithm? Performance, cognition, and neurophysiology. Computers in Human Behavior, 89, 279–288.
  • Amershi, S., Inkpen, K., Teevan, J., Kikin-Gil, R., Horvitz, E., Weld, D., … Bennett, P. N. (2019). Guidelines for Human-AI Interaction. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI '19).
  • Angwin, J., Larson, J., Mattu, S., & Kirchner, L. (2016). Machine Bias.
  • Barocas, S., Friedler, S., Hardt, M., Kroll, J., Venkatasubramanian, S., & Wallach, H. The FAT-ML Workshop Series on Fairness, Accountability, and Transparency in Machine Learning.
  • Bucher, T. (2017). The algorithmic imaginary: exploring the ordinary affects of Facebook algorithms. Information, Communication & Society, 20, 30–44.
  • Corritore, C., Kracher, B., & Wiedenbeck, S. (2003). On-line trust: concepts, evolving themes, a model. International Journal of Human-Computer Studies, 58.
  • Lee, J. D., & See, K. A. (2004). Trust in Automation: Designing for Appropriate Reliance. Human Factors, 46(1), 50–80.
  • Madhavan, P., & Wiegmann, D. A. (2007). Similarities and differences between human-human and human-automation trust: an integrative review. Theoretical Issues in Ergonomics Science, 8(4), 277–301.
  • Parasuraman, R., & Riley, V. (1997). Humans and Automation: Use, Misuse, Disuse, Abuse. Human Factors, 39(2), 230–253.
  • Parasuraman, R., Sheridan, T. B., & Wickens, C. D. (2000). A model for types and levels of human interaction with automation. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, 30(3), 286–297.
  • Ribeiro, M.T., Singh, S., & Guestrin, C. (2016). "Why Should I Trust You?": Explaining the Predictions of Any Classifier. KDD '16.
  • Sokol, K., & Flach, P.A. (2020). Explainability fact sheets: a framework for systematic assessment of explainable approaches. Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency.