IBM Releases AI Toolkit for Deep Learning Uncertainties

Deep learning is smart, show-off smart. It loves connecting dots no one else can see and being the smartest one in the room. But that is where deep learning can go wrong: when it thinks it knows everything. What deep learning needs is a touch of humility, to be not just smart but also wise enough to know its limits.

IBM today released an open-source toolkit that tackles what the company calls deep learning model "overconfidence," addressing AI uncertainty and supporting the larger need to make AI more transparent and accountable.

Released at the 2021 IBM Data & AI Digital Developer Conference, Uncertainty Quantification 360 (UQ360) is aimed at giving data scientists and developers algorithms to streamline quantifying, evaluating, improving and communicating the uncertainty of machine learning models. In a blog post announcing the release, IBM AI researchers Prasanna Sattigeri and Q. Vera Liao emphasized UQ360's range of capabilities.

“Common explainability techniques shed light on how AI works,” they said, “but UQ exposes limits and potential failure points.

“We provide a taxonomy and guidance for choosing these capabilities based on your needs… UQ360 is not just a Python package. We developed it with the hope of making it a universal platform for transparently communicating uncertainties and limitations of AI. For that, we have created an interactive experience that provides a gentle introduction to producing high-quality UQ and ways to use UQ in a house price prediction application. We’ve also created a number of in-depth tutorials to demonstrate how to utilize UQ across the AI lifecycle.”

Using what IBM calls "holistic model explanations," UQ360 is designed to enable human intervention in automated systems. The company explained that supervised ML involves learning a functional mapping between inputs (features) and outputs (predictions/recommendations/responses) from a set of training examples comprising input-output pairs. But it is with "the learned function" that trouble can arise, when "the model predicts outputs for new instances or inputs not seen during training." These outputs may be real values in the case of a regression model, or categorical class labels in the case of a classification model. In this process, uncertainty can emerge when the available data is inherently "noisy," with variability in the data instances and targets, and when the model's mapping function is ambiguous.
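
The distinction between noisy data and an ambiguous learned mapping is easier to see in code. The sketch below is a generic illustration, not UQ360's API: it trains a small bootstrap ensemble and treats disagreement between the members as a rough uncertainty signal for inputs the model never saw during training.

```python
# Illustrative sketch, not UQ360's API: a small bootstrap ensemble whose
# disagreement on unseen inputs serves as a rough model-uncertainty signal.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=500)  # noisy targets

# Train several trees, each on a bootstrap resample of the training data.
models = []
for seed in range(10):
    idx = rng.integers(0, len(X), size=len(X))
    models.append(
        DecisionTreeRegressor(max_depth=4, random_state=seed).fit(X[idx], y[idx])
    )

X_new = np.array([[0.5], [2.9]])            # inputs not seen during training
preds = np.stack([m.predict(X_new) for m in models])
mean = preds.mean(axis=0)                   # point prediction
spread = preds.std(axis=0)                  # disagreement between models
for x, mu, sd in zip(X_new.ravel(), mean, spread):
    print(f"x = {x:+.1f}: prediction {mu:.2f} ± {2 * sd:.2f}")
```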

The UQ360 toolkit provides algorithms to estimate different types of uncertainty, IBM said. Depending on the model type and the stage of model development, different UQ algorithms should be applied. UQ360 ships 11 UQ algorithms, along with guidance on choosing the one appropriate for a given use case, the company said.
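
To make concrete what such an algorithm produces, the sketch below uses plain scikit-learn quantile regression, not UQ360 itself, to attach a lower and upper bound to each prediction; UQ360's own algorithms provide this kind of interval-style estimate through their own interfaces, and the package's tutorials cover the specifics.

```python
# Generic illustration (not the UQ360 API): quantile regression gives each
# prediction a lower and upper bound, one common way to express uncertainty.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(400, 1))
y = 2.0 * X.ravel() + rng.normal(scale=X.ravel())  # noise grows with X

# One model per quantile: the median plus a roughly 90% interval.
quantiles = {
    q: GradientBoostingRegressor(loss="quantile", alpha=q).fit(X, y)
    for q in (0.05, 0.5, 0.95)
}

X_new = np.array([[1.0], [9.0]])
lo, med, hi = (quantiles[q].predict(X_new) for q in (0.05, 0.5, 0.95))
for x, l, m, h in zip(X_new.ravel(), lo, med, hi):
    print(f"x={x:.0f}: predict {m:.1f}, 90% interval [{l:.1f}, {h:.1f}]")
```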

Sattigeri and Liao said UQ360 makes communication part of development choices in an AI lifecycle. "For every UQ algorithm provided in the UQ360 Python package, a user can make a choice of an appropriate style of communication by following our psychology-based guidance on communicating UQ estimates, from concise descriptions to detailed visualizations."

They added that UQ360 is open-sourced “to help create a community of practice for researchers, data scientists and other practitioners that need to understand or communicate the limitations of algorithmic decisions.”

When does deep learning overconfidence make it "safety-critical for AI to express uncertainty," to use the IBM researchers' words? One example: sepsis diagnosis. "Early detection of sepsis is important, and AI can help – but only when AI predictions are accompanied by meaningful uncertainty estimates. Only then can doctors immediately treat patients (whom) AI has confidently flagged as at risk and prescribe additional diagnostics for those AI has expressed a low level of certainty about. If the model produces unreliable uncertainty estimates, patients may die."
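
As a generic illustration of that triage logic, and nothing more than that (it is not a clinical tool and not UQ360 code), the sketch below routes cases on both the predicted risk and the width of the uncertainty around it; the thresholds are made-up placeholders.

```python
# Illustrative only: route cases on predicted risk AND how certain the
# model is about that risk. Thresholds here are arbitrary placeholders.
from dataclasses import dataclass

@dataclass
class RiskEstimate:
    patient_id: str
    risk: float        # predicted probability of sepsis
    interval: float    # width of the uncertainty band around that risk

def triage(est: RiskEstimate) -> str:
    if est.interval > 0.3:
        return "order additional diagnostics"   # model is unsure
    if est.risk > 0.8:
        return "treat immediately"              # confident, high risk
    return "continue routine monitoring"        # confident, low risk

for est in [RiskEstimate("A", 0.90, 0.05),
            RiskEstimate("B", 0.85, 0.40),
            RiskEstimate("C", 0.10, 0.05)]:
    print(est.patient_id, "->", triage(est))
```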

Another scenario: a machine learning model used by a product manager may predict new feature A will perform better than new feature B on average, “but to see its worst-case effects on KPIs, the manager would also need to know the margin of error in the predictions.”
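
The arithmetic behind that worst-case reading is simple; the numbers below are hypothetical, chosen only to show how a margin of error can flip the decision.

```python
# Hypothetical numbers: point predictions plus margins of error for two features.
features = {
    "A": {"predicted_lift": 4.0, "margin": 5.0},   # 4% ± 5% -> could be -1%
    "B": {"predicted_lift": 3.0, "margin": 1.0},   # 3% ± 1% -> at worst +2%
}

for name, f in features.items():
    worst = f["predicted_lift"] - f["margin"]
    print(f"Feature {name}: average lift {f['predicted_lift']:+.1f}%, "
          f"worst case {worst:+.1f}%")
# A looks better on average, but B has the better worst case.
```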