In machine learning, understanding why a model makes certain decisions is often just as important as whether those decisions are correct. For example, a machine-learning model might correctly predict that a skin lesion is cancerous, but it could have done so using an unrelated blip in a clinical photo.
While there are tools to help experts make sense of a model’s reasoning, these methods often only provide insight into one decision at a time, and each must be evaluated manually. Models are typically trained using millions of data inputs, making it nearly impossible for a human to evaluate enough decisions to identify patterns.
Now, researchers from MIT and IBM Research have created a method that allows a user to aggregate, sort, and rank these individual explanations to quickly analyze the behavior of a machine learning model. Their technique, called Shared Interest, incorporates quantifiable measures that compare how well a model’s reasoning matches that of a human.
Shared Interest could help a user easily uncover trends in a model’s decision-making – for example, the model may often become confused by distracting, irrelevant features, like background objects in photos. Aggregating these insights could help the user quickly and quantitatively determine whether a model is trustworthy and ready to be deployed in a real-world situation.
“In developing Shared Interest, our goal is to be able to extend this analysis process so that you can understand, at a more holistic level, how your model is behaving,” says lead author Angie Boggust, a graduate student in the Visualization Group of the Computer Science and Artificial Intelligence Laboratory (CSAIL).
Boggust wrote the paper with her advisor, Arvind Satyanarayan, an assistant professor of computer science who leads the Visualization Group, as well as Benjamin Hoover and senior author Hendrik Strobelt, both of IBM Research. The paper will be presented at the Conference on Human Factors in Computing Systems.
Boggust began working on this project during a summer internship at IBM, under the mentorship of Strobelt. After returning to MIT, Boggust and Satyanarayan developed the project and continued collaborating with Strobelt and Hoover, who helped develop the case studies that show how the technique could be used in practice.
Human-AI Alignment
Shared Interest leverages popular techniques that show how a machine-learning model made a specific decision, known as saliency methods. If the model is classifying images, saliency methods highlight the areas of an image that were important to the model when it made its decision. These areas are visualized as a type of heat map, called a saliency map, often overlaid on the original image. If the model classified the image as a dog and the dog’s head is highlighted, that means those pixels were important to the model when it decided the image contained a dog.
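To make the idea concrete, here is a minimal, hypothetical sketch of a gradient-based saliency map in PyTorch. The tiny classifier and random input are placeholders invented for illustration, not the models or saliency methods used in the paper.

```python
# A minimal sketch of a gradient-based saliency map, assuming a PyTorch
# image classifier. The toy CNN below is a stand-in for whatever model is
# actually under inspection.
import torch
import torch.nn as nn

model = nn.Sequential(                      # hypothetical toy classifier
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(8, 10),
)
model.eval()

image = torch.rand(1, 3, 224, 224, requires_grad=True)  # placeholder input
score = model(image).max(dim=1).values.sum()             # top-class score
score.backward()                                          # gradients w.r.t. pixels

# Per-pixel importance: largest absolute gradient across the color channels.
saliency = image.grad.abs().max(dim=1).values             # shape (1, 224, 224)
print(saliency.shape)
```

The resulting map can then be overlaid on the image as the heat map described above.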
Shared Interest works by comparing saliency methods with ground-truth data. In an image dataset, ground-truth data are typically human-generated annotations that surround the relevant parts of each image. In the previous example, the box would surround the entire dog in the photo. When evaluating an image-classification model, Shared Interest compares the model-generated saliency data and the human-generated ground-truth data for the same image to see how well they align.
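As a rough illustration of that comparison, the sketch below scores the overlap between a binarized saliency map and a human-drawn box using common measures (an IoU-style overlap and coverage fractions). These are illustrative stand-ins, not necessarily the exact metrics defined in the Shared Interest paper.

```python
# Illustrative alignment measures between a binarized saliency map and a
# human-annotated ground-truth mask.
import numpy as np

def alignment_scores(saliency: np.ndarray, truth_box: np.ndarray, thresh: float = 0.5):
    """Both inputs are HxW arrays; truth_box is a 0/1 mask of the annotation."""
    sal_mask = saliency >= thresh            # binarize the saliency heat map
    truth = truth_box.astype(bool)
    inter = np.logical_and(sal_mask, truth).sum()
    union = np.logical_or(sal_mask, truth).sum()
    return {
        "iou": inter / union if union else 0.0,                              # overall overlap
        "truth_coverage": inter / truth.sum() if truth.sum() else 0.0,       # how much of the box the model used
        "saliency_coverage": inter / sal_mask.sum() if sal_mask.sum() else 0.0,  # how much saliency stayed inside the box
    }

# Toy example: the model attends to the right half, the annotation covers the center.
sal = np.zeros((8, 8)); sal[:, 4:] = 1.0
box = np.zeros((8, 8)); box[2:6, 2:6] = 1.0
print(alignment_scores(sal, box))
```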
The technique uses multiple measures to quantify this alignment (or misalignment) and then sorts each decision into one of eight categories. The categories run the gamut from fully aligned with the human (the model makes a correct prediction and the highlighted area in the saliency map is identical to the human-generated box) to completely distracted (the model makes an incorrect prediction and does not use any image features found in the human-generated box).
“At one end of the spectrum, your model made the decision for the exact same reason as a human, and at the other end of the spectrum, your model and the human are making that decision for entirely different reasons. By quantifying this for all the images in your dataset, you can use this quantification to sort them,” says Boggust.
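The toy sketch below illustrates that sorting idea, bucketing predictions by correctness and overlap. The thresholds and the three-way bucketing are invented for illustration; they are not the paper's actual eight categories.

```python
# A simplified sketch of sorting predictions by correctness and alignment,
# consuming overlap scores like the ones computed in the earlier sketch.
def bucket(correct: bool, scores: dict) -> str:
    iou = scores["iou"]
    if correct and iou > 0.9:
        return "aligned"          # right answer, same evidence as the human
    if not correct and iou == 0.0:
        return "distracted"       # wrong answer, none of the human's evidence
    return "partially aligned"    # everything in between

decisions = [
    {"id": 0, "correct": True,  "scores": {"iou": 0.95}},
    {"id": 1, "correct": False, "scores": {"iou": 0.0}},
    {"id": 2, "correct": True,  "scores": {"iou": 0.4}},
]
# Sort the dataset by overlap so the most suspicious cases surface first.
for d in sorted(decisions, key=lambda d: d["scores"]["iou"]):
    print(d["id"], bucket(d["correct"], d["scores"]))
```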
The technique works the same way with textual data, where keywords are highlighted instead of image regions.
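For text, the same overlap can be computed over sets of tokens rather than pixels; the example below is a hypothetical sketch with made-up words, not data from the paper.

```python
# Token-level analogue of the image comparison: the "saliency" is the set of
# words the model weighted most, and the ground truth is the annotated keywords.
def token_iou(model_tokens: set, truth_tokens: set) -> float:
    union = model_tokens | truth_tokens
    return len(model_tokens & truth_tokens) / len(union) if union else 0.0

model_highlight = {"terrible", "boring"}             # words the model relied on
human_annotation = {"terrible", "boring", "plot"}    # words a person would mark
print(token_iou(model_highlight, human_annotation))  # roughly 0.67 overlap
```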
Quick Scan
The researchers used three case studies to show how Shared Interest could be useful to both non-experts and machine learning researchers.
In the first case study, they used Shared Interest to help a dermatologist determine whether to trust a machine-learning model designed to help diagnose cancer from photos of skin lesions. Shared Interest enabled the dermatologist to quickly see examples of the model’s correct and incorrect predictions. Ultimately, the dermatologist decided the model could not be trusted because it made too many predictions based on image artifacts rather than actual lesions.
“The value here is that, by using Shared Interest, we are able to see these patterns emerge in our model’s behavior. In about half an hour, the dermatologist was able to make a confident decision about whether or not to trust the model and whether or not to deploy it,” says Boggust.
In the second case study, they worked with a machine-learning researcher to show how Shared Interest can evaluate a particular saliency method by revealing previously unknown pitfalls in the model. Their technique allowed the researcher to analyze thousands of correct and incorrect decisions in a fraction of the time required by typical manual methods.
In the third case study, they used Shared Interest to dig deeper into a specific image-classification example. By manipulating the ground-truth region of the image, they were able to perform a what-if analysis to see which image features were most important for particular predictions.
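A hedged sketch of that kind of probing: vary the annotated ground-truth region and watch how its overlap with a fixed saliency mask changes. The masks and box sizes below are toy values chosen for illustration, not data from the study.

```python
# Shrink the annotated region step by step and recompute its overlap with a
# fixed saliency mask to see how alignment depends on the annotation.
import numpy as np

sal = np.zeros((8, 8), dtype=bool); sal[:, 4:] = True   # model attends to the right half

def iou(a: np.ndarray, b: np.ndarray) -> float:
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

for half_width in (4, 2, 1):                             # progressively tighter annotation boxes
    box = np.zeros((8, 8), dtype=bool)
    box[2:6, 4 - half_width:4 + half_width] = True
    print(half_width, round(iou(sal, box), 2))
```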
The researchers were impressed with how well Shared Interest performed in these case studies, but Boggust cautions that the technique is only as good as the saliency methods it is built on. If those techniques contain biases or are inaccurate, Shared Interest will inherit those limitations.
In the future, the researchers want to apply Shared Interest to different types of data, particularly tabular data used in medical records. They also want to use Shared Interest to help improve current saliency techniques. Boggust hopes this research inspires more work that seeks to quantify machine-learning model behavior in ways that make sense to humans.
This work is funded, in part, by the MIT-IBM Watson AI Lab, the United States Air Force Research Laboratory, and the United States Air Force Artificial Intelligence Accelerator.