Turkish Journal of Electrical Engineering and Computer Sciences




An explainable model is needed to clarify how and why a medical model makes a particular decision. Local post hoc explainable AI (XAI) techniques, such as SHAP and LIME, interpret the predictions of a classification system by displaying the most important features and rules underlying each individual prediction. To compare two or more XAI methods, they must first be evaluated qualitatively or quantitatively. This paper proposes quantitative XAI evaluation metrics that do not rely on biased and subjective human judgment. Instead, they rely on the depth of a decision tree (DT) to measure the complexity of XAI methods automatically and effectively. Our study introduces a novel XAI evaluation strategy that measures the complexity of any XAI method by using a characteristic of another model as a proxy. In our proposal, the output of the XAI methods, specifically the feature importance scores from SHAP and LIME, is fed into the DT, which then grows a full tree based on those scores. From this tree we derive two main metrics for assessing the DT's complexity, and thus that of the associated XAI method: the total depth of the tree (TDT) and the average of the weighted class depth (ACD). The results show that SHAP outperforms LIME on these metrics and is thus less complex. SHAP is also more scalable with respect to the number of documents and features. These results can indicate whether a specific XAI method is suitable for different document scales. They can also show which features can be used to improve the performance of the black-box model, in this case a feedforward neural network (FNN).
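The idea of using tree depth as a complexity proxy can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that TDT is the maximum leaf depth of a fully grown tree and that ACD is the mean, over classes, of each class's sample-weighted leaf depth; the abstract does not give the exact formulas. The small Gini-based tree, the function names, and the toy importance scores (stand-ins for the per-document output of two XAI methods) are all hypothetical.

```python
from collections import Counter, defaultdict

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, None
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        for t in values[:-1]:  # both sides of the split stay non-empty
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best_score is None or score < best_score:
                best, best_score = (f, t), score
    return best

def grow_leaves(rows, labels, depth=0):
    """Grow an unpruned tree; return one (depth, class, n_samples) tuple per leaf."""
    if len(set(labels)) == 1:
        return [(depth, labels[0], len(labels))]
    split = best_split(rows, labels)
    if split is None:  # identical rows with conflicting labels
        return [(depth, Counter(labels).most_common(1)[0][0], len(labels))]
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    leaves = []
    for part in (left, right):
        leaves += grow_leaves([r for r, _ in part], [y for _, y in part], depth + 1)
    return leaves

def depth_metrics(rows, labels):
    """Assumed definitions: TDT = deepest leaf; ACD = mean of per-class weighted depths."""
    leaves = grow_leaves(rows, labels)
    tdt = max(d for d, _, _ in leaves)
    weighted, counts = defaultdict(float), defaultdict(int)
    for d, cls, n in leaves:
        weighted[cls] += d * n
        counts[cls] += n
    class_depths = [weighted[c] / counts[c] for c in counts]
    return tdt, sum(class_depths) / len(class_depths)

# Made-up importance scores, one row per document (not real SHAP or LIME output).
scores_a = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels_a = [0, 0, 1, 1]
scores_b = [[0.1], [0.2], [0.3], [0.4]]
labels_b = [0, 1, 0, 1]

tdt_a, acd_a = depth_metrics(scores_a, labels_a)
tdt_b, acd_b = depth_metrics(scores_b, labels_b)
```

Under these assumed definitions, the scores in `scores_a` separate the classes with a single split, while those in `scores_b` need a deeper tree, so the first set would be rated as less complex.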


Explainable AI, medical multiclass classification, SHAP, LIME, decision tree, quantitative explainability evaluation
