Automated classification of BI-RADS in textual mammography reports


Abstract: The main purpose of this paper is to process key information in medical text records and to classify patients according to the different levels of the Breast Imaging-Reporting and Data System (BI-RADS). BI-RADS is a scheme for standardizing breast imaging reports. Accordingly, medical text mining is employed to classify mammography reports based on BI-RADS. In this research, a new method is proposed for automatically extracting BI-RADS classifications from textual reports in order to improve therapeutic procedures. First, a mammography lexicon is used to select keywords from the medical text reports. Word2vec and term frequency-inverse document frequency (TFIDF) techniques are then used to extract features, and finally these features are combined with the hospital information system (HIS) reports; this configuration is called With-HIS. Several classifiers, namely a multiclass support vector machine (SVM), naïve Bayes (NB), extreme gradient boosting (XGBoost), and a multilevel fuzzy min-max neural network (MLF), are used to compare the accuracy of With-HIS against the configuration without HIS (called Without-HIS). The results confirm that using HIS alongside the proposed approach (Word2vec + TFIDF) has a significant effect on the accuracy of medical text classification. With the MLF classifier, the proposed method reaches an accuracy of 0.89 With-HIS, compared with 0.85 Without-HIS.
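The abstract does not state exactly how the Word2vec and TFIDF features are combined or which HIS fields are used. One common way to combine the two is a TFIDF-weighted average of word vectors. The Python sketch below assumes that scheme, together with a tiny hypothetical lexicon, toy "pretrained" word vectors, and hypothetical numeric HIS fields (age and a family-history flag). It illustrates the pipeline described above (lexicon filtering, weighted embedding, HIS concatenation) and is not the authors' implementation.

```python
import math
from collections import Counter

# Hypothetical mammography lexicon (illustrative only, not the paper's lexicon).
LEXICON = {"mass", "calcification", "spiculated", "asymmetry", "benign", "suspicious"}


def select_keywords(report: str, lexicon: set[str]) -> list[str]:
    """Very simple tokenization, then keep only tokens found in the lexicon."""
    tokens = report.lower().replace(",", " ").replace(".", " ").split()
    return [t for t in tokens if t in lexicon]


def tfidf_weights(docs: list[list[str]]) -> list[dict[str, float]]:
    """TF-IDF weight of each term per document, using a smoothed IDF:
    idf = ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc) or 1
        weights.append({
            t: (c / total) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return weights


def weighted_embedding(weights: dict[str, float],
                       vectors: dict[str, list[float]], dim: int) -> list[float]:
    """TF-IDF-weighted average of the word vectors of known terms.
    Terms without a vector are skipped; returns a zero vector if none are known."""
    acc = [0.0] * dim
    norm = 0.0
    for term, w in weights.items():
        if term in vectors:
            for i, x in enumerate(vectors[term]):
                acc[i] += w * x
            norm += w
    return [x / norm for x in acc] if norm > 0 else acc


def build_features(reports: list[str], his_features: list[list[float]],
                   vectors: dict[str, list[float]], dim: int) -> list[list[float]]:
    """Text embedding followed by the patient's HIS features ("With-HIS")."""
    docs = [select_keywords(r, LEXICON) for r in reports]
    weights = tfidf_weights(docs)
    return [weighted_embedding(w, vectors, dim) + list(h)
            for w, h in zip(weights, his_features)]


# Toy 2-d word vectors standing in for a trained Word2vec model (hypothetical).
vectors = {"mass": [1.0, 0.0], "spiculated": [0.0, 1.0], "benign": [-1.0, 0.0]}
reports = ["Spiculated mass, suspicious.", "Benign calcification."]
his = [[55.0, 1.0], [40.0, 0.0]]  # hypothetical HIS fields: age, family-history flag

features = build_features(reports, his, vectors, dim=2)
print(features)  # [[0.5, 0.5, 55.0, 1.0], [-1.0, 0.0, 40.0, 0.0]]
```

The resulting With-HIS vectors could then be passed to any of the classifiers listed above; dropping the appended HIS columns gives the corresponding Without-HIS features for comparison.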

Keywords: Breast cancer, patient follow-up, text classification, feature extraction, word2vec
