Motivation: Anatomical Therapeutic Chemical (ATC) classification of unknown compound has raised high significance for both drug development and basic research. The ATC system is a multi-label classification system proposed by the World Health Organization (WHO), which categorizes drugs into classes according to their therapeutic effects and characteristics. This system comprises five levels and includes several classes in each level; the first level includes 14 main overlapping classes. The ATC classification system simultaneously considers anatomical distribution, therapeutic effects, and chemical characteristics: the prediction for an unknown compound of its ATC classes is an essential problem, since such a prediction could be used to deduce not only a compound’s possible active ingredients but also its therapeutic, pharmacological, and chemical properties. Nevertheless, the problem of automatic prediction is very challenging due to the high variability of the samples and the presence of overlapping among classes, resulting in multiple predictions and making machine learning extremely difficult. Results: In this paper, we propose a multi-label classifier system based on deep learned features to infer the ATC classification. The system is based on a 2D representation of the samples: first a 1D feature vector is obtained extracting information about a compound’s chemical-chemical interaction and its structural and fingerprint similarities to other compounds belonging to the different ATC classes, then the original 1D feature vector is reshaped to obtain a 2D matrix representation of the compound. Finally a CNN network is trained and used as feature extractor. Two general purpose classifiers designed for multi-label classification are trained using the deep learned features and resulting scores are fused by the average rule. Experimental evaluation based on rigorous cross-validation demonstrates the superior prediction quality of this method compared to other state-of-the-art approaches developed for this problem. Matlab code will be available at https://github.com/LorisNanni
Alessandra Lumini, Loris Nanni (2018). Convolutional Neural Networks for ATC Classification. CURRENT PHARMACEUTICAL DESIGN, 24(34), 4007-4012 [10.2174/1381612824666181112113438].
Convolutional Neural Networks for ATC Classification
Alessandra Lumini;
2018
Abstract
Motivation: Anatomical Therapeutic Chemical (ATC) classification of unknown compound has raised high significance for both drug development and basic research. The ATC system is a multi-label classification system proposed by the World Health Organization (WHO), which categorizes drugs into classes according to their therapeutic effects and characteristics. This system comprises five levels and includes several classes in each level; the first level includes 14 main overlapping classes. The ATC classification system simultaneously considers anatomical distribution, therapeutic effects, and chemical characteristics: the prediction for an unknown compound of its ATC classes is an essential problem, since such a prediction could be used to deduce not only a compound’s possible active ingredients but also its therapeutic, pharmacological, and chemical properties. Nevertheless, the problem of automatic prediction is very challenging due to the high variability of the samples and the presence of overlapping among classes, resulting in multiple predictions and making machine learning extremely difficult. Results: In this paper, we propose a multi-label classifier system based on deep learned features to infer the ATC classification. The system is based on a 2D representation of the samples: first a 1D feature vector is obtained extracting information about a compound’s chemical-chemical interaction and its structural and fingerprint similarities to other compounds belonging to the different ATC classes, then the original 1D feature vector is reshaped to obtain a 2D matrix representation of the compound. Finally a CNN network is trained and used as feature extractor. Two general purpose classifiers designed for multi-label classification are trained using the deep learned features and resulting scores are fused by the average rule. Experimental evaluation based on rigorous cross-validation demonstrates the superior prediction quality of this method compared to other state-of-the-art approaches developed for this problem. Matlab code will be available at https://github.com/LorisNanniI documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.