Protein subcellular localization plays a vital role in understanding proteins’ behavior under different circumstances. The effectiveness of various drugs can be assessed by the successful prediction of protein locations. Therefore, it is important to develop a prediction system that is sufficiently reliable and accurate in making decisions regarding protein localization. However, the main problem in developing a reliable, high-throughput prediction system is the presence of imbalanced data, which greatly affects the performance of such a system. To remedy this problem, we applied oversampling through the Synthetic Minority Oversampling Technique (SMOTE). Further, different feature extraction strategies and ensemble classification techniques are assessed for their contribution toward solving the challenging problem of subcellular localization. After applying the SMOTE data balancing technique, a remarkable improvement is observed in the performance of the random forest and rotation forest ensemble classifiers on the CHOM, CHOA and VeroA datasets. It is anticipated that our proposed model might be helpful for the research community in the field of functional and structural proteomics as well as in drug discovery.
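For readers unfamiliar with SMOTE, the idea behind the oversampling step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `smote` and `balance`, the neighbour count `k=5`, and the choice to grow every class to the size of the largest one are assumptions made here for illustration, since the abstract does not state those details. Each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest neighbours from the same class (hypothetical helper)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base sample.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(features, labels, k=5, seed=0):
    """Oversample every class up to the size of the largest class
    (an assumed balancing target, chosen for this sketch)."""
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(list(x))
    target = max(len(rows) for rows in by_class.values())
    out_x, out_y = [], []
    for y, rows in by_class.items():
        extra = smote(rows, target - len(rows), k, seed) if len(rows) < target else []
        for row in rows + extra:
            out_x.append(row)
            out_y.append(y)
    return out_x, out_y
```

Calling `balance(X, y)` on a list of feature vectors and their class labels returns the original samples plus the synthetic ones; the balanced set would then be passed to an ensemble classifier such as a random forest. Only the training data should be balanced in this way, so that evaluation still reflects the original class distribution.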

Subcellular localization using fluorescence imagery: Utilizing ensemble classification with diverse feature extraction strategies and data balancing / Muhammad Tahir;Asifullah Khan;Abdul Majid;Alessandra Lumini. - In: APPLIED SOFT COMPUTING. - ISSN 1568-4946. - STAMPA. - 13:(2013), pp. 4231-4243. [10.1016/j.asoc.2013.06.027]

Subcellular localization using fluorescence imagery: Utilizing ensemble classification with diverse feature extraction strategies and data balancing

LUMINI, ALESSANDRA
2013

Muhammad Tahir;Asifullah Khan;Abdul Majid;Alessandra Lumini
Use this identifier to cite or link to this document: https://hdl.handle.net/11585/251285