
Asia Pacific Academy of Science Pte. Ltd. (APACSCI) specializes in international journal publishing. APACSCI adopts the open access publishing model and provides an important communication bridge for academic groups whose interest fields include engineering, technology, medicine, computer, mathematics, agriculture and forestry, and environment.

ES-PredHSP: Improved Prediction of Heat Shock Proteins Using Machine Learning by Enhanced Sampling Technique
Vol 38, Issue 1, 2024
Abstract
Background: Heat shock proteins (HSPs) are essential for the growth of various cells. Automated and precise machine-learning tools for the rapid prediction of HSPs are valuable because conventional methods are expensive, and the abundant protein sequence data available in the post-genomic era can readily be used to build such tools. Methods: The proposed method used the existing dataset from the PredHSP tool. The Composition of k-Spaced Amino Acid Pairs (CKSAAP) feature was calculated using iFeature, and an enhanced sampling technique was proposed to balance the dataset. Six machine learning models were developed to predict HSPs, and their robustness was assessed using ten-fold cross-validation. The best of the six models was selected and evaluated with the following metrics: accuracy, precision, recall, F1 score, and area under the curve (AUC). Results: Under ten-fold cross-validation, the support vector machine (SVM) performed best, achieving a higher overall accuracy (87%) than existing methods while predicting all HSP types in one run. For practical use, the model was deployed as a command-line utility in a GitHub repository. Conclusions: Machine learning is a powerful approach for predicting HSPs by identifying hidden patterns in the sequence. HSPs are important chaperones, and their rapid prediction can support many areas of biological research. The model helps predict heat shock proteins in eukaryotes.
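To make the feature described in the Methods concrete, the following is a minimal sketch of a Composition of k-Spaced Amino Acid Pairs calculation in plain Python. It is an illustration only and not the iFeature implementation used by the authors: the function name `cksaap`, the `max_gap` parameter, the feature key format, and the choice to normalize by the number of valid residue pairs at each gap are all assumptions made for this example.

```python
from itertools import product

# The 20 standard amino acids and all 400 ordered pairs built from them.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(sequence, max_gap=2):
    """Return pair frequencies for gaps 0..max_gap (hypothetical helper).

    For each gap k, two residues separated by k intervening residues form a
    pair. Counts are divided by the number of valid pairs at that gap
    (an assumed normalization), so each gap's 400 values sum to 1 whenever
    at least one valid pair exists.
    """
    seq = sequence.upper()
    features = {}
    for k in range(max_gap + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        for i in range(len(seq) - k - 1):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:  # skip pairs with non-standard residues
                counts[pair] += 1
                total += 1
        for pair in PAIRS:
            features[f"{pair}.gap{k}"] = counts[pair] / total if total else 0.0
    return features


if __name__ == "__main__":
    vector = cksaap("MKTAYIAKQR", max_gap=2)
    print(len(vector))  # 400 pairs for each of the 3 gap values
```

Each sequence yields a fixed-length vector of 400 × (`max_gap` + 1) values, which is what allows sequences of different lengths to be fed to a classifier such as an SVM.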
Copyright (c) 2024 Muhammad Yasir Akbar, Humera Azad, Maliha Rashid, Wajya Ajmal, Adnan Ahmad Ansari, Mohamed M. Salem, Ram Kumar Sahu, Mounir M. Salem-Bekhit, Shakira Ghazanfar
This site is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).
