ES-PredHSP: Improved Prediction of Heat Shock Proteins Using Machine Learning by Enhanced Sampling Technique

Muhammad Yasir Akbar, Humera Azad, Maliha Rashid, Wajya Ajmal, Adnan Ahmad Ansari, Mohamed M. Salem, Ram Kumar Sahu, Mounir M. Salem-Bekhit, Shakira Ghazanfar

Article ID: 7784
Vol 38, Issue 1, 2024
DOI: https://doi.org/10.23812/j.biol.regul.homeost.agents.20243801.55
Received: 20 January 2024; Accepted: 20 January 2024; Available online: 20 January 2024; Issue release: 20 January 2024

Abstract

Background: Heat shock proteins (HSPs) are essential for the growth of various cells. Developing automated, precise machine-learning tools for rapid HSP prediction is important for two reasons: conventional methods are expensive, and the abundant protein sequence data available in the post-genomic era can readily be used to build such tools. Methods: The proposed method used the existing dataset from the PredHSP tool. Composition of k-Spaced Amino Acid Pairs (CKSAAP) features were calculated with iFeature, and an enhanced sampling technique was proposed to balance the dataset. Six machine learning models were developed to predict HSPs, and their robustness was assessed with ten-fold cross-validation. The best of the six models was selected and evaluated using accuracy, precision, recall, F1 score, and area under the curve (AUC). A command-line utility is available in a GitHub repository. Results: Of the six models assessed with ten-fold cross-validation, the support vector machine (SVM) performed best. It achieved an overall accuracy of 87%, higher than existing methods, and it can predict all HSP types in a single run. The model was deployed as a command-line utility on GitHub. Conclusions: Machine learning is a powerful way to predict HSPs because it can identify hidden patterns in protein sequences. HSPs are important molecular chaperones, and their rapid prediction can support many areas of biological research. The model helps predict heat shock proteins in eukaryotes.
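To illustrate what the CKSAAP feature encodes, the following is a minimal plain-Python sketch. It is not the iFeature code and not the authors' pipeline. The function name `cksaap` and the default `max_gap=5` are assumptions, because the abstract does not state the gap setting used. For each gap k from 0 to `max_gap`, the sketch counts ordered residue pairs separated by k positions and divides each count by the number of valid pairs at that gap. This produces 400 × (`max_gap` + 1) values per sequence.

```python
from itertools import product

# The 20 standard amino acids in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs: "AA", "AC", ..., "YY".
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(sequence: str, max_gap: int = 5) -> list[float]:
    """Return a CKSAAP feature vector of length 400 * (max_gap + 1).

    Hypothetical helper for illustration; max_gap=5 is an assumed default,
    not a setting reported in the abstract. For each gap k = 0..max_gap, the
    pair (sequence[i], sequence[i + k + 1]) is counted, and each count is
    divided by the number of valid pairs at that gap. Pairs that contain a
    non-standard residue (e.g. 'X') are skipped.
    """
    sequence = sequence.upper()
    features: list[float] = []
    for k in range(max_gap + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        # i + k + 1 must stay inside the sequence.
        for i in range(len(sequence) - k - 1):
            pair = sequence[i] + sequence[i + k + 1]
            if pair in counts:  # both residues are standard amino acids
                counts[pair] += 1
                total += 1
        # Sequences too short for this gap contribute an all-zero block.
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features
```

For example, `cksaap("ACDA", max_gap=1)` returns 800 values. In the k = 0 block, the pairs AC, CD, and DA each receive 1/3. In the k = 1 block, AD and CA each receive 1/2. Vectors like these would then be passed to the balancing step and the classifiers. The abstract does not describe the enhanced sampling technique, so no sketch of it is attempted here.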


Keywords

machine learning; heat shock protein; PredHSP; enhanced sampling


Copyright (c) 2024 Muhammad Yasir Akbar, Humera Azad, Maliha Rashid, Wajya Ajmal, Adnan Ahmad Ansari, Mohamed M. Salem, Ram Kumar Sahu, Mounir M. Salem-Bekhit, Shakira Ghazanfar




This site is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).