Modified scaled exponential linear unit

Nimra Nimra, Jamshaid Ul Rahman, Dianchen Lu

Article ID: 2870
Vol 2, Issue 2, 2024
DOI: https://doi.org/10.54517/mss.v2i2.2870
Received: 5 August 2024; Accepted: 24 September 2024; Available online: 8 October 2024; Issue release: 15 November 2024

Abstract

Activation functions play a crucial role in shaping the training dynamics and overall performance of neural networks. Despite its simplicity and effectiveness, the widely adopted ReLU activation function has notable drawbacks, most prominently the problem known as the “dying ReLU” issue. To address these challenges, we propose a new activation function, the modified scaled exponential linear unit (M-SELU). Experiments conducted across diverse computer vision tasks with state-of-the-art architectures show that M-SELU outperforms ReLU (used as the baseline) and several other activation functions. The simplicity of the proposed M-SELU makes it particularly well suited to deep, multi-layered architectures, including CNNs trained on benchmarks such as CIFAR-10 and broader deep learning applications.
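
The M-SELU formula itself appears in the full paper rather than in this abstract. As a point of reference only, the minimal NumPy sketch below implements the standard scaled exponential linear unit (SELU) that M-SELU modifies, next to the ReLU baseline; the scale and alpha constants are the published SELU values, not parameters from this work, and the function names are chosen here purely for illustration.

import numpy as np

# Published SELU constants (Klambauer et al., 2017); shown only as the baseline
# that M-SELU modifies -- the exact M-SELU form is given in the full paper.
SELU_SCALE = 1.0507009873554805   # lambda
SELU_ALPHA = 1.6732632423543772   # alpha

def relu(x):
    # Baseline ReLU: zero for all negative inputs, which is the source of the
    # "dying ReLU" problem referred to in the abstract.
    return np.maximum(x, 0.0)

def selu(x):
    # Standard SELU: scale * x for x > 0, scale * alpha * (exp(x) - 1) otherwise.
    return SELU_SCALE * np.where(x > 0, x, SELU_ALPHA * np.expm1(x))

x = np.linspace(-3.0, 3.0, 7)
print("x    :", x)
print("ReLU :", relu(x))
print("SELU :", selu(x))

Unlike ReLU, which outputs zero (and therefore propagates zero gradient) for every negative input, SELU remains non-zero on the negative side, which is the property at stake in the dying-ReLU discussion above.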


Keywords

activation functions; CNN; CIFAR-10; deep learning; modified scaled exponential linear unit (M-SELU)


Copyright (c) 2024 Nimra, Jamshaid Ul Rahman, Dianchen Lu

License URL: https://creativecommons.org/licenses/by/4.0/