Modified scaled exponential linear unit

Nimra Nimra, Jamshaid Ul Rahman, Dianchen Lu

Article ID: 2870
Vol 2, Issue 2, 2024
DOI: https://doi.org/10.54517/mss.v2i2.2870
Received: 5 August 2024; Accepted: 24 September 2024; Available online: 8 October 2024;
Issue release: 15 November 2024

Abstract

Activation functions play a crucial role in the training dynamics and overall performance of neural networks. Despite its simplicity and effectiveness, the widely adopted ReLU activation function has notable drawbacks, most prominently the “dying ReLU” problem. To address these challenges, we propose a new activation function, the modified scaled exponential linear unit (M-SELU). Experiments across diverse computer vision tasks using state-of-the-art architectures show that M-SELU outperforms ReLU (used as the baseline) and several other activation functions. The simplicity of M-SELU makes it particularly well suited to deep, multi-layered architectures, including CNNs trained on datasets such as CIFAR-10 and deep learning applications more broadly.
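The abstract does not reproduce the functional form of M-SELU, so the sketch below is offered only as a minimal point of reference: a PyTorch implementation of the standard scaled exponential linear unit (SELU), which the name M-SELU suggests the authors modify. The constants are the standard SELU values; the class name SELUReference and the use of PyTorch are illustrative assumptions, not the authors' implementation.

# Minimal reference sketch (assumption: PyTorch; not the authors' code).
# Standard SELU: selu(x) = scale * x                  for x > 0
#                selu(x) = scale * alpha * (e^x - 1)  for x <= 0
# The specific modification that defines M-SELU is given in the full paper,
# not in this abstract, and is therefore not reproduced here.
import torch
import torch.nn as nn

class SELUReference(nn.Module):
    def __init__(self):
        super().__init__()
        self.scale = 1.0507009873554805   # standard SELU scale (lambda)
        self.alpha = 1.6732632423543772   # standard SELU alpha

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Piecewise form: identity branch for positive inputs,
        # scaled exponential branch for non-positive inputs.
        return self.scale * torch.where(x > 0, x, self.alpha * (torch.exp(x) - 1.0))

In a CNN of the kind the abstract describes, such a module would simply replace nn.ReLU() between convolutional layers, e.g. when training on CIFAR-10.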


Keywords

activation functions; CNN; CIFAR-10; deep learning; modified scaled exponential linear unit (M-SELU)


Copyright (c) 2024 Nimra Nimra, Jamshaid Ul Rahman, Dianchen Lu

License URL: https://creativecommons.org/licenses/by/4.0/