[1] A. Bordes, X. Glorot, J. Weston, and Y. Bengio, "Joint learning of words and meaning representations for open-text semantic parsing," in Artificial Intelligence and Statistics, 2012, pp. 127-135.
[2] W. Ma, W. Ma, S. Xu, and H. Zha, "Pyramid ALKNet for Semantic Parsing of Building Facade Image," IEEE Geoscience and Remote Sensing Letters, 2020.
[3] V. Lialin, R. Goel, A. Simanovsky, A. Rumshisky, and R. Shah, "Continual Learning for Neural Semantic Parsing," arXiv preprint arXiv:2010.07865, 2020.
[4] D. C. Cireşan, U. Meier, and J. Schmidhuber, "Transfer learning for Latin and Chinese characters with deep neural networks," in The 2012 International Joint Conference on Neural Networks (IJCNN), 2012, pp. 1-6: IEEE.
[5] J. S. Ren and L. Xu, "On vectorization of deep convolutional neural networks for vision tasks," in Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.
[6] T. Kaur and T. K. Gandhi, "Deep convolutional neural networks with transfer learning for automated brain image classification," Machine Vision and Applications, vol. 31, pp. 1-16, 2020.
[7] I. D. Apostolopoulos and T. A. Mpesiana, "Covid-19: automatic detection from x-ray images utilizing transfer learning with convolutional neural networks," Physical and Engineering Sciences in Medicine, p. 1, 2020.
[8] X. Li, Y. Grandvalet, and F. Davoine, "A baseline regularization scheme for transfer learning with convolutional neural networks," Pattern Recognition, vol. 98, p. 107049, 2020.
[9] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in neural information processing systems, 2013, pp. 3111-3119.
[10] S. Lazebnik, C. Schmid, and J. Ponce, "Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories," in 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), 2006, vol. 2, pp. 2169-2178: IEEE.
[11] K. Chowdhary, "Natural language processing," in Fundamentals of Artificial Intelligence: Springer, 2020, pp. 603-649.
[12] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in neural information processing systems, 2012, pp. 1097-1105.
[13] D. Cireşan, U. Meier, and J. Schmidhuber, "Multi-column deep neural networks for image classification," in 2012 IEEE conference on computer vision and pattern recognition, 2012, pp. 3642-3649: IEEE.
[14] O. Badmos, A. Kopp, T. Bernthaler, and G. Schneider, "Image-based defect detection in lithium-ion battery electrode using convolutional neural networks," Journal of Intelligent Manufacturing, vol. 31, no. 4, pp. 885-897, 2020.
[15] X. Gou, L. Qing, Y. Wang, M. Xin, and X. Wang, "Re-training and parameter sharing with the Hash trick for compressing convolutional neural networks," Applied Soft Computing, p. 106783, 2020.
[16] L. Deng, "A tutorial survey of architectures, algorithms, and applications for deep learning," APSIPA Transactions on Signal and Information Processing, vol. 3, 2014.
[17] Y. Guo, Y. Liu, A. Oerlemans, S. Lao, S. Wu, and M. S. Lew, "Deep learning for visual understanding: A review," Neurocomputing, vol. 187, pp. 27-48, 2016.
[18] R. Pascanu, C. Gulcehre, K. Cho, and Y. Bengio, "How to construct deep recurrent neural networks," arXiv preprint arXiv:1312.6026, 2013.
[19] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, no. 6088, pp. 533-536, 1986.
[20] S. Ruder, "An overview of gradient descent optimization algorithms," arXiv preprint arXiv:1609.04747, 2016.
[21] C. Y. Miao, A. Yang, and M. J. Anderson, "Deep Learning Workload Performance Auto-Optimizer," EasyChair, 2516-2314, 2020.
[22] R. Marcus, P. Negi, H. Mao, N. Tatbul, M. Alizadeh, and T. Kraska, "Bao: Learning to Steer Query Optimizers," arXiv preprint arXiv:2004.03814, 2020.
[23] G.-H. Liu, T. Chen, and E. A. Theodorou, "A Differential Game Theoretic Neural Optimizer for Training Residual Networks," arXiv preprint arXiv:2007.08880, 2020.
[24] I. Kandel, M. Castelli, and A. Popovič, "Comparative Study of First Order Optimizers for Image Classification Using Convolutional Neural Networks on Histopathology Images," Journal of Imaging, vol. 6, no. 9, p. 92, 2020.
[25] S. Postalcıoğlu, "Performance Analysis of Different Optimizers for Deep Learning-Based Image Recognition," International Journal of Pattern Recognition and Artificial Intelligence, vol. 34, no. 02, p. 2051003, 2020.
[26] S. Kim and T.-S. Choi, "Design of Multichannel FIR Filter using Gradient Descent Optimizer for Personal Audio Systems," in Audio Engineering Society Convention 148, 2020: Audio Engineering Society.
[27] R. Sutton, "Two problems with back propagation and other steepest descent learning procedures for networks," in Proceedings of the Eighth Annual Conference of the Cognitive Science Society, 1986, pp. 823-832.
[28] N. Qian, "On the momentum term in gradient descent learning algorithms," Neural Networks, vol. 12, no. 1, pp. 145-151, 1999.
[29] T. Dozat, "Incorporating Nesterov momentum into Adam," 2016.
[30] J. Duchi, E. Hazan, and Y. Singer, "Adaptive subgradient methods for online learning and stochastic optimization," Journal of Machine Learning Research, vol. 12, no. 7, 2011.
[31] M. D. Zeiler, "Adadelta: an adaptive learning rate method," arXiv preprint arXiv:1212.5701, 2012.
[32] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[33] M. Kögel and R. Findeisen, "A fast gradient method for embedded linear predictive control," IFAC Proceedings Volumes, vol. 44, no. 1, pp. 1362-1367, 2011.
[34] L. Liu et al., "On the variance of the adaptive learning rate and beyond," arXiv preprint arXiv:1908.03265, 2019.
[35] P. Efraimidis and P. Spirakis, "Weighted Random Sampling," in Encyclopedia of Algorithms, M.-Y. Kao, Ed. Boston, MA: Springer US, 2008, pp. 1024-1027.
[36] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, 2016, pp. 770-778.
[37] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 4700-4708.
[39] M. L. McHugh, "Interrater reliability: the kappa statistic," Biochemia Medica, vol. 22, no. 3, pp. 276-282, 2012.
[40] G. Beliakov, "Smoothing Lipschitz functions," Optimisation Methods and Software, vol. 22, no. 6, pp. 901-916, 2007.