[1] V. Derhami, F. Alamian Harandi and M. B. Dowlatshahi, Reinforcement Learning. Yazd, Iran: Yazd University Press, 2017.
[2] F. Alamiyan-Harandi, V. Derhami and F. Jamshidi, “A new framework for mobile robot trajectory tracking using depth data and learning algorithms”, Journal of Intelligent & Fuzzy Systems, vol. 34, no. 6, pp. 3969-3982, 2018.
[3] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed. London: The MIT Press, 2018.
[4] B. H. Abed-alguni, “Action-selection method for reinforcement learning based on cuckoo search algorithm”, Arabian Journal for Science and Engineering, vol. 43, no. 12, pp. 6771-6785, 2018.
[5] K. Morihiro, T. Isokawa, N. Matsui and H. Nishimura, “Effects of chaotic exploration on reinforcement learning in target capturing task”, International Journal of Knowledge-based and Intelligent Engineering Systems, vol. 12, no. 5-6, pp. 369-377, 2008.
[6] K. Morihiro, T. Isokawa, N. Matsui and H. Nishimura, “Reinforcement learning by chaotic exploration generator in target capturing task”, Proc. International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, Springer Berlin Heidelberg, 2005, pp. 1248-1254.
[7] K. Morihiro, N. Matsui and H. Nishimura, “Effects of chaotic exploration on reinforcement maze learning”, Proc. International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, Springer Berlin Heidelberg, 2004, pp. 833-839.
[8] K. Morihiro, N. Matsui and H. Nishimura, “Chaotic exploration effects on reinforcement learning in shortcut maze task”, International Journal of Bifurcation and Chaos, vol. 16, no. 10, pp. 3015-3022, 2006.
[9] A. B. Potapov and M. K. Ali, “Learning, exploration and chaotic policies”, International Journal of Modern Physics C, vol. 11, no. 7, pp. 1455-1464, 2000.
[10] E. Pei, J. Jiang, L. Liu, Y. Li and Z. Zhang, “A chaotic Q-learning-based licensed assisted access scheme over the unlicensed spectrum”, IEEE Transactions on Vehicular Technology, vol. 68, no. 10, pp. 9951-9962, 2019.
[11] B. Zarei and M. R. Meybodi, “Improving learning ability of learning automata using chaos theory”, The Journal of Supercomputing, vol. 77, no. 1, pp. 652-678, 2021.
[12] E. N. Lorenz, “Deterministic nonperiodic flow”, Journal of the Atmospheric Sciences, vol. 20, no. 2, pp. 130-141, 1963.
[13] G. Chen and T. Ueta, “Yet another chaotic attractor”, International Journal of Bifurcation and Chaos, vol. 9, no. 7, pp. 1465-1466, 1999.
[14] H. Khodadadi and V. Derhami, “Improving Speed and Efficiency of Dynamic Programming Methods through Chaos”, Journal of AI and Data Mining, vol. 9, no. 4, pp. 487-496, 2021.
[15] M. Mollaeefar, A. Sharif and M. Nazari, “A novel encryption scheme for colored image based on high level chaotic maps”, Multimedia Tools and Applications, vol. 76, pp. 607-629, 2017.
[16] R. Y. Chen, J. Schulman, P. Abbeel and S. Sidor, “UCB and InfoGain exploration via Q-ensembles”, arXiv preprint arXiv:1706.01502, 2017.
[17] M. Tokic, “Adaptive ε-greedy exploration in reinforcement learning based on value differences”, Proc. Annual Conference on Artificial Intelligence, Springer Berlin Heidelberg, 2010, pp. 203-210.
[18] M. Tokic and G. Palm, “Value-difference based exploration: adaptive control between epsilon-greedy and softmax”, Proc. Annual Conference on Artificial Intelligence, Berlin, 2011, pp. 335-346.
[19] V. Derhami, V. Johari Majd and M. N. Ahmadabadi, “Exploration and exploitation balance management in fuzzy reinforcement learning”, Fuzzy Sets and Systems, vol. 161, no. 4, pp. 578-595, 2010.
[20] Y. L. He, X. L. Zhang, W. Ao and J. Z. Huang, “Determining the optimal temperature parameter for Softmax function in reinforcement learning”, Applied Soft Computing, vol. 70, pp. 80-85, 2018.
[21] M. Guo, Y. Liu and J. Malec, “A new Q-learning algorithm based on the Metropolis criterion”, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 34, no. 5, pp. 2140-2143, 2004.
[22] C. Chen, D. Dong, H. X. Li, J. Chu and T. J. Tarn, “Fidelity-based probabilistic Q-learning for control of quantum systems”, IEEE Transactions on Neural Networks and Learning Systems, vol. 25, no. 5, pp. 920-933, 2013.
[23] R. A. Bianchi, C. H. Ribeiro and A. H. R. Costa, “Heuristically Accelerated Reinforcement Learning: Theoretical and Experimental Results”, Proc. 20th European Conference on Artificial Intelligence (ECAI), IOS Press, 2012, pp. 169-174.
[24] A. Ecoffet, J. Huizinga, J. Lehman, K. O. Stanley and J. Clune, “First return, then explore”, Nature, vol. 590, no. 7847, pp. 580-586, 2021.
[25] T. Lin and A. Jabri, “MIMEx: intrinsic rewards from masked input modeling”, arXiv preprint arXiv:2305.08932, 2023.
[26] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang and W. Zaremba, “OpenAI Gym”, arXiv preprint arXiv:1606.01540, 2016.
[27] Z. Hua and Y. Zhou, “Exponential chaotic model for generating robust chaos”, IEEE Transactions on Systems, Man, and Cybernetics, vol. 51, no. 6, pp. 3713-3724, 2019.
[28] A. H. Gandomi and X. S. Yang, “Chaotic bat algorithm”, Journal of Computational Science, vol. 5, no. 2, pp. 224-232, 2014.
[29] I. Fister Jr., M. Perc, S. M. Kamal and I. Fister, “A review of chaos-based firefly algorithms: perspectives and research challenges”, Applied Mathematics and Computation, vol. 252, pp. 155-165, 2015.
[30] H. Lu, X. Wang, Z. Fei and M. Qiu, “The effects of using chaotic map on improving the performance of multiobjective evolutionary algorithms”, Mathematical Problems in Engineering, no. 1, Article ID 924652, 2014.
[31] X. Zhang and Y. Cao, “A novel chaotic map and an improved chaos-based image encryption scheme”, The Scientific World Journal, no. 1, Article ID 713541, 2014.
[32] C. Zhu, “A novel image encryption scheme based on improved hyperchaotic sequences”, Optics Communications, vol. 285, no. 1, pp. 29-37, 2012.
[33] A. Rezaee Jordehi, “A chaotic artificial immune system optimization algorithm for solving global continuous optimization problems”, Neural Computing and Applications, vol. 26, pp. 827-833, 2015.
[34] P. P. Singh, “A chaotic system with large Lyapunov exponent: Nonlinear observer design and circuit implementation”, Proc. 2020 3rd International Conference on Energy, Power and Environment: Towards Clean Energy Technologies, 2021, pp. 1-6.
[35] N. Nguyen, L. Pham-Nguyen, M. B. Nguyen and G. Kaddoum, “A low power circuit design for chaos-key based data encryption”, IEEE Access, vol. 8, pp. 104432-104444, 2020.
[36] K. Z. Zamli, F. Din and H. S. Alhadawi, “Exploring a Q-learning-based chaotic naked mole rat algorithm for S-box construction and optimization”, Neural Computing and Applications, vol. 35, no. 14, pp. 10449-10471, 2023.
[37] L. Moysis, A. Tutueva, C. Volos, D. Butusov, J. M. Munoz-Pacheco and H. Nistazakis, “A two-parameter modified logistic map and its application to random bit generation”, Symmetry, vol. 12, no. 5, Art. no. 829, 2020.
[38] L. Skanderova and I. Zelinka, “Arnold cat map and Sinai as chaotic numbers generators in evolutionary algorithms”, Proc. AETA 2013: Recent Advances in Electrical Engineering and Related Sciences, 2014, pp. 381-389.