Document Type : Original/Review Paper

Authors

Computer Engineering Department, Yazd University, Yazd, Iran.

Abstract

A prominent weakness of dynamic programming methods is that they perform operations throughout the entire set of states in a Markov decision process in every updating phase. This paper proposes a novel chaos-based method to solve the problem. For this purpose, a chaotic system is first initialized, and the resultant numbers are mapped onto the environment states through initial processing. In each traverse of the policy iteration method, policy evaluation is performed only once, and only a few states are updated. These states are proposed by the chaos system. In this method, the policy evaluation and improvement cycle lasts until an optimal policy is formulated in the environment. The same procedure is performed in the value iteration method, and only the values of a few states proposed by the chaos are updated in each traverse, whereas the values of other states are left unchanged. Unlike the conventional methods, an optimal solution can be obtained in the proposed method by only updating a limited number of states which are properly distributed all over the environment by chaos. The test results indicate the improved speed and efficiency of chaotic dynamic programming methods in obtaining the optimal solution in different grid environments.

Keywords

 [1] V. Derhami, F. Alamian Harandi, and M.B. Dowlatshahi, Reinforcement Learning. Yazd, Iran, Yazd University Press, 2017.
[2] A.G. Barto, S.J. Bradtke, and S.P. Singh, "Learning to act using real-time dynamic programming", Artificial Intelligence, Vol. 72(1), pp. 81–138, 1995.
[3] B. Bonet and H. Geffner, "Labeled RTDP: Improving the Convergence of Real-time Dynamic Programming", In Proc. 13th International Conference on Automated Planning and Scheduling (ICAPS-03), 2003, pp. 12–21.
[4] H.B. McMahan, M. Likhachev, and G.J. Gordon, "Bounded real-time dynamic programming: RTDP with monotone upper bounds and performance guarantees", In Proceedings of the 22nd International Conference on Machine Learning, New York, 2005, pp. 569-576.
[5] S. Sanner, R. Goetschalckx, K. Driessens, and G. Shani, "Bayesian real-time dynamic programming", In Proceedings of the 21st International Joint Conference on Artificial Intelligence, Pasadena, California, USA, 2009, pp. 1784-1789.
[6] S. Schmoll and M. Schubert. "Dynamic resource routing using real-time dynamic programming", Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18), Stockholm, Sweden,  2018, pp. 4822-4828.
[7] P. Dai, M.D.S. Weld, and J. Goldsmith, "Topological value iteration algorithms", journal of Artificial Intelligence, Vol. 42(1), pp. 181–209, 2011.
[8] W. Reis, L. Barros, and K. Delgado, "Robust topological policy iteration for infinite horizon bounded Markov Decision Processes", International Journal of Approximate Reasoning, Vol. 105, pp. 287–304, 2019.
[9] D. Wingate and K. D. Seppi, "Prioritization methods for accelerating MDP solvers", In Journal of Machine Learning Research, Vol. 6, pp. 851-881, 2005.
[10] A. Khademi, "A Novel Method for Improving Value Iteration in Dense Markov Decision Process", M.S. thesis, Dept. Computer., Yazd Univ., Yazd, 2016.
[11] E.A. Hansen and S. Zilberstein, "Lao*: A heuristic search algorithm that finds solutions with loops", Artificial Intelligence, 129(1-2), pp. 35–62, 2001.
[12] B. Bonet and G. Hector, "Action selection for MDPs: Anytime AO* vs. UCT", In AAAI Conference on Artificial Intelligence, 2012, Toronto, Ontario, Canada, pp. 1749-1755.
[13] H. Khodadadi and A. Zandvakili, "A New Method for Encryption of Color Images based on Combination of Chaotic Systems", Journal of Ai and Data Mining, Vol. 7(3), pp. 377-383, 2019.
[14] H. Khodadadi and O. Mirzaei, "A stack-based chaotic algorithm for encryption of colored images", Journal of AI and Data Mining, Vol. 5(1), pp. 29-37, 2017.
[15] X. Chai, Y. Chen, and L. Broyde, "A novel chaos-based image encryption algorithm using DNA sequence operations", Optics and Lasers in Engineering, Vol. 88, pp. 197-213, 2017.
[16] J.S.A.E. Fouda, J.Y. Effa, S.L. Sabat, and M. Ali, "A fast chaotic block cipher for image encryption", Communications in Nonlinear Science and Numerical Simulation, Vol. 19(3), pp. 578-588, 2014.
[17] E.N. Lorenz, "Deterministic Non-periodic Flow", Journal of the Atmospheric Sciences, Vol. 20(2), pp.130-141, 1963.
[18] G. Chen and T. Ueta, "Yet another chaotic attractor", International Journal of Bifurcation and Chaos, Vol. 9(7), pp. 1465-1466, 1999.
[19] A. Kanso and N. Smaoui, "Logistic chaotic maps for binary numbers generations", Chaos, Solitons and Fractals, Vol. 40(5), pp. 2557–2568, 2009.
[20] B. Youngchul, "Target Searching Method in the Chaotic Mobile Robot", In the 23rd Digital Avionics Systems Conference (IEEE Cat. No.04CH37576), Salt Lake City, UT, USA, 2004.
[21] S. Thrun, W. Burgard, and D. Fox. Probabilistic Robotics. Cambridge, MA: MIT Press, 2005.
[22] S. Debnath, L. Liu, and G. Sukhatme, "Reachability and differential based heuristics for solving Markov decision processes", The 18th International Symposium on Robotics Research (ISRR), Puerto Varas, Chile, 2017.
[23] N.K. Pareek, V. Patidar, and K.K. Sud, "Image encryption using chaotic logistic map", Image and Vision Computing, Vol. 24(9), pp. 926-934, 2006.
[24] L. Wang and H. Cheng, "Pseudo-Random Number Generator Based on Logistic Chaotic System", Entropy, Vol. 21(10), 2019.