H.3. Artificial Intelligence
Habib Khodadadi; Vali Derhami
Abstract
The exploration-exploitation trade-off poses a significant challenge in reinforcement learning. For this reason, action selection methods such as ε-greedy and Soft-Max are used instead of the greedy method. These methods rely on random numbers to select an action that balances exploration and exploitation. Chaos is widely used across scientific disciplines because of its features, including non-periodicity, unpredictability, ergodicity, and pseudo-random behavior. In this paper, we use numbers generated by different chaotic systems to select actions and identify which maps perform better across different numbers of states and actions. In experiments on several environments, including the Multi-Armed Bandit (MAB), the taxi domain, and cliff walking, we found that many of the chaotic methods speed up learning and achieve higher rewards.
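To illustrate the idea described in the abstract, the sketch below shows how a chaotic map could stand in for the uniform random numbers normally used in ε-greedy action selection. The choice of the logistic map with parameter r = 4, the seed value, and the class interface are illustrative assumptions, not the paper's exact setup, which compares several different chaotic maps.

```python
import numpy as np

def logistic_map(x):
    """One step of the logistic map x_{t+1} = 4 * x_t * (1 - x_t), in its chaotic regime."""
    return 4.0 * x * (1.0 - x)

class ChaoticEpsilonGreedy:
    """Epsilon-greedy action selection driven by a chaotic sequence instead of a uniform PRNG.

    The map choice and seed handling here are assumptions for illustration only.
    """

    def __init__(self, n_actions, epsilon=0.1, seed=0.1234):
        self.n_actions = n_actions
        self.epsilon = epsilon
        self.x = seed  # chaotic state in (0, 1); avoid fixed points such as 0, 0.5, 1

    def _next(self):
        # Advance the chaotic map and return a value in (0, 1).
        self.x = logistic_map(self.x)
        return self.x

    def select(self, q_values):
        # One chaotic draw decides whether to explore ...
        if self._next() < self.epsilon:
            # ... and a second draw picks the exploratory action.
            return int(self._next() * self.n_actions) % self.n_actions
        # Otherwise exploit the current value estimates.
        return int(np.argmax(q_values))

# Toy usage on a 4-armed bandit with fixed Q-value estimates.
policy = ChaoticEpsilonGreedy(n_actions=4, epsilon=0.2)
q = np.array([0.1, 0.5, 0.3, 0.2])
print([policy.select(q) for _ in range(10)])
```

In this sketch the chaotic sequence simply replaces the calls to a uniform random number generator; the surrounding Q-learning or bandit update would remain unchanged.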