Document Type: Original/Review Paper


Computer Engineering Department, Amirkabir University of Technology, Tehran, Iran



The performance of trading rules learned by machine learning models depends heavily on the quality of the features extracted from long sequences of raw instrument prices. Neural encoder-decoder structures have proved very effective at extracting informative features from complex input time series in other popular tasks, such as neural machine translation and video captioning. This paper proposes a novel end-to-end model that combines the neural encoder-decoder framework with deep reinforcement learning to learn single-instrument trading strategies from a long sequence of the instrument's raw prices. It also investigates how different encoder structures and various forms of the input sequence affect the performance of the learned strategies. Experimental results show that the proposed model outperforms other state-of-the-art models in highly dynamic environments.


