This paper presents an improved deep reinforcement learning framework that integrates online system identification, built on the Dyna-Q architecture. The framework is designed to handle both Multi-Input Multi-Output (MIMO) and Multi-Input Single-Output (MISO) systems in complex, industry-relevant environments, thereby significantly enhancing the adaptability and reliability of industrial control systems. In the proposed framework, system identification and model-based control run in parallel with the main control process, providing a reliable backup in case of faults or disruptions. To verify the efficiency of this approach, comparative evaluations are conducted with three widely used deep reinforcement learning algorithms, namely Deep Q-Network (DQN), Deep Deterministic Policy Gradient (DDPG), and Twin Delayed Deep Deterministic Policy Gradient (TD3), on industry-relevant simulation environments available in OpenAI Gym, including Cart Pole, Pendulum, and Bipedal Walker, each chosen to exercise a specific aspect of the framework. Results demonstrate that leveraging both real and simulated experiences in this framework improves sample efficiency, stability, and robustness.
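To make the underlying Dyna-Q loop concrete, the sketch below shows its classic tabular form: every real transition updates both the Q-table (direct reinforcement learning) and a learned model (a tabular stand-in for online system identification), and the model then generates simulated transitions that provide additional "planning" updates. The 5x5 gridworld, hyperparameters, and deterministic dynamics are illustrative assumptions chosen only to keep the sketch self-contained; they are not the paper's implementation, which replaces the tables with neural networks and the toy dynamics with the Gym environments.

import random

# Minimal tabular Dyna-Q sketch. Toy setup (assumption, not from the paper):
# a 5x5 grid, start at (0, 0), reward 1 only at the goal corner.
N, GOAL = 5, (4, 4)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]
ALPHA, GAMMA, EPS, PLAN_STEPS = 0.1, 0.95, 0.1, 20

Q = {((r, c), a): 0.0 for r in range(N) for c in range(N) for a in range(4)}
model = {}  # (state, action) -> (reward, next_state): the learned model

def step(state, a):
    # Deterministic toy dynamics: move, clip to the grid, reward 1 at the goal.
    r = min(max(state[0] + ACTIONS[a][0], 0), N - 1)
    c = min(max(state[1] + ACTIONS[a][1], 0), N - 1)
    nxt = (r, c)
    return (1.0 if nxt == GOAL else 0.0), nxt

def greedy(state):
    return max(range(4), key=lambda a: Q[(state, a)])

def q_update(s, a, r, s2):
    # Standard one-step Q-learning update.
    target = r + GAMMA * max(Q[(s2, b)] for b in range(4))
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

for episode in range(200):
    s = (0, 0)
    while s != GOAL:
        a = random.randrange(4) if random.random() < EPS else greedy(s)
        r, s2 = step(s, a)         # real experience
        q_update(s, a, r, s2)      # direct RL update from the real transition
        model[(s, a)] = (r, s2)    # model learning (tabular system identification)
        for _ in range(PLAN_STEPS):
            # Planning: replay simulated transitions drawn from the learned model.
            (ps, pa), (pr, ps2) = random.choice(list(model.items()))
            q_update(ps, pa, pr, ps2)
        s = s2

print("Greedy action at start:", greedy((0, 0)))

In the proposed framework, the lookup-table model above corresponds to the online system-identification module that runs alongside the controller, and the planning updates correspond to training the deep RL agent on simulated experience in addition to real experience.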
Barekatain, M. and Sayyaf, N. (2025). Enhancing the Reliability of Control Systems Using an Improved Deep Reinforcement Learning Framework. AUT Journal of Mechanical Engineering, 9(4), 357-372. doi: 10.22060/ajme.2025.24021.6172