This paper proposes an Improved Model-Based Deep Deterministic Policy Gradient (DDPG), a novel reinforcement learning algorithm designed to overcome three critical challenges in industrial deep reinforcement learning applications: (1) poor sample efficiency requiring excessive real-world trials, (2) safety risks from unstable policies during training, and (3) difficulty scaling to high-dimensional continuous control spaces. Building on DDPG's strengths in continuous control, the proposed algorithm introduces four key innovations: (i) a virtual environment for data-efficient learning, (ii) a simulation rate mechanism that dynamically adapts reliance on the learned model, (iii) a simulated experience buffer that prevents divergence, and (iv) a performance threshold for fail-safe operation. Evaluated on the Cart-Pole benchmark using the OpenAI Gym Python library, the proposed method converges faster than standard DDPG while limiting performance degradation under sensor malfunctions or communication losses. These improvements derive from the algorithm's ability to leverage real-world data and model-generated experiences simultaneously, reducing the cost of physical trials while ensuring operational safety. The results establish the framework as a practical solution for industrial control systems where reliability and data efficiency are paramount, particularly in applications such as chemical process control and precision robotics that demand stable operation amid sensor and communication failures.
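To make the four mechanisms concrete, the Python sketch below shows one plausible shape of the training loop: a learned virtual environment, a simulation rate that mixes model-generated updates with real ones, a separate simulated experience buffer, and a performance threshold that gates learning from synthetic data. This is a minimal illustration, not the authors' implementation: the linear `DynamicsModel`, the placeholder `DDPGAgent`, the update schedule, and the use of `gymnasium`'s `Pendulum-v1` (a stand-in continuous-control task, since the stock Gym Cart-Pole has a discrete action space while DDPG targets continuous actions) are all assumptions made here for brevity.

```python
# Minimal sketch of a model-based DDPG training loop (assumptions noted above).
import random
from collections import deque

import numpy as np
import gymnasium as gym  # maintained fork of OpenAI Gym, the library used in the paper


class DynamicsModel:
    """(i) Virtual environment: a linear least-squares fit of (s, a) -> (s', r).

    A deliberately simple stand-in; the paper's model class is not reproduced here.
    """

    def __init__(self):
        self.W = None

    def fit(self, transitions):
        X = np.array([np.concatenate([s, a, [1.0]]) for s, a, _, _ in transitions])
        Y = np.array([np.concatenate([s2, [r]]) for _, _, r, s2 in transitions])
        self.W, *_ = np.linalg.lstsq(X, Y, rcond=None)

    def step(self, s, a):
        y = np.concatenate([s, a, [1.0]]) @ self.W
        return y[:-1], float(y[-1])  # predicted next state and reward


class DDPGAgent:
    """Placeholder actor-critic: act() samples randomly and update() is a no-op.

    Swap in a real DDPG implementation (actor, critic, target networks) to learn.
    """

    def __init__(self, action_space):
        self.action_space = action_space

    def act(self, s):
        return self.action_space.sample()

    def update(self, batch):
        pass


def train(env, episodes=50, sim_rate=0.5, perf_threshold=-1500.0,
          model_warmup=500, refit_every=250, batch=64):
    agent = DDPGAgent(env.action_space)
    model = DynamicsModel()
    real_buf = deque(maxlen=100_000)  # real transitions
    sim_buf = deque(maxlen=100_000)   # (iii) separate buffer for model-generated data
    last_return = 0.0                 # return of the previous episode

    for ep in range(episodes):
        s, _ = env.reset()
        ep_return, done = 0.0, False
        while not done:
            a = agent.act(s)
            s2, r, terminated, truncated, _ = env.step(a)
            done = terminated or truncated
            real_buf.append((s, a, float(r), s2))

            # (i) refit the virtual environment once enough real data has accumulated
            if len(real_buf) >= model_warmup and len(real_buf) % refit_every == 0:
                model.fit(list(real_buf))

            # (ii) simulation rate: with probability sim_rate, synthesize a transition
            # from the model instead of relying only on real experience
            if model.W is not None and random.random() < sim_rate:
                sim_s = random.choice(list(real_buf))[0]  # branch from a visited state
                sim_a = agent.act(sim_s)
                sim_s2, sim_r = model.step(sim_s, sim_a)
                sim_buf.append((sim_s, sim_a, sim_r, sim_s2))

            # always learn from real data once a batch is available
            if len(real_buf) >= batch:
                agent.update(random.sample(list(real_buf), batch))

            # (iv) performance threshold: a fail-safe that suspends learning from
            # simulated data when recent returns fall below an acceptable level
            if len(sim_buf) >= batch and last_return >= perf_threshold:
                agent.update(random.sample(list(sim_buf), batch))

            s = s2
            ep_return += float(r)

        last_return = ep_return
        print(f"episode {ep:3d}  return {ep_return:8.1f}")


if __name__ == "__main__":
    # Pendulum-v1 stands in for the paper's Cart-Pole task because the stock Gym
    # CartPole-v1 has a discrete action space, while DDPG targets continuous control.
    train(gym.make("Pendulum-v1"))
```

Keeping simulated and real transitions in separate buffers means the ratio of model-generated to real updates can be tuned, or cut off entirely by the performance threshold, without ever contaminating the ground-truth replay pool.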