REINFORCEMENT LEARNING IN A VIRTUAL WORLD: A STUDY OF PPO AND SAC WITHIN UNITY ML AGENTS

Rufat Mammadzada, Azerbaijan State Oil and Industry University. Address: Azadliq Avenue 34, AZ1010, Baku, Azerbaijan. E-mail: mammadzadarufat@gmail.com

Abstract

This study explores the use of Unity3D as a versatile platform for developing, training, and evaluating intelligent agents through reinforcement learning. Leveraging the Unity ML-Agents Toolkit, a dynamic 3D environment was created to examine agent learning behavior using two advanced algorithms: Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC). The simulation environment consisted of navigable terrain bounded by red borders, with collectible blue balls serving as rewards and a purple cube representing the agent. A carefully designed reward system was implemented to encourage goal-directed behavior and penalize inefficiency, while time constraints introduced an additional challenge requiring both precision and speed.
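As an illustration only, a reward scheme of the kind described above might be sketched as follows. The abstract does not report the actual reward values, the step limit, or whether touching a border ends the episode, so every constant and name here (BALL_REWARD, BORDER_PENALTY, STEP_PENALTY, MAX_STEPS, step_reward) is a hypothetical placeholder rather than the study's configuration.

```python
from dataclasses import dataclass

# Hypothetical placeholder values; the study's real settings are not given.
BALL_REWARD = 1.0       # reward for collecting a blue ball
BORDER_PENALTY = -1.0   # penalty for touching a red border
STEP_PENALTY = -0.001   # small per-step cost that discourages wasted time
MAX_STEPS = 1000        # assumed time limit, in simulation steps


@dataclass
class StepOutcome:
    reward: float
    done: bool


def step_reward(collected_ball: bool, hit_border: bool,
                step_count: int, balls_remaining: int) -> StepOutcome:
    """Return the reward for one step and whether the episode should end."""
    reward = STEP_PENALTY  # every step costs a little, rewarding speed
    done = False
    if collected_ball:
        reward += BALL_REWARD
        if balls_remaining == 0:
            done = True  # all balls collected
    if hit_border:
        reward += BORDER_PENALTY
        done = True  # assumption: leaving the terrain ends the episode
    if step_count >= MAX_STEPS:
        done = True  # time limit reached
    return StepOutcome(reward, done)
```

The per-step penalty is one common way to express a time constraint: an agent that wanders accumulates negative reward, so shorter collection paths score higher.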

Through iterative training and refinement, the agent demonstrated increasingly complex behaviors, such as path optimization and efficient resource collection. Comparative analysis revealed that SAC exhibited rapid initial learning but suffered from performance instability due to excessive exploration, while PPO showed slower convergence but achieved more stable and consistent long-term results.
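For context, the difference in behavior is consistent with the standard objectives of the two algorithms, reproduced here from the original PPO and SAC papers rather than from this study. PPO limits how far each update can move the policy by clipping the probability ratio, while SAC adds an entropy bonus, weighted by a temperature \alpha, that explicitly rewards exploration.

```latex
% PPO clipped surrogate objective (Schulman et al., 2017)
L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\left[
  \min\left( r_t(\theta)\,\hat{A}_t,\;
  \operatorname{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t \right)
\right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}

% SAC maximum-entropy objective (Haarnoja et al., 2018)
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\left[
  r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big)
\right]
```

Here \hat{A}_t is the advantage estimate, \epsilon is the clipping range, and \mathcal{H} is the policy entropy. A larger \alpha favors more random actions, which is one plausible source of the instability attributed to excessive exploration.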

The findings highlight Unity’s potential as a comprehensive simulation and experimentation framework, bridging the gap between real-time visualization and machine learning. Beyond game development, this approach can be extended to applications in robotics, industrial automation, and intelligent system design, offering an accessible yet powerful environment for studying adaptive, autonomous behaviors in virtual settings.

First Page

Last Page

103

Recommended Citation

Mammadzada, Rufat (2025) "REINFORCEMENT LEARNING IN A VIRTUAL WORLD: A STUDY OF PPO AND SAC WITHIN UNITY ML AGENTS," Chemical Technology, Control and Management: Vol. 2025: Iss. 5, Article 13.
DOI: https://doi.org/10.59048/2181-1105.1722

Included in

Complex Fluids Commons, Controls and Control Theory Commons, Industrial Technology Commons, Process Control and Systems Commons
