Abstract

This study explores the use of Unity3D as a versatile platform for developing, training, and evaluating intelligent agents through reinforcement learning. A dynamic 3D environment was created with the Unity ML-Agents Toolkit to examine agent learning behavior under two algorithms: Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC). The simulation environment consisted of navigable terrain bounded by red borders, with collectible blue balls serving as rewards and a purple cube representing the agent. A carefully designed reward system encouraged goal-directed behavior and penalized inefficiency, while time constraints introduced an additional challenge requiring both precision and speed.
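To make the described setup concrete, the following is a minimal C# sketch of the kind of ML-Agents Agent subclass implied by the abstract: positive rewards for collecting blue balls, a small per-step penalty for inefficiency, and episode termination at the red border. The tag names ("BlueBall", "RedBorder"), reward magnitudes, observation layout, and movement speed are illustrative assumptions, not the paper's actual implementation.

    using Unity.MLAgents;
    using Unity.MLAgents.Actuators;
    using Unity.MLAgents.Sensors;
    using UnityEngine;

    // Hypothetical agent sketch; assumes trigger colliders on the collectibles and borders.
    public class BallCollectorAgent : Agent
    {
        [SerializeField] private Transform target;   // e.g., the nearest blue ball (assumed wiring)
        [SerializeField] private float moveSpeed = 2f;

        public override void OnEpisodeBegin()
        {
            // Reset the agent; in the paper's environment the collectibles would also be respawned here.
            transform.localPosition = Vector3.zero;
        }

        public override void CollectObservations(VectorSensor sensor)
        {
            // Observe the agent's own position and the target's position (6 floats total).
            sensor.AddObservation(transform.localPosition);
            sensor.AddObservation(target.localPosition);
        }

        public override void OnActionReceived(ActionBuffers actions)
        {
            // Two continuous actions: movement on the X/Z plane.
            var move = new Vector3(actions.ContinuousActions[0], 0f, actions.ContinuousActions[1]);
            transform.localPosition += move * moveSpeed * Time.deltaTime;

            // Small per-step penalty to discourage inefficient wandering (value is illustrative).
            AddReward(-0.001f);
        }

        private void OnTriggerEnter(Collider other)
        {
            if (other.CompareTag("BlueBall"))        // collectible reward
            {
                AddReward(1.0f);
                other.gameObject.SetActive(false);
            }
            else if (other.CompareTag("RedBorder"))  // leaving the navigable terrain
            {
                AddReward(-1.0f);
                EndEpisode();
            }
        }

        public override void Heuristic(in ActionBuffers actionsOut)
        {
            // Manual keyboard control for debugging the environment before training.
            var ca = actionsOut.ContinuousActions;
            ca[0] = Input.GetAxis("Horizontal");
            ca[1] = Input.GetAxis("Vertical");
        }
    }

In ML-Agents, a per-episode time constraint of the kind described above is typically enforced by setting the agent's MaxStep property (or ending the episode from a timer), so that slow agents have their episodes truncated before collecting every reward.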

Through iterative training and refinement, the agent demonstrated increasingly complex behaviors, such as path optimization and efficient resource collection. Comparative analysis revealed that SAC exhibited rapid initial learning but suffered from performance instability due to excessive exploration, while PPO showed slower convergence but achieved more stable and consistent long-term results.

The findings highlight Unity’s potential as a comprehensive simulation and experimentation framework, bridging the gap between real-time visualization and machine learning. Beyond game development, this approach can be extended to applications in robotics, industrial automation, and intelligent system design, offering an accessible yet powerful environment for studying adaptive, autonomous behaviors in virtual settings.

First Page: 96
Last Page: 103

