2022, No. 5

Terminal Attitude Control of Single Arm Robots Based on Deep Reinforcement Learning


Abstract:

A single-arm robot inverted pendulum was simulated in the Simulink environment on the basis of the twin delayed deep deterministic policy gradient (TD3) algorithm, and the results were compared with those of the deep deterministic policy gradient (DDPG) algorithm to verify the control accuracy of the algorithm and the feasibility of applying it to robot control. A simulation model of the single-arm robot inverted pendulum was established with a friction model added, and the model was constrained by the actual parameters obtained from parameter identification of the single-arm robot, improving the control accuracy and real-time performance in practical applications. During training, disturbance forces within a given numerical range were randomly applied to the pendulum rod to strengthen the anti-disturbance capability of the trained model. The actor-critic networks and the reward function were designed and improved according to the characteristics of the simulation model, so that the end pendulum rod swings up from its initial state to the vertical state within a short time under a small control force and then holds that state continuously. The results show that the improved TD3 algorithm achieves and maintains precise control of the end-effector attitude of the manipulator while reducing the output control force, adjusts itself when subjected to disturbance forces, improves the robustness and adaptability of the trained model, and shortens the running time.
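As a hedged illustration of the algorithmic core described above, the sketch below shows a minimal TD3 update in PyTorch: twin critics with clipped double-Q targets, target-policy smoothing, delayed actor/target updates, and a bounded random disturbance injected during rollouts. The state/action dimensions, network sizes, the torque limit MAX_TORQUE, the disturbance bound F_MAX, and all hyperparameters are illustrative assumptions, not the paper's identified robot parameters or its improved network and reward design.

    import torch
    import torch.nn as nn

    STATE_DIM, ACTION_DIM, MAX_TORQUE = 4, 1, 2.0  # assumed pendulum state/action sizes

    def mlp(in_dim, out_dim):
        # Small fully connected network standing in for the paper's actor/critic design.
        return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                             nn.Linear(256, 256), nn.ReLU(),
                             nn.Linear(256, out_dim))

    actor, actor_target = mlp(STATE_DIM, ACTION_DIM), mlp(STATE_DIM, ACTION_DIM)
    critics = [mlp(STATE_DIM + ACTION_DIM, 1) for _ in range(2)]          # twin critics
    critic_targets = [mlp(STATE_DIM + ACTION_DIM, 1) for _ in range(2)]
    actor_target.load_state_dict(actor.state_dict())
    for c, ct in zip(critics, critic_targets):
        ct.load_state_dict(c.state_dict())

    actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
    critic_opt = torch.optim.Adam([p for c in critics for p in c.parameters()], lr=1e-3)
    GAMMA, TAU, POLICY_DELAY, NOISE_STD, NOISE_CLIP, F_MAX = 0.99, 0.005, 2, 0.2, 0.5, 0.3

    def act_with_disturbance(s):
        # Rollout action; the added uniform term mimics the bounded random
        # disturbance force applied to the pendulum rod during training.
        a = MAX_TORQUE * torch.tanh(actor(s))
        return a + torch.empty_like(a).uniform_(-F_MAX, F_MAX)

    def td3_update(step, s, a, r, s2, done):
        # One TD3 update on a batch of transitions (s, a, r, s2, done).
        with torch.no_grad():
            # Target-policy smoothing: clipped Gaussian noise on the target action.
            noise = (torch.randn_like(a) * NOISE_STD).clamp(-NOISE_CLIP, NOISE_CLIP)
            a2 = (MAX_TORQUE * torch.tanh(actor_target(s2)) + noise).clamp(-MAX_TORQUE, MAX_TORQUE)
            # Clipped double-Q: take the smaller of the two target-critic estimates.
            q1, q2 = (ct(torch.cat([s2, a2], dim=1)) for ct in critic_targets)
            target = r + GAMMA * (1.0 - done) * torch.min(q1, q2)
        critic_loss = sum(((c(torch.cat([s, a], dim=1)) - target) ** 2).mean() for c in critics)
        critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
        if step % POLICY_DELAY == 0:                  # delayed actor and target updates
            actor_loss = -critics[0](torch.cat([s, MAX_TORQUE * torch.tanh(actor(s))], dim=1)).mean()
            actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
            with torch.no_grad():                     # Polyak averaging of target networks
                for net, tgt in [(actor, actor_target), *zip(critics, critic_targets)]:
                    for p, tp in zip(net.parameters(), tgt.parameters()):
                        tp.mul_(1.0 - TAU).add_(TAU * p)

In the paper itself the agent is trained against the constrained Simulink pendulum model; the update above is environment-agnostic and would be driven by batches sampled from a replay buffer.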

Keywords: robot control; twin delayed deep deterministic policy gradient (TD3) algorithm; reinforcement learning; convolutional neural network; inverted pendulum system

Foundation: National Natural Science Foundation of China (51875250)

Author: FAN Zhen, CHEN Naijian, DONG Chunchao, ZHANG Laiwei, BAO Jiawei, LI Yahui, LI Yingjun

DOI: 10.13349/j.cnki.jdxbn.20220527.001

References:

[1] CHANG L,PIAO S H,LENG X K,et al.Inverted pendulum model for turn-planning for biped robot[J].Physical Communication,2020,42:101168.

[2] FOK K L,LEE J,VETTE A H,et al.Kinematic error magnitude in the single-mass inverted pendulum model of human standing posture[J].Gait & Posture,2018,63:23-26.

[3] XIE Y Q,DAI F Q,GAO X S.Modeling and control simulation of rotary inverted pendulum based on MATLAB[J].Industrial Control Computer,2021,34(3):46-47,49.

[4] CHEN W,XU X L,ZHONG X W,et al.PID parameter tuning of circular inverted pendulum based on improved genetic algorithm[J].Computer Simulation,2021,38(3):165-169.

[5] HAMZA M F,YAP H J,CHOUDHURY I A,et al.Current development on using Rotary Inverted Pendulum as a benchmark for testing linear and nonlinear control algorithms[J].Mechanical Systems and Signal Processing,2019,116:347-369.

[6] MA Y L.Explicit model predictive control of rotary inverted pendulum[J].China Management Informationization,2021,24(5):166-169.

[7] AGARANA M C,AKINLABI E T.Lagrangian-Laplace dynamic mechanical analysis and modeling of inverted pendulum[J].Procedia Manufacturing,2019,35:711-718.

[8] DWIVEDI P,PANDEY S,JUNGHARE A S.Stabilization of unstable equilibrium point of rotary inverted pendulum using fractional controller[J].Journal of the Franklin Institute,2017,354(17):7732-7766.

[9] BONIFACIO S R,PATRICIO O O,ALEXANDER P G.Robust stabilizing control for the electromechanical triple-link inverted pendulum system[J].IFAC-PapersOnLine,2018,51(13):314-319.

[10] SHI Q,YING W D,LV L,et al.Deep reinforcement learning-based attitude motion control for humanoid robots with stability constraints[J].The Industrial Robot,2020,47(3):335-347.

[11] GU S X,HOLLY E,LILLICRAP T,et al.Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates[C]//2017 IEEE International Conference on Robotics and Automation (ICRA),May 29-June 3,2017,Singapore.New York:IEEE,2017:3389-3396.

[12] WEN S H,CHEN J H,WANG S,et al.Path planning of humanoid arm based on deep deterministic policy gradient[C]//2018 IEEE International Conference on Robotics and Biomimetics (ROBIO),December 12-15,2018,Kuala Lumpur,Malaysia.New York:IEEE,2018:1755-1760.

[13] WANG J P,WANG G,MAO X B,et al.Motion control method of two-link manipulator based on deep reinforcement learning[J].Journal of Computer Applications,2021,41(6):1799-1804.

[14] KANG C H,SUN C,RONG C T,et al.TD3 algorithm based on dynamic delayed policy update[J].Journal of Jilin University (Information Science Edition),2020,38(4):474-481.

[15] HUANG Y L,CHEN N J,FAN Z,et al.Compliant teaching and reproduction of robots based on human-robot collaboration[J].Journal of University of Jinan (Science and Technology),2021,35(2):108-114.

[16] LI X T,XIONG Z,CHEN M X,et al.Research on cooperative information screening method of UAV swarms based on deep reinforcement learning[J].Electronics Optics & Control,2021,28(10):6-10.

[17] SILVER D,HUANG A,MADDISON C J,et al.Mastering the game of Go with deep neural networks and tree search[J].Nature,2016,529(7587):484-489.

[18] MNIH V,KAVUKCUOGLU K,SILVER D,et al.Human-level control through deep reinforcement learning[J].Nature,2015,518(7540):529-533.

[19] JADERBERG M,MNIH V,CZARNECKI W M,et al.Reinforcement learning with unsupervised auxiliary tasks[EB/OL].(2016-11-16)[2021-05-12].https://arxiv.org/pdf/1611.05397.pdf.

[20] NACHUM O,NOROUZI M,XU K,et al.Bridging the gap between value and policy based reinforcement learning[EB/OL].(2017-11-22)[2021-05-12].https://yanpuli.github.io/files/1702.08892.pdf.

[21] FUJIMOTO S,VAN HOOF H,MEGER D.Addressing function approximation error in actor-critic methods[C]//Proceedings of the 35th International Conference on Machine Learning,July 10-15,2018,Stockholm,Sweden.Stockholm:PMLR,2018:1587-1596.

[22] MANNION P,DEVLIN S,MASON K,et al.Policy invariance under reward transformations for multi-objective reinforcement learning[J].Neurocomputing,2017,263(8):60-73.