References:
[1] CHANG L,PIAO S H,LENG X K,et al.Inverted pendulum model for turn-planning for biped robot[J].Physical Communication,2020,42:101168.
[2] FOK K L,LEE J,VETTE A H,et al.Kinematic error magnitude in the single-mass inverted pendulum model of human standing posture[J].Gait & Posture,2018,63:23-26.
[3] XIE Y Q,DAI F Q,GAO X S.Modeling and control simulation of rotary inverted pendulum based on MATLAB[J].Industrial Control Computer,2021,34(3):46-47,49.
[4] CHEN W,XU X L,ZHONG X W,et al.PID parameter tuning of circular inverted pendulum based on improved genetic algorithm[J].Computer Simulation,2021,38(3):165-169.
[5] HAMZA M F,YAP H J,CHOUDHURY I A,et al.Current development on using Rotary Inverted Pendulum as a benchmark for testing linear and nonlinear control algorithms[J].Mechanical Systems and Signal Processing,2019,116:347-369.
[6] MA Y L.Explicit model predictive control of rotary inverted pendulum[J].China Management Informationization,2021,24(5):166-169.
[7] AGARANA M C,AKINLABI E T.Lagrangian-Laplace dynamic mechanical analysis and modeling of inverted pendulum[J].Procedia Manufacturing,2019,35:711-718.
[8] DWIVEDI P,PANDEY S,JUNGHARE A S.Stabilization of unstable equilibrium point of rotary inverted pendulum using fractional controller[J].Journal of the Franklin Institute,2017,354(17):7732-7766.
[9] BONIFACIO S R,PATRICIO O O,ALEXANDER P G.Robust stabilizing control for the electromechanical triple-link inverted pendulum system[J].IFAC-PapersOnLine,2018,51(13):314-319.
[10] SHI Q,YING W D,LV L,et al.Deep reinforcement learning-based attitude motion control for humanoid robots with stability constraints[J].Industrial Robot,2020,47(3):335-347.
[11] GU S X,HOLLY E,LILLICRAP T,et al.Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates[C]//2017 IEEE International Conference on Robotics and Automation (ICRA),May 29-June 3,2017,Singapore.New York:IEEE,2017:3389-3396.
[12] WEN S H,CHEN J H,WANG S,et al.Path planning of humanoid arm based on deep deterministic policy gradient[C]//2018 IEEE International Conference on Robotics and Biomimetics (ROBIO),December 12-15,2018,Kuala Lumpur,Malaysia.New York:IEEE,2018:1755-1760.
[13] WANG J P,WANG G,MAO X B,et al.Motion control method of two-link manipulator based on deep reinforcement learning[J].Journal of Computer Applications,2021,41(6):1799-1804.
[14] KANG C H,SUN C,RONG C T,et al.TD3 algorithm based on dynamic delayed policy update[J].Journal of Jilin University (Information Science Edition),2020,38(4):474-481.
[15] HUANG Y L,CHEN N J,FAN Z,et al.Compliant teaching and reproduction of robots based on human-robot collaboration[J].Journal of University of Jinan (Science and Technology),2021,35(2):108-114.
[16] LI X T,XIONG Z,CHEN M X,et al.Research on cooperative information screening method for UAV swarms based on deep reinforcement learning[J].Electronics Optics & Control,2021,28(10):6-10.
[17] SILVER D,HUANG A,MADDISON C J,et al.Mastering the game of Go with deep neural networks and tree search[J].Nature,2016,529(7587):484-489.
[18] MNIH V,KAVUKCUOGLU K,SILVER D,et al.Human-level control through deep reinforcement learning[J].Nature,2015,518(7540):529-533.
[19] JADERBERG M,MNIH V,CZARNECKI W M,et al.Reinforcement learning with unsupervised auxiliary tasks[EB/OL].(2016-11-16)[2021-05-12].https://arxiv.org/pdf/1611.05397.pdf.
[20] NACHUM O,NOROUZI M,XU K,et al.Bridging the gap between value and policy based reinforcement learning[EB/OL].(2017-11-22)[2021-05-12].https://arxiv.org/pdf/1702.08892.pdf.
[21] FUJIMOTO S,VAN HOOF H,MEGER D.Addressing function approximation error in actor-critic methods[C]//Proceedings of the 35th International Conference on Machine Learning,July 10-15,2018,Stockholm,Sweden.Stockholm:PMLR,2018:1587-1596.
[22] MANNION P,DEVLIN S,MASON K,et al.Policy invariance under reward transformations for multi-objective reinforcement learning[J].Neurocomputing,2017,263(8):60-73.