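►Congratulations! A doctoral student supervised by Prof. Jenhui Chen (陳仁暉) of our department used reinforcement learning and knowledge transfer to improve the power-saving mechanism of unmanned aerial vehicles (UAVs). The paper has been accepted by the top-tier artificial intelligence journal Engineering Applications of Artificial Intelligence (SCI, IF: 8.0, Rank: 5/90 (5.556% (Q1), 2022)).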
Zhiqun Hu, Yujing Zhang, Hao Huang, Xiangming Wen, Obinna Agbodike, and Jenhui Chen*, "Reinforcement Learning for Energy Efficiency Improvement in UAV-BS Access Networks: A Knowledge Transfer Scheme," Engineering Applications of Artificial Intelligence, vol. 120, 105930, April 2023. (SCI, IF: 8.0, Rank: 5/90 (5.556% (Q1), 2022) in Engineering, Multidisciplinary) DOI: 10.1016/j.engappai.2023.105930
Recently the possibility of forming unmanned aerial vehicle base station (UAV-BS) network systems with energy harvesting capabilities to support persistent wireless access services for pedestrian users has been validated. Due to the need of sustaining wireless access services of the UAV-BSs, we investigate an optimal policy to maximize the overall energy utilization efficiency (renewable energy) of the UAV-BSs during their active in-flight network access operations. Since the natural sources of renewable energy (e.g., solar energy or wind energy harvesting) have stochastic properties with respect to the arrival rate of the dynamics of the unknown environment, we exploit an actor–critic reinforcement learning framework, which considers the continuous-valued states and action space for learning the best policy during interaction with the environment. To enhance and expedite the learning process, a transfer asynchronous advantage actor–critic (TA3C) algorithm is proposed, which enables UAV-BSs to transfer (i.e., share) knowledge gained in historical periods, during parallel task asynchronous executions on multiple instances of the environment. Numerical results reveal that the proposed TA3C algorithm surpasses the classic A3C and A2C algorithms in terms of throughput and optimal energy utilization efficiency.
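For readers unfamiliar with the advantage actor-critic family that the abstract refers to (A2C, A3C), the following is a minimal sketch of how the advantage signal is typically computed from a short rollout. It is a generic illustration, not the TA3C algorithm from the paper: the function names, the reward and value numbers, and the discount factor are all hypothetical, and the knowledge-transfer and asynchronous parts are not shown.

```python
from typing import List, Sequence


def n_step_returns(rewards: Sequence[float],
                   bootstrap_value: float,
                   gamma: float) -> List[float]:
    """Discounted returns R_t = r_t + gamma * R_{t+1}.

    The recursion is seeded with the critic's value estimate for the
    state reached after the last reward (the "bootstrap" value).
    """
    returns = [0.0] * len(rewards)
    running = bootstrap_value
    # Walk the rollout backwards so each step reuses the next step's return.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def advantages(rewards: Sequence[float],
               values: Sequence[float],
               bootstrap_value: float,
               gamma: float = 0.99) -> List[float]:
    """Advantage A_t = R_t - V(s_t): how much better the rollout turned
    out than the critic predicted for each visited state."""
    returns = n_step_returns(rewards, bootstrap_value, gamma)
    return [ret - v for ret, v in zip(returns, values)]


if __name__ == "__main__":
    # Hypothetical three-step rollout (values chosen only for illustration).
    rewards = [1.0, 0.0, 2.0]        # e.g. per-step energy-efficiency reward
    values = [1.5, 2.5, 3.0]         # critic estimates V(s_t)
    print(advantages(rewards, values, bootstrap_value=4.0, gamma=0.5))
```

In an actor-critic learner, a positive advantage increases the probability (or, for continuous actions, the density) of the action taken, and a negative advantage decreases it. The critic is trained to bring its value estimates closer to the computed returns.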