A Decoupled Learning Strategy for Massive Access Optimization in Cellular IoT Networks

Abstract

Cellular-based networks are expected to offer connectivity for massive Internet of Things (mIoT) systems. However, their Random Access CHannel (RACH) procedure suffers from unreliability due to collisions caused by simultaneous massive access. Although existing RACH schemes address this collision problem, they usually organize IoT devices' transmissions and re-transmissions with fixed parameters and thus can hardly adapt to time-varying traffic patterns. Without adaptation, the RACH procedure easily suffers from high access delay, high energy consumption, or even access unavailability. To improve the RACH procedure, this paper aims to optimize it in real time by maximizing a long-term hybrid multi-objective function consisting of the number of successfully accessing devices, the average energy consumption, and the average access delay. To this end, we first optimize the long-term objective in terms of the number of successfully accessing devices by applying Deep Reinforcement Learning (DRL) algorithms to different RACH schemes, including Access Class Barring (ACB), Back-Off (BO), and Distributed Queuing (DQ). The convergence capability and efficiency of different DRL algorithms, including Policy Gradient (PG), Actor-Critic (AC), Deep Q-Network (DQN), and Deep Deterministic Policy Gradient (DDPG), are compared. Inspired by the results of this comparison, a decoupled learning strategy is developed to jointly and dynamically adapt the access control factors of these three access schemes. This decoupled strategy integrates predicted traffic into the learning process to improve training efficiency: a Recurrent Neural Network (RNN) model first predicts the real-time traffic of the network environment, and then multiple DRL agents cooperatively configure the parameters of each RACH scheme. Our results demonstrate that the decoupled strategy remarkably accelerates training.
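To make concrete why a fixed ACB factor struggles with time-varying traffic, the following minimal Python sketch uses a simplified single-slot collision model. This model is an assumption for illustration, not the paper's simulator. In the model, each of N devices passes the barring check with probability p, then picks one of M preambles uniformly at random, and a device succeeds only if no other device chose the same preamble. The function names are hypothetical, and so is the setting of M = 54 contention-based preambles, a common LTE configuration assumed here. The weighted-sum form of `hybrid_reward` and its weights are also illustrative assumptions, since the abstract does not state how the three objectives are combined.

```python
import random


def expected_successes(n_devices: int, n_preambles: int, p_acb: float) -> float:
    """Expected number of collision-free devices in one RACH slot under ACB.

    Assumed model: each device passes ACB with probability p_acb and then picks
    one of n_preambles uniformly; a preamble succeeds only if exactly one device
    picked it.
    """
    q = p_acb / n_preambles  # probability that one device uses one specific preamble
    # Summing P(exactly one device on a preamble) over all preambles gives N * p * (1 - q)^(N - 1).
    return n_devices * p_acb * (1.0 - q) ** (n_devices - 1)


def best_acb_factor(n_devices: int, n_preambles: int) -> float:
    """Fixed barring factor that maximizes expected_successes in this model.

    Setting the derivative in p to zero gives p = M / N, capped at 1 when there
    are no more devices than preambles.
    """
    if n_devices <= n_preambles:
        return 1.0
    return n_preambles / n_devices


def simulate_slot(n_devices: int, n_preambles: int, p_acb: float, rng: random.Random) -> int:
    """Monte Carlo draw of one slot under the same assumed model."""
    counts = [0] * n_preambles
    for _ in range(n_devices):
        if rng.random() < p_acb:  # device passes the barring check
            counts[rng.randrange(n_preambles)] += 1
    return sum(1 for c in counts if c == 1)  # preambles chosen by exactly one device


def hybrid_reward(successes: float, avg_energy: float, avg_delay: float,
                  weights: tuple = (1.0, 0.1, 0.1)) -> float:
    """Hypothetical weighted-sum reward: reward successes, penalize energy and delay."""
    w_s, w_e, w_d = weights
    return w_s * successes - w_e * avg_energy - w_d * avg_delay


if __name__ == "__main__":
    M = 54  # assumed number of contention-based preambles
    for n in (50, 500, 2000):
        p = best_acb_factor(n, M)
        print(f"N={n}: best p={p:.3f}, "
              f"E[successes] at best p={expected_successes(n, M, p):.2f}, "
              f"at p=1: {expected_successes(n, M, 1.0):.2f}")
```

In this model the best fixed factor is min(1, M/N), so it shifts with the number of contending devices N. A factor tuned for light traffic collapses under heavy traffic, and vice versa. This dependence on the current load is what motivates predicting traffic and adapting the access control factors online rather than fixing them.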