APSIPA Transactions on Signal and Information Processing > Vol 11 > Issue 1

Fairness-Oriented User Scheduling for Bursty Downlink Transmission Using Multi-Agent Reinforcement Learning

Mingqi Yuan, School of Science and Engineering, The Chinese University of Hong Kong, China; Qi Cao, School of Science and Engineering, The Chinese University of Hong Kong, China; Man-On Pun, School of Science and Engineering, The Chinese University of Hong Kong, China, SimonPun@cuhk.edu.cn; Yi Chen, School of Science and Engineering, The Chinese University of Hong Kong, and Shenzhen Research Institute of Big Data, China
 
Suggested Citation
Mingqi Yuan, Qi Cao, Man-On Pun and Yi Chen (2022), "Fairness-Oriented User Scheduling for Bursty Downlink Transmission Using Multi-Agent Reinforcement Learning", APSIPA Transactions on Signal and Information Processing: Vol. 11: No. 1, e32. http://dx.doi.org/10.1561/116.00000028

Publication Date: 31 Oct 2022
© 2022 M. Yuan, Q. Cao, M.-O. Pun and Y. Chen
 
Keywords
User scheduling, RBG allocation, fairness-oriented, Multi-agent reinforcement learning (MARL)
 

Open Access

This is published under the terms of CC BY-NC.

In this article:
Introduction 
System Model and Problem Formulation 
Stochastic Game Framework for RBG Allocation 
MARL-Based Algorithm 
Simulation and Analysis 
Conclusion 
Appendix: Network Mechanism 
References 

Abstract

In this work, we develop practical user scheduling algorithms for bursty downlink traffic with an emphasis on user fairness. In contrast to conventional scheduling algorithms, which either divide the transmission time slots equally among users or maximize ratios that lack a practical physical interpretation, we propose the 5%-tile user data rate (5TUDR) as the metric for evaluating user fairness. Since 5TUDR is difficult to optimize directly, we first cast the problem into a stochastic game framework and then propose a Multi-Agent Reinforcement Learning (MARL)-based algorithm that optimizes the resource block group (RBG) allocation in a highly computationally efficient manner. Each MARL agent takes information measured by network counters across multiple network layers (e.g., channel quality indicator and buffer size) as its input state and the RBG allocation as its action, with a reward function carefully designed to maximize 5TUDR. Extensive simulations show that the proposed MARL-based scheduler achieves fair scheduling while maintaining good average network throughput compared with conventional schedulers.
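As a rough illustration of the fairness metric named in the abstract, the following Python sketch computes a 5th-percentile rate from a list of per-user data rates. The function name `percentile_rate`, the linear-interpolation convention between ranked samples, and the sample rates are all assumptions made for illustration; the article's own definition of 5TUDR may differ in detail.

```python
def percentile_rate(user_rates, pct=5.0):
    """Return the pct-th percentile of per-user data rates.

    Assumption: linear interpolation between the two nearest ranked
    samples. This is one common convention, not necessarily the one
    used in the article.
    """
    rates = sorted(float(r) for r in user_rates)
    if not rates:
        raise ValueError("at least one user rate is required")
    if not 0.0 <= pct <= 100.0:
        raise ValueError("pct must lie in [0, 100]")

    # Fractional position of the requested percentile in the sorted list.
    pos = (len(rates) - 1) * pct / 100.0
    lower = int(pos)                      # pos >= 0, so int() floors it
    upper = min(lower + 1, len(rates) - 1)
    frac = pos - lower
    return rates[lower] + (rates[upper] - rates[lower]) * frac


def mean_rate(user_rates):
    """Return the average per-user data rate."""
    rates = [float(r) for r in user_rates]
    return sum(rates) / len(rates)


# Hypothetical per-user rates (arbitrary units) for two allocations
# that share the same average but treat the worst-off users differently.
even_rates = [10.0] * 20
skewed_rates = [1.0] * 2 + [11.0] * 18

if __name__ == "__main__":
    for name, rates in (("even", even_rates), ("skewed", skewed_rates)):
        print(name, mean_rate(rates), percentile_rate(rates))
```

The two hypothetical allocations have identical average rates, yet the skewed one has a much lower 5th-percentile rate. This is the kind of difference that an average-throughput metric hides and a percentile-based fairness metric exposes.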

DOI:10.1561/116.00000028