Restless multi-armed bandits (RMABs) are a widely used framework for sequential decision-making, with applications in many fields. Solving them, however, is hindered by a joint state space that grows exponentially with the number of arms and by a combinatorial action space. As a result, designing efficient planning and learning algorithms for RMABs has long been a challenge.
Convergence-guaranteed decentralized optimization over gossip communication networks with adversarial link failures.