Turkish Journal of Electrical Engineering and Computer Sciences

Author ORCID Identifier

Muhammed Sayin: 0000-0001-5779-3986

Abstract

Non-cooperative multi-agent learning, focusing on individual rationality (anarchy), often falls short of achieving system-wide efficiency in potential games, a class of games with applications in decentralized control and optimization. Cooperative approaches, on the other hand, prioritize system efficiency but typically rely on global coordination, which can be impractical, e.g., in large-scale or less controlled environments. To address this dilemma, we propose a novel framework that introduces partial team formations, allowing team members with shared objectives to coordinate their actions while maintaining team-wise rationality, thereby improving system-wide efficiency without the burden of global coordination. We model such interactions as a multi-team game and analyze team equilibrium, where no team has an incentive to deviate unilaterally. We show that team formations preserve the potential game structure and guarantee improved or maintained worst-case equilibrium values. To learn effective team coordination, we leverage the Team-Fictitious Play (Team-FP) dynamics, allowing agents to adapt their strategies based on past interactions with other teams and teammates. However, learning to coordinate on the best team response against adapting opponents poses a challenge for the convergence analysis. We adopt a divide-and-conquer approach, dividing the horizon into epochs whose lengths grow sufficiently slowly that the evolution of the dynamics can be coupled with a reference (stationary) scenario, addressing the non-stationarity challenge gradually across these epochs. We prove the almost sure convergence of Team-FP to equilibrium under standard conditions on the step sizes, ensuring the long-term stability and efficiency of the proposed framework. Our approach has significant implications for various decentralized control and optimization applications, including distributed resource allocation, traffic management, and power grid control. For example, we provide numerical experiments in the context of wireless network optimization, corroborating the theoretical convergence guarantees and demonstrating the effectiveness of Team-FP in achieving efficient and stable outcomes in practice.
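To make the fictitious-play idea underlying Team-FP concrete, the following is a minimal, self-contained sketch of classical two-player fictitious play in an identical-interest game (a special case of a potential game), in which each player best-responds to the empirical frequency of the other's past actions. This illustrates only the underlying principle, not the paper's Team-FP dynamics; the shared payoff matrix, action counts, and iteration budget are hypothetical choices made for demonstration.

```python
import numpy as np

# Illustrative sketch: classical two-player fictitious play in an
# identical-interest game (a special case of a potential game).
# NOTE: this is not the paper's Team-FP algorithm; the payoff matrix,
# action counts, and iteration budget are hypothetical.

rng = np.random.default_rng(0)
n_actions = 3
# Shared payoff matrix U[a1, a2]: both players receive U[a1, a2].
U = rng.standard_normal((n_actions, n_actions))

# Empirical counts of each player's past actions (uniform prior).
counts = [np.ones(n_actions), np.ones(n_actions)]

for t in range(5000):
    beliefs = [c / c.sum() for c in counts]  # empirical action frequencies
    # Each player best-responds to its belief about the other's play.
    a1 = int(np.argmax(U @ beliefs[1]))      # player 1's expected payoff per action
    a2 = int(np.argmax(beliefs[0] @ U))      # player 2's expected payoff per action
    counts[0][a1] += 1
    counts[1][a2] += 1

print("Empirical frequencies:",
      counts[0] / counts[0].sum(),
      counts[1] / counts[1].sum())
```

In identical-interest games of this kind, the empirical action frequencies generated by fictitious play are known to converge to equilibrium; Team-FP extends this style of belief-based adaptation to team-wise best responses against other teams.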

DOI

10.55730/1300-0632.4112

Keywords

Multi-agent learning, potential games, reinforcement learning

First Page

32

Last Page

47

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.
