In Stackelberg games, one player, the leader, commits to a strategy publicly before the remaining players, the followers, make their decisions (Fudenberg and Tirole, 1991). There are many multi-agent security domains, such as attacker-defender scenarios and patrolling, for which these types of commitments by the security agent are necessary (Agmon et al., 2008; Brown et al., 2006; Kiekintveld et al., 2009; Paruchuri et al., 2006), and it has been shown that Stackelberg games appropriately model these commitments (Paruchuri et al., 2008; Pita et al., 2008). For example, security personnel patrolling an infrastructure decide on a patrolling strategy first, and their adversaries then act with this committed strategy taken into account. Indeed, Stackelberg games are at the heart of the ARMOR system, deployed at LAX since 2007 to schedule security personnel (Paruchuri et al., 2008; Pita et al., 2008), and they have recently been applied to federal air marshals (Kiekintveld et al., 2009). Moreover, these games have potential applications in network routing and pricing in transportation systems, among many other possibilities (Cardinal et al., 2005; Korilis, Lazar, and Orda, 1997).
Existing algorithms for Bayesian Stackelberg games find optimal solutions with respect to an a priori probability distribution over possible follower types (Conitzer and Sandholm, 2006; Paruchuri et al., 2008). Unfortunately, to guarantee optimality, these algorithms make strict assumptions about the underlying games; namely, that players are perfectly rational and that followers perfectly observe the leader's strategy. These assumptions rarely hold in real-world domains, particularly those involving human actors.
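To make these two assumptions concrete, the following minimal Python sketch evaluates a leader commitment in a small Bayesian Stackelberg game. It is an illustration only, not any of the cited algorithms: the function names, the two-action payoff matrices, and the brute-force grid search over mixed strategies are all assumptions introduced here. Each follower type is assumed to observe the leader's mixed strategy exactly and to best-respond to it, with ties broken in the leader's favor.

```python
def follower_best_response(x, leader_payoff, follower_payoff):
    """Follower action maximizing the follower's expected payoff against x.

    Assumes perfect observation of x and perfect rationality.
    Ties are broken in favor of the leader (an assumption of this sketch).
    """
    def expected(payoff, j):
        return sum(x[i] * payoff[i][j] for i in range(len(x)))

    n_actions = len(follower_payoff[0])
    return max(
        range(n_actions),
        key=lambda j: (expected(follower_payoff, j), expected(leader_payoff, j)),
    )


def leader_expected_utility(x, prior, leader_payoffs, follower_payoffs):
    """Leader's expected utility for commitment x under a prior over follower types."""
    total = 0.0
    for p_type, R, C in zip(prior, leader_payoffs, follower_payoffs):
        j = follower_best_response(x, R, C)
        total += p_type * sum(x[i] * R[i][j] for i in range(len(x)))
    return total


def best_commitment(prior, leader_payoffs, follower_payoffs, steps=100):
    """Grid search over two-action mixed strategies (p, 1 - p); not an exact solver."""
    best = None
    for k in range(steps + 1):
        p = k / steps
        x = (p, 1 - p)
        u = leader_expected_utility(x, prior, leader_payoffs, follower_payoffs)
        if best is None or u > best[1]:
            best = (x, u)
    return best


# Hypothetical payoffs: rows are leader actions (patrol A, patrol B),
# columns are follower actions (attack A, attack B).
R1 = [[1, -1], [-1, 1]]    # leader payoffs against follower type 1
C1 = [[-1, 1], [1, -1]]    # follower type 1 payoffs
R2 = [[2, -1], [-3, 1]]    # leader payoffs against follower type 2
C2 = [[-2, 1], [3, -1]]    # follower type 2 payoffs
PRIOR = [0.5, 0.5]         # a priori distribution over follower types

strategy, utility = best_commitment(PRIOR, [R1, R2], [C1, C2])
```

In this toy instance, any pure patrol is fully exploited by both follower types, so the search returns a mixed commitment. If followers misperceive the committed strategy or respond suboptimally, the best response computed in `follower_best_response` no longer predicts their behavior, and the resulting commitment carries no optimality guarantee.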