# Result

**Some snippets from your converted document:**

Data Biased Robust Counter Strategies Michael Johanson Michael Bowling johanson@cs.ualberta.ca bowling@cs.ualberta.ca Department of Computing Science Department of Computing Science University of Alberta University of Alberta Edmonton, Alberta, Canada Edmonton, Alberta, Canada Abstract an agent, assuming perfect knowledge of its static oppo- nent. However, such strategies are brittle: against a worst The problem of exploiting information about the case opponent, they have a high exploitability. In a two- environment while still being robust to inaccu- player zero-sum game, a Nash equilibrium strategy maxi- rate or incomplete information arises in many mizes its utility against a worst-case opponent. As a result, domains. Competitive imperfect information we say that such strategies are robust. If a perfect model games where the goal is to maximally exploit an of the opponent is available, then they can be exploited by unknown opponent’s weaknesses are an example a best response; if a model is not available, then playing a of this problem. Agents for these games must Nash equilibrium strategy is a sensible choice. However, balance two objectives. First, they should aim to if a model exists but it is somewhat unreliable (e.g., if it is exploit data from past interactions with the op- formed from a limited number of observations of the oppo- ponent, seeking a best-response counter strategy. nent’s actions, or if the opponent is known to be changing Second, they should aim to minimize losses since strategies) then a better option may be to compromise: ac- the limited data may be misleading or the oppo- cepting a slightly lower worst-case utility in return for a nent’s strategy may have changed, suggesting an higher utility if the model is approximately correct. opponent-agnostic Nash equilibrium strategy. In One simple approach for creating such a compromise strat- this paper, we show how to partially satisfy both egy is to create both a best response strategy and a Nash of these objectives at the same time, producing equilibrium strategy, and then play a mixture of the two. strategies with favourable tradeoffs between the Before each game, we will flip a biased coin. With prob- ability to exploit an opponent and the capacity ability p we will use the best response, and with probabil- to be exploited. Like a recently published tech- ity (1 − p) we will use the Nash equilibrium. By varying nique, our approach involves solving a modified p, we can create a range of strategies that linearly trade game; however the result is more generally appli- off exploitation of the opponent and our own exploitability cable and even performs well in situations with by a worst-case opponent. While this approach is a useful very limited data. We evaluate our technique in baseline, we would like to make more favourable tradeoffs the game of two-player, Limit Texas Hold’em. between these goals. McCracken and Bowling (2004) proposed -safe strategies as another approach. The set of -safe strategies contains 1 INTRODUCTION all strategies that are exploitable by no more than . From this set, the strategies that maximize utility against the op- Maximizing utility in the presence of other agents is a fun- ponent are the set of -safe best responses. Thus, for a damental problem in game theory. In a zero-sum game, chosen , the set of -safe best responses achieve the best utility comes from the exploitation of opponent weak- possible tradeoffs between exploitation and exploitability. nesses, but it is important not to allow one’s own strategy to However, their approach is computationally infeasible for be exploited in turn. Two approaches to such problems are large domains, and has only been applied to Ro-Sham-Bo well k

**Recently converted files (publicly available):**