Challenges in Understanding and Playing 2x2 Non-Zero-Sum Games (Spring, 2017)

Prepared by:

Joseph Malkevitch
Department of Mathematics
York College (CUNY)
Jamaica, New York 11451

email:

malkevitch@york.cuny.edu

web page:

http://york.cuny.edu/~malk

When one looks at zero-sum matrix games, there are some relatively simple ways to give "advice" about how to play them. One looks for dominating rows/columns, which may reduce the number of actions that need to be considered in deciding how to play. One can look for saddle points. If one is "lucky," this approach produces a "value." However, if the game reduces to a 2x2 matrix that has no saddle point and cannot be reduced to a single cell by dominating strategy analysis, we learned how to find a mixed strategy for each of the players that gives an "optimal" way to play such a game. The game below has an "equilibrium" when mixed strategies are used.

            Column I   Column II
  Row 1         9          -8
  Row 2        -2           1
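As a small illustration of the two checks mentioned above, here is a Python sketch applied to this game. It is our own illustration: the names GAME, saddle_point_check and the two dominance helpers are hypothetical, and only strict dominance is checked. Row's maximin, -2, is less than Column's minimax, 1, so there is no saddle point, and no row or column is strictly dominated, which is why mixed strategies are needed here.

```python
# Row's payoffs in the zero-sum game above.
# Rows: Row 1, Row 2; columns: Column I, Column II.
GAME = [[9, -8],
        [-2, 1]]

def saddle_point_check(m):
    """Return (maximin, minimax). Row can guarantee at least the maximin,
    Column can hold Row to at most the minimax; a saddle point exists
    exactly when the two numbers are equal."""
    maximin = max(min(row) for row in m)
    minimax = min(max(row[j] for row in m) for j in range(len(m[0])))
    return maximin, minimax

def strictly_dominated_rows(m):
    """Rows that some other row beats in every column (Row wants large payoffs)."""
    return [i for i in range(len(m))
            if any(all(m[k][j] > m[i][j] for j in range(len(m[0])))
                   for k in range(len(m)) if k != i)]

def strictly_dominated_columns(m):
    """Columns that some other column beats in every row (Column wants small payoffs)."""
    return [j for j in range(len(m[0]))
            if any(all(m[i][l] < m[i][j] for i in range(len(m)))
                   for l in range(len(m[0])) if l != j)]

print(saddle_point_check(GAME))  # (-2, 1): unequal, so no saddle point
print(strictly_dominated_rows(GAME), strictly_dominated_columns(GAME))  # [] []
```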

We learned two ways to find how Row and Column could design their optimal spinners (randomization devices for best play in a game such as this one).

a. Compute the expected value (remember that an expected value is different from a probability) as a sum of four terms, one for each cell above: the payoff in that cell times the fraction of the time play would land in that cell. Then use factoring to find the "optimal spinners" (i.e., randomization devices) for each of the players. We saw that the result is an expression of the form:

Expected value (Row's point of view) = C(p - u)(q - v) + W (*)

where p and q are the fractions of the time that Row plays Row 1 and Column plays Column I, respectively, and C, u, v and W are constants determined by the payoffs. Optimal play is p = u for Row and q = v for Column. Furthermore, W is the amount of "utility" that changes hands on average for each play of the game when both players play optimally. Thus, when W is positive the game is to Row's advantage, when it is negative it is to Column's advantage, and when it is 0 the game is fair. Note that:

Theorem: When Row plays optimally, it does not matter what Column does, and when Column plays optimally, it does not matter what Row does, in the sense that if only one of the players deviates from the optimal choice, the expected value is still W. (In (*), setting p = u or q = v makes the first term 0, whatever the other player does.)
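For the game above, the factoring in a. can be checked by hand (our own worked arithmetic, not additional theory). Expanding the four terms and completing the product gives C = 20, u = 3/20, v = 9/20 and W = -7/20, so optimal play has Row playing Row 1 3/20 of the time and Column playing Column I 9/20 of the time, and the game favors Column by 7/20 per play. Setting p = 3/20 makes the first term vanish for every q, as the theorem says.

```latex
\begin{aligned}
E(p,q) &= 9pq - 8p(1-q) - 2(1-p)q + (1-p)(1-q) \\
       &= 20pq - 9p - 3q + 1 \\
       &= 20\left(p - \tfrac{3}{20}\right)\left(q - \tfrac{9}{20}\right) + 1 - 20 \cdot \tfrac{3}{20} \cdot \tfrac{9}{20} \\
       &= 20\left(p - \tfrac{3}{20}\right)\left(q - \tfrac{9}{20}\right) - \tfrac{7}{20}
\end{aligned}
```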

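Using this theorem, there is a computationally easier way to find the values of u, v and W in (*).

b. One can find u, v and W in (*) by solving two linear equations: u is the value of p that gives Row the same expected payoff whether Column plays Column I or Column II, and v is the value of q that gives Row the same expected payoff whether Row plays Row 1 or Row 2. Substituting the solution of either equation back into that equation gives W.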
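The following Python sketch carries out method b for a general 2x2 zero-sum game with Row's payoffs [[a, b], [c, d]]; the function names are our own and hypothetical. It assumes there is no saddle point, so that C = a - b - c + d is not zero, and it uses exact fractions. For the game above it gives C = 20, u = 3/20, v = 9/20 and W = -7/20, and the final loop spot-checks that Row's expected value agrees with the factored form (*).

```python
from fractions import Fraction

def solve_by_equalizing(a, b, c, d):
    """Method b for a 2x2 zero-sum game with Row's payoffs [[a, b], [c, d]]
    (rows Row 1, Row 2; columns Column I, Column II).
    Assumes no saddle point, so C = a - b - c + d is not 0."""
    a, b, c, d = (Fraction(x) for x in (a, b, c, d))
    C = a - b - c + d
    # Row's u equalizes Column I and Column II: a*u + c*(1-u) = b*u + d*(1-u)
    u = (d - c) / C
    # Column's v equalizes Row 1 and Row 2: a*v + b*(1-v) = c*v + d*(1-v)
    v = (d - b) / C
    # Substitute v into the Row 1 side of the second equation to get W.
    W = a * v + b * (1 - v)
    return C, u, v, W

def expected_value(a, b, c, d, p, q):
    """Row's expected payoff: each cell's payoff times the chance of landing there."""
    return a*p*q + b*p*(1 - q) + c*(1 - p)*q + d*(1 - p)*(1 - q)

C, u, v, W = solve_by_equalizing(9, -8, -2, 1)
print(C, u, v, W)  # 20 3/20 9/20 -7/20

# Spot-check that E = C(p - u)(q - v) + W, i.e. formula (*), on a small grid.
for p in (Fraction(0), Fraction(1, 3), Fraction(1)):
    for q in (Fraction(0), Fraction(1, 2), Fraction(1)):
        assert expected_value(9, -8, -2, 1, p, q) == C * (p - u) * (q - v) + W
```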

What is new for non-zero-sum games? In brief, everything "goes wrong." It is not easy to give good advice about how to play many of the games of greatest interest, namely those that model situations that come up regularly in the real world. Here I will just hint at the problems involved. But first, the idea of an equilibrium.

Intuitively, an equilibrium refers to a choice of strategies for both players such that, if either player alone deviates from his/her strategy, the deviating player cannot do better in terms of the payoff received.

John Nash proved a very general result which states that every game in a large class of games, including zero-sum and non-zero-sum matrix games, has at least one "equilibrium," in either pure strategies or mixed strategies. A mixed strategy uses a randomization device to choose an action; a pure strategy means that there is an action one uses "all of the time." Writing this in "theorem" form:

Theorem (John Nash):

If a game G has a finite number of players and each player can choose from finitely many pure strategies, then G has at least one equilibrium, which consists of pure strategies or mixed strategies.

Note that some games have many equilibria, and some of these equilibria may involve pure strategies while others involve mixed strategies.

For the non-zero-sum case, the problem is that an equilibrium may not seem an attractive way to play the game, because other outcomes may be better in the Pareto sense. Intuitively, one outcome Pareto dominates another if moving to it makes at least one player better off and leaves the other player no worse off; an outcome is Pareto optimal if no other outcome Pareto dominates it.

Here is an example to illustrate how to compute the equilibria in a relatively straightforward case:

              Column I    Column II
  Row 1       (1, -1)     (3, 0)
  Row 2       (4, 2)      (0, -1)

Figure 1

Drawing a motion diagram for this game indicates two equilibria in pure strategies: Row 2, Column I (payoffs: (4, 2)) and Row 1, Column II (payoffs: (3,0)).
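The motion-diagram check can also be done mechanically: a cell is a pure equilibrium when Row cannot do better by switching rows and Column cannot do better by switching columns. Below is a minimal Python sketch for 2x2 games; the helper name pure_equilibria and the layout of FIGURE_1 are our own assumptions. It finds the same two cells.

```python
# Figure 1 as (Row payoff, Column payoff); rows are Row 1, Row 2 and
# columns are Column I, Column II.
FIGURE_1 = [[(1, -1), (3, 0)],
            [(4, 2), (0, -1)]]
ROWS = ["Row 1", "Row 2"]
COLUMNS = ["Column I", "Column II"]

def pure_equilibria(payoffs):
    """Cells (i, j) of a 2x2 game where neither player gains by switching alone."""
    found = []
    for i in range(2):
        for j in range(2):
            row_pay, col_pay = payoffs[i][j]
            row_stays = row_pay >= payoffs[1 - i][j][0]  # Row gains nothing by switching rows
            col_stays = col_pay >= payoffs[i][1 - j][1]  # Column gains nothing by switching columns
            if row_stays and col_stays:
                found.append((i, j))
    return found

for i, j in pure_equilibria(FIGURE_1):
    print(ROWS[i], COLUMNS[j], FIGURE_1[i][j])
# Row 1 Column II (3, 0)
# Row 2 Column I (4, 2)
```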

However, now look at the "Row" game:

            Column I   Column II
  Row 1         1          3
  Row 2         4          0

(*)

payoffs from Row's point of view.

and the "Column" game:

            Column I   Column II
  Row 1        -1          0
  Row 2         2         -1

(**)

payoffs from Column's point of view.

We seek a "spinner" (mixed strategy) for Column that equalizes Row's earnings in Row's game, whichever row Row plays, and a spinner for Row that equalizes Column's payoff in Column's game, whichever column Column plays. If we let p and (1-p) represent the mixture of Row 1 and Row 2, then, using our knowledge of zero-sum games, Row's spinner must give Column the same expected payoff for Column I (left side) and Column II (right side), so we need to solve the equation:

-p + 2(1-p) = -(1-p)

giving the answer p = 3/4. Substituting into either side of the equation gives Column an average payoff of -1/4.

If we let q and (1-q) represent the mixture of Column I and Column II, then Column's spinner must give Row the same expected payoff for Row 1 (left side) and Row 2 (right side), so we need to solve the equation:

q + 3(1-q) = 4q

giving the answer q = 1/2. Substituting into either side of the equation gives Row an average payoff of 2.

This mixed equilibrium gives payoffs of 2 to Row and -1/4 to Column. Note that the two pure-strategy equilibria have much more attractive outcomes, namely (4, 2) and (3, 0). Row never receives a negative payoff in this game, and each of the pure-strategy equilibria is better than the mixed equilibrium for both players. However, by using the spinner with p = 3/4, Row has the "option" of forcing an average payoff of -1/4 on Column, whatever Column does, at the cost of some benefit to his/her own "interests," if Row desires.
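The equalizing computation above can be sketched in Python for any 2x2 game given as payoff pairs; the helper name equalizing_mixed_equilibrium is hypothetical. It assumes the equalizing strategies exist strictly between 0 and 1 and returns None otherwise, for example when a player has a dominating strategy. For Figure 1 it gives p = 3/4, q = 1/2 and average payoffs of 2 to Row and -1/4 to Column.

```python
from fractions import Fraction

# Figure 1 as (Row payoff, Column payoff); rows Row 1, Row 2;
# columns Column I, Column II.
FIGURE_1 = [[(1, -1), (3, 0)],
            [(4, 2), (0, -1)]]

def equalizing_mixed_equilibrium(payoffs):
    """Return (p, q, row_value, column_value), where p = P(Row 1) and
    q = P(Column I), or None if an equalizing strategy is not strictly
    between 0 and 1."""
    R = [[Fraction(payoffs[i][j][0]) for j in range(2)] for i in range(2)]
    K = [[Fraction(payoffs[i][j][1]) for j in range(2)] for i in range(2)]
    # Row's p equalizes Column's payoff:
    #   p*K[0][0] + (1-p)*K[1][0] = p*K[0][1] + (1-p)*K[1][1]
    denom_p = K[0][0] - K[1][0] - K[0][1] + K[1][1]
    # Column's q equalizes Row's payoff:
    #   q*R[0][0] + (1-q)*R[0][1] = q*R[1][0] + (1-q)*R[1][1]
    denom_q = R[0][0] - R[0][1] - R[1][0] + R[1][1]
    if denom_p == 0 or denom_q == 0:
        return None
    p = (K[1][1] - K[1][0]) / denom_p
    q = (R[1][1] - R[0][1]) / denom_q
    if not (0 < p < 1 and 0 < q < 1):
        return None
    row_value = q * R[1][0] + (1 - q) * R[1][1]       # Row's payoff from either row
    column_value = p * K[0][1] + (1 - p) * K[1][1]    # Column's payoff from either column
    return p, q, row_value, column_value

p, q, row_value, column_value = equalizing_mixed_equilibrium(FIGURE_1)
print(p, q, row_value, column_value)  # 3/4 1/2 2 -1/4
```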

Now consider some ideas using terminology due to Philip Straffin. In a non-zero-sum game such as the one in Figure 1, each player can try to play optimally in the zero-sum game that represents his/her own payoffs. If Row, for example, plays optimally in the game (*), then Row can guarantee him/herself a certain amount, called Row's security level. The optimal strategy for Row in the zero-sum game representing his/her own payoffs (for the example here, game (*)) is called Row's prudential strategy. Similarly, Column's security level is the amount Column can guarantee by playing optimally in the zero-sum game of his/her own payoffs (from Column's point of view), and the strategy that does this is Column's prudential strategy. In this example, this means Column finding the optimal strategy in (**), which, again, is shown from Column's point of view. Note that in general Row's prudential strategy may be a pure strategy (because there may be a dominating row or a saddle point) or a mixed strategy, and the same is true for Column. Note also that Row's and Column's prudential strategies may or may not coincide with their equalizing strategies!

Anticipating that Row might find his/her prudential strategy attractive, Column might respond by picking his/her best response (in terms of Column's own payoffs) to Row's prudential strategy. Similarly, Row could consider using his/her best response to Column's prudential strategy. These strategies are called Column's and Row's counter-prudential strategies, respectively.
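Here is a minimal Python sketch of these definitions; the helper names are hypothetical. It finds a prudential strategy by maximizing a player's guaranteed payoff over all mixtures (checking both pure strategies and the crossing point, so saddle points and dominance are covered as well as mixed solutions), and a counter-prudential strategy as a pure best response. For Figure 1 it gives Row's prudential strategy p = 2/3 with security level 2 and Column's prudential strategy q = 1/4 (probability of Column I) with security level -1/4; these differ from the equalizing values p = 3/4 and q = 1/2. Both counter-prudential strategies come out pure: Row 1 for Row and Column I for Column.

```python
from fractions import Fraction

# Figure 1 as (Row payoff, Column payoff); rows are Row 1, Row 2 and
# columns are Column I, Column II.
FIGURE_1 = [[(1, -1), (3, 0)],
            [(4, 2), (0, -1)]]

def own_game(payoffs, player):
    """Matrix m[k][l] of the player's own payoffs, where k is the player's
    own strategy and l is the opponent's (so Column's game is transposed)."""
    if player == "row":
        return [[Fraction(payoffs[k][l][0]) for l in range(2)] for k in range(2)]
    return [[Fraction(payoffs[l][k][1]) for l in range(2)] for k in range(2)]

def prudential(m):
    """Maximin mix x (probability of own strategy 0) and the security level.
    The guaranteed payoff is the minimum of two lines in x, so its maximum
    is at x = 0, x = 1, or where the two lines cross."""
    def guarantee(x):
        return min(x * m[0][l] + (1 - x) * m[1][l] for l in range(2))
    candidates = [Fraction(0), Fraction(1)]
    denom = m[0][0] - m[1][0] - m[0][1] + m[1][1]
    if denom != 0:
        x = (m[1][1] - m[1][0]) / denom
        if 0 < x < 1:
            candidates.append(x)
    best = max(candidates, key=guarantee)
    return best, guarantee(best)

def best_response(m, y):
    """Pure best reply (0 or 1) to an opponent who plays their strategy 0
    with probability y; ties go to strategy 0."""
    payoff = [y * m[k][0] + (1 - y) * m[k][1] for k in range(2)]
    return 0 if payoff[0] >= payoff[1] else 1

row_m, col_m = own_game(FIGURE_1, "row"), own_game(FIGURE_1, "column")
p, row_security = prudential(row_m)    # Row's prudential mix, security level
q, col_security = prudential(col_m)    # Column's prudential mix, security level
print(p, row_security, q, col_security)  # 2/3 2 1/4 -1/4
# Counter-prudential: best pure reply to the opponent's prudential strategy.
print(best_response(row_m, q), best_response(col_m, p))  # 0 0 (Row 1, Column I)
```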

It is instructive for an example such as the game in Figure 1 to:

a. Find the pure equilibria (if any).

b. Find the mixed equilibrium (if there is one).

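For b., this entails checking whether an equalizing strategy can be computed for each of the players. Such equalizing strategies may not exist, for instance when a player has a dominating strategy; in that case the game has a pure equilibrium, found in a.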

Note: We know there must be at least one equilibrium of either type a. or b. by Nash's Theorem.

c. Find the payoffs for each player when Row plays his/her prudential strategy and Column plays his/her prudential strategy.

d. Find the payoffs for each player when Row plays his/her prudential strategy and Column plays his/her counter-prudential strategy.

e. Find the payoffs for each player when Row plays his/her counter-prudential strategy and Column plays his/her prudential strategy.

f. Find the payoffs for each player when Row plays his/her counter-prudential strategy and Column plays his/her counter-prudential strategy.

g. Find the collection of outcomes that are Pareto optimal for the game. (One approach is to draw a payoff diagram and find the convex hull of the 4 points shown as payoffs in Figure 1.)
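For item g., the following Python sketch (hypothetical helper names) checks only the four pure payoff pairs of Figure 1 for Pareto domination. In Figure 1 the outcome (4, 2) is at least as good for both players as every other outcome, so it is also the only Pareto optimal point of the convex hull. In general, the Pareto optimal outcomes of the convex hull lie along its upper-right boundary, and finding them needs the diagram, not just this check of the four corners.

```python
# The four payoff pairs of Figure 1, as (Row payoff, Column payoff).
FIGURE_1_OUTCOMES = [(1, -1), (3, 0), (4, 2), (0, -1)]

def dominates(a, b):
    """True if outcome a is at least as good as b for both players and
    strictly better for at least one of them."""
    return a[0] >= b[0] and a[1] >= b[1] and a != b

def pareto_optimal(outcomes):
    """Outcomes that no other outcome in the list Pareto dominates."""
    return [b for b in outcomes if not any(dominates(a, b) for a in outcomes)]

print(pareto_optimal(FIGURE_1_OUTCOMES))  # [(4, 2)]
```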

If you want more practice, you can try out your skills on this example (due to Straffin), which shows how complex the results of doing all of these calculations can be!

              Column I    Column II
  Row 1       (2, 4)      (1, 0)
  Row 2       (3, 1)      (0, 4)

Figure 2