CS 8803 Sequence Prediction

Course materials for CS 8803 Sequence Prediction, Spring 2026 at Georgia Tech.


Draft: Lecture 2: Zero-Sum Games, Minimax, and the Perceptron

Date: January 23, 2026
Topic: Hedge Algorithm, Equilibrium in Zero-Sum Games, Minimax Theorem, and the Perceptron Algorithm


1. Zero-Sum Games and the Minimax Theorem

In this section, we explore the deep connection between no-regret learning and game theory.

The Hedge Algorithm

The Hedge algorithm is a foundational no-regret algorithm for the “Expert Advice” setting. While the Exponential Weights Algorithm (EWA) is often framed as making predictions to minimize 0-1 loss, Hedge is framed as selecting among $N$ actions (experts) where each action $i$ suffers a loss value $\ell_{i,t} \in [0, 1]$ at each round $t$.

Algorithm:

  1. Initialize weights $w_{i,1} = 1$ for $i=1, \dots, N$.
  2. At each round $t = 1, \dots, T$:
    • Choose a distribution $p_t$ where $p_{i,t} = \frac{w_{i,t}}{\sum_j w_{j,t}}$.
    • Observe loss vector $\ell_t \in [0, 1]^N$.
    • Suffer expected loss $\langle p_t, \ell_t \rangle$.
    • Update weights: $w_{i,t+1} = w_{i,t} e^{-\eta \ell_{i,t}}$.

Regret Bound: Using the same logic as EWA (and Hoeffding’s Lemma), the expected regret of Hedge is:

\[E[R_T] \le \sqrt{\frac{T}{2} \ln N}\]
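As a sketch, the update rule above can be written in a few lines of NumPy; the function name `hedge` and the log-space weight representation are choices made here for numerical stability, not part of the lecture.

```python
import numpy as np

def hedge(losses, eta):
    """Run Hedge on a T x N array of losses in [0, 1].

    losses[t, i] is the loss of expert i at round t. Returns the played
    distributions (one row per round) and the total expected loss.
    """
    T, N = losses.shape
    log_w = np.zeros(N)               # log-weights; avoids underflow
    dists = np.empty((T, N))
    total_loss = 0.0
    for t in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()                  # p_{i,t} = w_{i,t} / sum_j w_{j,t}
        dists[t] = p
        total_loss += p @ losses[t]   # expected loss <p_t, l_t>
        log_w -= eta * losses[t]      # w_{i,t+1} = w_{i,t} exp(-eta l_{i,t})
    return dists, total_loss
```

With the learning rate $\eta = \sqrt{8 \ln N / T}$, the total expected loss exceeds that of the best single expert by at most $\sqrt{(T/2) \ln N}$, matching the bound above.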

Zero-Sum Games and Minimax

A two-player zero-sum game is defined by a payoff matrix $A \in [0, 1]^{m \times n}$.

  • Player 1 (Minimizer): Chooses $p \in \Delta_m$.
  • Player 2 (Maximizer): Chooses $q \in \Delta_n$.
  • Payoff: Player 1 pays Player 2 the amount $p^\top A q$.

Definitions:

  • Weak Duality: For any matrix $A$, $\max_{q} \min_{p} p^\top A q \le \min_{p} \max_{q} p^\top A q$. This always holds: the inner optimizer reacts to the outer player's already-committed choice, so moving second can only help.
  • Strong Duality (Minimax Theorem): John von Neumann (1928) proved that equality holds:
\[\min_{p \in \Delta_m} \max_{q \in \Delta_n} p^\top A q = \max_{q \in \Delta_n} \min_{p \in \Delta_m} p^\top A q = V^\star\]

where $V^\star$ is the value of the game.
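For a concrete instance, both values can be computed numerically for matching pennies scaled into $[0,1]$; the grid scan below is an illustrative sketch, exploiting the fact that the inner optimum of a bilinear function over a simplex is attained at a pure strategy.

```python
import numpy as np

# Matching pennies in [0, 1]: Player 1 pays 1 exactly when the coins match.
A = np.array([[1.0, 0.0],
              [0.0, 1.0]])

# p^T A q is linear in each argument, so the inner optimum is attained at a
# pure strategy; only the outer player's mixture needs to be scanned.
grid = np.linspace(0.0, 1.0, 1001)
minmax = min(max(np.array([a, 1 - a]) @ A) for a in grid)  # min_p max_q
maxmin = max(min(A @ np.array([b, 1 - b])) for b in grid)  # max_q min_p

print(minmax, maxmin)  # both approximately 0.5
```

Both values come out to the game value $V^\star = 1/2$: each player hedges with the uniform mixture, and neither order of play gives an advantage.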

Proving the Minimax Theorem via No-Regret Learning

We will now prove the Minimax Theorem (strong duality) by showing that the gap between the “min-max” value and the “max-min” value must vanish if at least one player follows a no-regret strategy.

1. Definitions and Setup

Let $A \in [0, 1]^{m \times n}$ be the payoff matrix. Define:

  • Min-Max Value: $V_{\text{minmax}} = \min_{p} \max_{q} p^\top A q$
  • Max-Min Value: $V_{\text{maxmin}} = \max_{q} \min_{p} p^\top A q$

Weak Duality (Always True): $V_{\text{maxmin}} \le V_{\text{minmax}}$.

2. The Interaction Protocol

Consider a process for $T$ rounds where Player 1 (the row player) and Player 2 (the column player) interact:

  1. In each round $t$, Player 1 chooses a distribution $p_t \in \Delta_m$ using a no-regret algorithm (like Hedge).
  2. Player 2 observes $p_t$ and chooses a best-response distribution $q_t \in \Delta_n$:
\[q_t = \arg\max_{q \in \Delta_n} p_t^\top A q\]
  3. Player 1 observes the loss vector $\ell_t = A q_t$, where the $i$-th entry is $(A q_t)_i$, the loss incurred by choosing action $i$.

3. Detailed Step-by-Step Proof

Step 1: Apply the No-Regret Guarantee. The no-regret algorithm used by Player 1 ensures that:

\[\frac{1}{T} \sum_{t=1}^T p_t^\top A q_t \le \min_{p \in \Delta_m} \left( \frac{1}{T} \sum_{t=1}^T p^\top A q_t \right) + \epsilon_T\]

where $\epsilon_T \to 0$ as $T \to \infty$ (for Hedge, $\epsilon_T = O(\sqrt{\frac{\ln m}{T}})$).

Step 2: Lower bound the left-hand side (LHS). By the definition of Player 2’s best response $q_t$:

\[p_t^\top A q_t = \max_{q \in \Delta_n} p_t^\top A q\]

Furthermore, since $V_{\text{minmax}} = \min_{p} \max_{q} p^\top A q$, it follows that for any specific $p_t$:

\[\max_{q \in \Delta_n} p_t^\top A q \ge V_{\text{minmax}}\]

Summing over $T$ rounds and averaging:

\[\frac{1}{T} \sum_{t=1}^T p_t^\top A q_t \ge \frac{1}{T} \sum_{t=1}^T V_{\text{minmax}} = V_{\text{minmax}}\]

Step 3: Upper bound the right-hand side (RHS). Let $\bar{q}_T = \frac{1}{T} \sum_{t=1}^T q_t$ be the average distribution chosen by Player 2. By linearity:

\[\min_{p \in \Delta_m} \left( \frac{1}{T} \sum_{t=1}^T p^\top A q_t \right) = \min_{p \in \Delta_m} p^\top A \bar{q}_T\]

By the definition of $V_{\text{maxmin}} = \max_{q} \min_{p} p^\top A q$, we know that for any specific $\bar{q}_T$:

\[\min_{p \in \Delta_m} p^\top A \bar{q}_T \le V_{\text{maxmin}}\]

Step 4: Combine the bounds. Substituting the results from Step 2 and Step 3 into the inequality from Step 1:

\[V_{\text{minmax}} \le \frac{1}{T} \sum_{t=1}^T p_t^\top A q_t \le V_{\text{maxmin}} + \epsilon_T\]

This implies:

\[V_{\text{minmax}} \le V_{\text{maxmin}} + \epsilon_T\]

Step 5: Take the limit. As $T \to \infty$, the regret term $\epsilon_T \to 0$. Therefore:

\[V_{\text{minmax}} \le V_{\text{maxmin}}\]

Combined with weak duality ($V_{\text{maxmin}} \le V_{\text{minmax}}$), we conclude:

\[V_{\text{minmax}} = V_{\text{maxmin}}\]

This completes the proof of the Minimax Theorem!
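The interaction protocol underlying the proof can be simulated directly. The sketch below plays Hedge against exact best responses on matching pennies (game value $V^\star = 1/2$); the horizon and learning rate are chosen here for illustration.

```python
import numpy as np

# Player 1 runs Hedge over the rows; Player 2 plays a best-response column.
A = np.array([[1.0, 0.0],    # matching pennies scaled to [0, 1]
              [0.0, 1.0]])
m = A.shape[0]
T = 5000
eta = np.sqrt(8 * np.log(m) / T)

log_w = np.zeros(m)
avg_payoff = 0.0
for t in range(T):
    p = np.exp(log_w - log_w.max())
    p /= p.sum()
    j = int(np.argmax(p @ A))      # best response is a pure column strategy
    avg_payoff += (p @ A[:, j]) / T
    log_w -= eta * A[:, j]         # Player 1's loss vector is l_t = A q_t

print(avg_payoff)  # squeezed between V* and V* + eps_T, so close to 0.5
```

Exactly as in Steps 2-4 of the proof, the average payoff is sandwiched between $V^\star$ and $V^\star + \epsilon_T$, so it converges to the game value as $T$ grows.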


2. The Perceptron Algorithm

The Perceptron is a classic algorithm for online linear classification, originally proposed by Frank Rosenblatt (1957).

The Algorithm

Given a sequence $(x_1, y_1), \dots, (x_T, y_T)$ where $x_t \in \mathbb{R}^d$ and $y_t \in \{-1, +1\}$:

  1. Initialize $w_1 = \mathbf{0}$.
  2. For $t = 1, \dots, T$:
    • Predict $\hat{y}_t = \text{sign}(\langle w_t, x_t \rangle)$.
    • If $\hat{y}_t \neq y_t$ (Mistake!):
\[w_{t+1} = w_t + y_t x_t\]
    • Else:
\[w_{t+1} = w_t\]
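The steps above admit a minimal NumPy sketch; the function name and the tie-breaking of $\text{sign}(0)$ toward $-1$ are choices made here.

```python
import numpy as np

def perceptron(X, y):
    """One online pass of the Perceptron.

    X is a T x d array, y a length-T array of +1/-1 labels. Returns the
    final weight vector and the number of mistakes made.
    """
    w = np.zeros(X.shape[1])
    mistakes = 0
    for x_t, y_t in zip(X, y):
        y_hat = 1 if w @ x_t > 0 else -1   # sign, with sign(0) taken as -1
        if y_hat != y_t:                   # update only on a mistake
            w = w + y_t * x_t
            mistakes += 1
    return w, mistakes
```

Note that the weight vector changes only on mistakes, which is exactly why the analysis below can index the updates by the mistake rounds $t_1, \dots, t_M$.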

The Mistake Bound (Novikoff’s Theorem, 1962)

Theorem: Suppose there exists $w^\star$ such that $\lVert w^\star \rVert = 1$ and $y_t \langle w^\star, x_t \rangle \ge \gamma$ for all $t$ (margin condition). If $\lVert x_t \rVert \le R$ for all $t$, then the total number of mistakes $M$ is bounded by:

\[M \le \frac{R^2}{\gamma^2}\]

Proof: Let $t_1, \dots, t_M$ be the indices where mistakes occurred.

  1. Lower Bound: $\langle w_{M+1}, w^\star \rangle = \sum_{k=1}^M y_{t_k} \langle x_{t_k}, w^\star \rangle \ge M\gamma$. By Cauchy-Schwarz, $\langle w_{M+1}, w^\star \rangle \le \lVert w_{M+1} \rVert \cdot \lVert w^\star \rVert = \lVert w_{M+1} \rVert$, so $\lVert w_{M+1} \rVert \ge M\gamma$.
  2. Upper Bound: $\lVert w_{t+1} \rVert^2 = \lVert w_t + y_t x_t \rVert^2 = \lVert w_t \rVert^2 + 2y_t \langle w_t, x_t \rangle + \lVert x_t \rVert^2$. On a mistake, $y_t \langle w_t, x_t \rangle \le 0$, so $\lVert w_{t+1} \rVert^2 \le \lVert w_t \rVert^2 + R^2$. Thus, $\lVert w_{M+1} \rVert^2 \le MR^2$.
  3. Combine: $(M\gamma)^2 \le \lVert w_{M+1} \rVert^2 \le MR^2 \implies M \le \frac{R^2}{\gamma^2}$.
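The bound can be sanity-checked on synthetic separable data; the sampling scheme and names such as `w_star` below are illustrative assumptions, not part of the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
w_star = rng.normal(size=d)
w_star /= np.linalg.norm(w_star)           # ||w*|| = 1

# Sample points and discard those within margin 0.1 of the separator,
# so the margin condition holds with gamma >= 0.1.
X = rng.normal(size=(2000, d))
margins = X @ w_star
keep = np.abs(margins) >= 0.1
X, y = X[keep], np.sign(margins[keep])

w = np.zeros(d)
mistakes = 0
for x_t, y_t in zip(X, y):
    if np.sign(w @ x_t) != y_t:            # sign(0) = 0 also counts here
        w += y_t * x_t
        mistakes += 1

R = np.linalg.norm(X, axis=1).max()
gamma = (y * (X @ w_star)).min()
print(mistakes, R**2 / gamma**2)           # mistake count vs. Novikoff bound
```

The observed mistake count stays below $R^2/\gamma^2$; in practice it is usually far below, since the bound is worst-case over orderings of the data.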

3. Historical Context

  • Minimax & Game Theory: John von Neumann (1928) published the first proof of the Minimax Theorem using fixed-point theorems. Later, George Dantzig (1947) connected it to Linear Programming duality.
  • The Perceptron: Frank Rosenblatt (1957) conceived the Perceptron as a model for biological neurons. Albert Novikoff (1962) provided the first formal convergence proof.