CS 8803 Sequence Prediction

Course materials for CS 8803 Sequence Prediction, Spring 2026 at Georgia Tech.


Homework 1

Due: Tuesday, February 17 at 10:00 PM

Problem 1: Exponential Weights Algorithm with a Prior

The Exponential Weights Algorithm typically initializes weights $w_{1}^{1}, \dots, w_{N}^{1}$ all to 1. Consider a modified algorithm $\mathcal{A}'$ where we initialize these weights according to a prior distribution $p$, i.e., $w_{i}^{1} = p_{i}$ where $p_{i} \ge 0$ for all $i$ and $\sum_{i} p_{i} = 1$. The algorithm then follows the standard multiplicative update rule:

\[w_{i}^{t+1} = w_{i}^{t} e^{-\eta \ell_{i,t}}\]

Prove that with this modified initialization, assuming losses $\ell_{i,t} \in [0, 1]$, the algorithm achieves the following bound for any expert $i$:

\[\text{Loss}_{T}(\mathcal{A}') \le \text{Loss}_{T}(\text{expert } i) + \frac{\ln(1/p_{i})}{\eta} + \frac{\eta T}{8}\]

(Correction: An earlier version of this problem had a bound with a $\frac{1}{1-e^{-\eta}}$ factor. Please use the formulation above which matches the lecture notes.)
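The modified algorithm can be sketched directly. The sketch below tracks the algorithm's expected loss, with the prior supplied as the initial weight vector; the function and variable names are ours, not from the lecture notes:

```python
import math

def exp_weights_with_prior(prior, loss_rounds, eta):
    """Exponential Weights initialized at a prior distribution.

    prior:       list of p_i (nonnegative, summing to 1)
    loss_rounds: iterable of loss vectors, each entry in [0, 1]
    Returns the algorithm's cumulative expected loss.
    """
    w = list(prior)                   # w_i^1 = p_i instead of all-ones
    total = 0.0
    for losses in loss_rounds:
        Z = sum(w)
        probs = [wi / Z for wi in w]  # play expert i with probability w_i / Z
        total += sum(p * l for p, l in zip(probs, losses))
        # standard multiplicative update: w_i^{t+1} = w_i^t * exp(-eta * loss_i)
        w = [wi * math.exp(-eta * l) for wi, l in zip(w, losses)]
    return total
```

Running this on a toy instance and checking it against the claimed bound is a useful sanity check before writing the proof.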

Problem 2: Solving Feasibility Problems with Perceptron

We can define a special class of Linear Programming Feasibility (LPF) problems as follows. Given a set of $m$ vectors $a^{1}, \dots, a^{m} \in \mathbb{R}^{d}$, we want to find a feasible solution, which is a vector $y \in \mathbb{R}^{d}$ such that $\langle a^{i}, y \rangle > 0$ for all $i=1, \dots, m$.

For any vector $x$, we can define the margin:

\[\nu(x) := \min_{i} \frac{\langle a^{i}, x \rangle}{\lVert a^{i} \rVert_{2} \lVert x \rVert_{2}}\]

which is positive if $x$ is a feasible solution. Define the “wiggle room” of the feasibility problem as $\nu^{\star} := \sup_{x \in \mathbb{R}^{d}} \nu(x)$.

Assume we are given an LPF defined by $a^{1}, \dots, a^{m} \in \mathbb{R}^{d}$ with positive wiggle room $\nu^{\star} > 0$. Using the Perceptron algorithm as a subroutine, give an algorithm that finds a feasible solution using no more than $(1/\nu^{\star})^{2}$ Perceptron updates. Prove that your solution works.
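One way to prototype the subroutine: run Perceptron-style updates directly on the normalized constraint vectors until no constraint is violated. This is a minimal sketch for experimentation, not the required proof; the analysis you supply is what bounds the update count by $(1/\nu^{\star})^{2}$.

```python
import numpy as np

def lpf_perceptron(A, max_updates=10_000):
    """Seek y with <a_i, y> > 0 for every row a_i of A via Perceptron updates."""
    A = np.asarray(A, dtype=float)
    A = A / np.linalg.norm(A, axis=1, keepdims=True)  # unit-norm constraints
    y = np.zeros(A.shape[1])
    for _ in range(max_updates):
        margins = A @ y
        violated = np.flatnonzero(margins <= 0)
        if violated.size == 0:
            return y                   # feasible: every margin strictly positive
        y = y + A[violated[0]]         # Perceptron update on a violated constraint
    raise RuntimeError("update budget exhausted; instance may be infeasible")
```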

Problem 3: Generalized Minimax Theorem

Let $\mathcal{X} \subset \mathbb{R}^{n}$ and $\mathcal{Y} \subset \mathbb{R}^{m}$ be convex compact sets. Let $f: \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$ be a differentiable function with bounded gradients, where $f(\cdot, y)$ is convex in its first argument for all fixed $y$, and $f(x, \cdot)$ is concave in its second argument for all fixed $x$.

Prove that:

\[\inf_{x \in \mathcal{X}} \sup_{y \in \mathcal{Y}} f(x, y) = \sup_{y \in \mathcal{Y}} \inf_{x \in \mathcal{X}} f(x, y)\]

Furthermore, give an efficient algorithm for finding an $\epsilon$-optimal pair $(x^{\star}, y^{\star})$ for any parameter $\epsilon > 0$.
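One standard candidate for the algorithmic part is simultaneous projected gradient descent (for the min player) and ascent (for the max player) with iterate averaging; by the usual no-regret argument, the averaged pair is $O(1/\sqrt{T})$-optimal for convex-concave $f$ with bounded gradients. A sketch, with illustrative step size and iteration count:

```python
import numpy as np

def avg_gradient_descent_ascent(grad_x, grad_y, proj_x, proj_y,
                                x0, y0, T, eta):
    """Simultaneous projected gradient descent/ascent, returning the
    time-averaged iterates of both players."""
    x, y = np.asarray(x0, float), np.asarray(y0, float)
    x_sum, y_sum = np.zeros_like(x), np.zeros_like(y)
    for _ in range(T):
        gx, gy = grad_x(x, y), grad_y(x, y)  # gradients at the current pair
        x = proj_x(x - eta * gx)             # descent step for the min player
        y = proj_y(y + eta * gy)             # ascent step for the max player
        x_sum += x
        y_sum += y
    return x_sum / T, y_sum / T
```

On the bilinear test case $f(x, y) = xy$ over $[-1,1]^2$, the raw iterates cycle, but the averages converge to the saddle point at the origin.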

Problem 4: Online Non-Convex Optimization

Sometimes our nice assumptions don’t hold. For this problem, assume that $\mathcal{K} \subset \mathbb{R}^{d}$ is the learner’s decision set, and the learner observes a sequence of functions $f_{1}, \dots, f_{T}$ mapping $\mathcal{K} \to \mathbb{R}$. The regret of an algorithm choosing a sequence $x_{1}, \dots, x_{T}$ is defined as:

\[\text{Regret}_{T} := \sum_{t=1}^{T} f_{t}(x_{t}) - \min_{x \in \mathcal{K}} \sum_{t=1}^{T} f_{t}(x)\]

Suppose the functions $f_{t}$ are not convex. However, we assume:

  1. The functions $f_{t}$ are bounded in $[0, 1]$.
  2. The functions are $L$-Lipschitz: $\lvert f_{t}(x) - f_{t}(x') \rvert \le L \lVert x - x' \rVert_{2}$.
  3. The set $\mathcal{K}$ is convex and bounded.

Prove that there exists a randomized algorithm with a reasonable expected-regret bound. Specifically, show that one can achieve something like:

\[\mathbb{E}[\text{Regret}_{T}] \le O(\sqrt{dT \log T})\]

(Hint: It is always good to ask the “experts” for ideas. You need not worry about computational efficiency.)
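The hint can be prototyped: discretize $\mathcal{K}$ and treat each grid point as an expert. Lipschitzness bounds the discretization error by $L$ times the grid spacing, and Hedge over the $N$ grid points contributes the usual $O(\sqrt{T \log N})$ term. A minimal one-dimensional sketch (the grid and learning rate are illustrative; it tracks the expected loss of the learner, who would sample $x_t$ from the weight distribution each round):

```python
import math

def hedge_over_grid(loss_fns, grid, eta):
    """Hedge with each point of a discretized decision set as an 'expert'."""
    logw = [0.0] * len(grid)          # log-weights, for numerical stability
    expected_loss = 0.0
    for f in loss_fns:
        m = max(logw)
        probs = [math.exp(v - m) for v in logw]
        Z = sum(probs)
        probs = [p / Z for p in probs]
        expected_loss += sum(p * f(x) for p, x in zip(probs, grid))
        logw = [v - eta * f(x) for v, x in zip(logw, grid)]  # Hedge update
    return expected_loss
```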

Problem 5: Satisfiability Conditions in Blackwell Approachability

In his seminal 1956 paper, David Blackwell introduced the concept of approachability for games with vector-valued payoffs. A set $S$ is approachable if the learner can ensure the average payoff vector converges to $S$, regardless of the adversary’s actions. While Blackwell’s original proof used a condition involving halfspaces, modern treatments often use a “best-response” style condition.

Let $X$ and $Y$ be convex, compact action sets for the learner and adversary, respectively, and let $f: X \times Y \to \mathbb{R}^d$ be a continuous, bi-linear payoff function. We consider two conditions for a closed, convex set $S \subseteq \mathbb{R}^d$:

  1. Response-Satisfiability (RS): This condition says that if the learner knew the adversary’s mixed strategy in advance, they could “satisfy” the set $S$ in expectation. \(\forall q \in \Delta_Y, \exists p \in \Delta_X : \mathbb{E}_{x \sim p, y \sim q} [f(x, y)] \in S\)

  2. Halfspace-Satisfiability (HS): This condition says that for any halfspace $H$ containing $S$, the learner has a fixed strategy that keeps the expected payoff in $H$ for any action the adversary might take. \(\forall (a, b) \text{ s.t. } S \subseteq \{z : \langle a, z \rangle \le b\}, \exists p \in \Delta_X : \forall y \in Y, \mathbb{E}_{x \sim p} [\langle a, f(x, y) \rangle] \le b\)

Your Task:

a) Show that RS implies HS. The “trick” here is to fix a halfspace $H_{a,b} \supseteq S$ and consider the scalar game with payoff $g(x, y) = \langle a, f(x, y) \rangle$. Use the Minimax Theorem on $g(x, y)$ to show that if $S$ can be reached when $q$ is known (RS), then the learner must have a single strategy $p$ that works for all $y$ for that specific halfspace.

b) Show that HS implies RS. Use the Separating Hyperplane Theorem: if RS fails for some $q$, then the point $z = \mathbb{E}_{x, y} [f(x, y)]$ must be outside $S$ for all $p$. Show that this implies the existence of a halfspace that $S$ satisfies but the learner cannot guarantee to stay within for that particular $q$.
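The Minimax Theorem invoked in part (a) can be sanity-checked numerically on a tiny scalar game. The sketch below brute-forces mixed strategies of matching pennies on a grid and confirms that the max-min and min-max values coincide; this is an illustrative check, not part of the proof:

```python
import numpy as np

# Payoff matrix of a scalar zero-sum game g(p, q) = p^T G q (matching pennies).
G = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

# Grid over mixed strategies: each row of `mix` is a distribution (p, 1 - p).
ps = np.linspace(0.0, 1.0, 201)
mix = np.stack([ps, 1.0 - ps], axis=1)

payoff = mix @ G @ mix.T              # payoff[i, j] = mix[i]^T G mix[j]
maxmin = payoff.min(axis=1).max()     # sup over p of inf over q
minmax = payoff.max(axis=0).min()     # inf over q of sup over p
```

Both quantities come out to the game's value (0, at the uniform strategies), as the Minimax Theorem predicts.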

Problem 6: Internal Regret and the Best-Response Condition

In standard online learning, external regret compares the learner’s total loss to the loss of the single best fixed action in hindsight. While powerful, this guarantee can be weak: a strategy with zero external regret might still be “consistently foolish.” For example, a learner might play action $i$ when action $j$ was obviously better on those specific rounds, yet still have zero external regret because action $i$ performed well on average over the entire sequence.

Internal regret (or swap regret) addresses this by considering a more powerful “local” deviation. It asks: “For any specific action $i$, would I have been better off if I had swapped it for action $j$ every time I played $i$?” A strategy with no internal regret is significantly more robust; in a game-theoretic setting, if all players minimize their internal regret, the empirical distribution of their play converges to a Correlated Equilibrium—a key solution concept that captures coordination without centralized control.

Let the learner’s actions be $i \in \{1, \dots, N\}$ and the adversary’s loss vectors be $\ell_t \in [0,1]^N$. The internal regret for pair $(i, j)$ at time $T$ is:

\[R_{i \to j}(T) = \sum_{t=1}^T \mathbb{I}(a_t=i) (\ell_{i,t} - \ell_{j,t})\]

We define the total internal regret as the maximum regret over all possible swaps:

\[R(T) = \max_{i, j \in \{1, \dots, N\}} R_{i \to j}(T)\]

Our goal is to show that Blackwell approachability can be used to guarantee that $R(T)/T \to 0$ as $T \to \infty$.
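The definitions above transcribe directly into code; a small checker (names are ours) that you can use to test any algorithm you implement:

```python
def internal_regret(actions, losses, N):
    """Maximum internal regret R_{i->j} over all ordered pairs (i, j).

    actions: the played sequence a_1, ..., a_T (indices in 0..N-1)
    losses:  list of length-N loss vectors ell_t
    """
    R = [[0.0] * N for _ in range(N)]
    for a, ell in zip(actions, losses):
        for j in range(N):
            R[a][j] += ell[a] - ell[j]   # gain from swapping a -> j on this round
    return max(R[i][j] for i in range(N) for j in range(N))
```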

Your Task:

a) Construct a vector-valued payoff function $F(a, \ell)$ and identify an appropriate convex target set $S$ such that the average payoff vector $\bar{F}_T$ approaching $S$ implies that $R(T) = o(T)$. You must specify the dimensionality of your payoff space and the geometry of $S$.

b) Using the Response-Satisfiability (RS) condition from Problem 5, prove that your chosen set $S$ is approachable for this payoff function $F$. Hint: Given a fixed distribution over losses $q$, you need to find a distribution $p \in \Delta_N$ over actions such that the expected payoff vector lies in $S$. Think about the relationship between the losses and the action weights as a flow network or a Markov chain transition matrix. Can you find a $p$ such that the “total flow” of regret is balanced or non-positive?
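The Markov-chain view in the hint can be made concrete: if $Q$ is a row-stochastic matrix assembled from (normalized) swap weights, the balancing $p$ is a fixed point $p = pQ$, i.e. a stationary distribution, which power iteration finds. This is a sketch under that assumption; the matrix $Q$ below is a placeholder for whatever weights your construction produces.

```python
import numpy as np

def stationary_distribution(Q, iters=10_000):
    """Stationary distribution of a row-stochastic matrix via power iteration.

    Playing the fixed point p = p Q balances the 'flow' into and out of
    every action, which is exactly the balance the hint asks for.
    """
    Q = np.asarray(Q, dtype=float)
    p = np.full(Q.shape[0], 1.0 / Q.shape[0])  # start from the uniform distribution
    for _ in range(iters):
        p = p @ Q
    return p / p.sum()
```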