Lecture 5: Log-Optimality, Kelly Betting, and Universal Portfolios
Date: February 13, 2026 Topic: Horse Racing, Kelly Criterion, Coin Betting, KT Estimator, and Universal Portfolios
1. Horse Racing and Log-Optimality
We begin by exploring the connection between information theory and gambling, famously formalized by J.L. Kelly Jr. in 1956. This setting provides a clean motivation for why we care about “log-loss” in online learning.
The Setup
Consider a horse race with $m$ horses.
- Odds: The track offers odds $o_i$ for horse $i$. If you bet 1 USD on horse $i$ and it wins, you receive $o_i$ USD; if it loses, you lose your 1 USD.
- Probabilities: The true probability of horse $i$ winning is $p_i$.
- Strategy: You distribute your wealth across the horses. Let $b_i$ be the fraction of your wealth bet on horse $i$, where $b \in \Delta_m$ (the probability simplex).
Wealth Growth
Let $X \in \{1, \dots, m\}$ be the random variable representing the winning horse. If horse $i$ wins (i.e., $X=i$), your wealth multiplication factor is:
\[S = b_i o_i\]

If we repeat this for $T$ independent races, your wealth $S_T$ after $T$ rounds is:

\[S_T = S_0 \prod_{t=1}^T (b(X_t) o(X_t))\]

To maximize long-term wealth, we maximize the exponential growth rate (or doubling rate):

\[W(b, p) = \lim_{T \to \infty} \frac{1}{T} \log_2 \left( \frac{S_T}{S_0} \right) = \mathbb{E}[\log_2 (b(X) o(X))] = \sum_{i=1}^m p_i \log_2 (b_i o_i)\]

The Kelly Criterion
We want to find $b^\star$ that maximizes $W(b, p)$. We can rewrite the objective:
\[W(b, p) = \sum_{i=1}^m p_i \log_2 o_i + \sum_{i=1}^m p_i \log_2 b_i\]

The first term doesn’t depend on our choice of $b$, so we need to maximize $\sum_i p_i \log_2 b_i$ subject to $\sum_i b_i = 1$. Adding and subtracting the entropy of $p$ relates this to the KL divergence:

\[\sum_{i=1}^m p_i \log_2 b_i = \sum_{i=1}^m p_i \log_2 p_i + \sum_{i=1}^m p_i \log_2 \frac{b_i}{p_i} = -H(p) - D_{KL}(p \parallel b)\]

Since $D_{KL}(p \parallel b) \ge 0$, with equality if and only if $b = p$, the objective is maximized when $b = p$.
Theorem (Kelly Criterion): The log-optimal investment strategy is proportional betting: $b^\star_i = p_i$.
Key Insight: The optimal strategy depends only on the win probabilities $p_i$, not on the odds $o_i$! Even if a long-shot has fantastic odds, you should not bet more than its probability of winning.
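As a numerical sanity check (with toy probabilities and odds chosen here for illustration, not taken from the lecture), we can verify that proportional betting $b = p$ maximizes $W(b, p)$ no matter what the odds are:

```python
import numpy as np

# Toy check of the Kelly criterion: with win probabilities p and any odds o,
# the growth rate W(b, p) = sum_i p_i * log2(b_i * o_i) is maximized at b = p.
rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2])    # true win probabilities (illustrative)
o = np.array([1.8, 4.0, 10.0])   # arbitrary odds offered by the track

def growth_rate(b):
    return np.sum(p * np.log2(b * o))

W_star = growth_rate(p)          # proportional betting b = p
# No random portfolio on the simplex should beat it.
for _ in range(1000):
    b = rng.dirichlet(np.ones(3))
    assert growth_rate(b) <= W_star + 1e-12
print(f"W(p, p) = {W_star:.4f} bits/race")
```

Note that the odds $o$ enter $W_star$ only as an additive constant, which is exactly the "key insight" above.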
2. Kelly Betting with Cash
In the standard horse race model, we bet our entire wealth on the horses ($\sum b_i = 1$). What if we can keep some money in cash? This is equivalent to adding a “Cash” horse that always “wins” with odds 1 (you keep your money).
Let’s consider a simplified binary version of this, often called Kelly Betting in finance.
The Single Asset Model
- You have one asset (or a coin flip).
- Win: With probability $p$, you win $v$ dollars for every 1 USD bet.
- Lose: With probability $q=1-p$, you lose your bet.
- Strategy: You bet a fraction $f$ of your wealth on the asset and keep $(1-f)$ in cash.
Wealth Update:
- Win: $W_{t+1} = W_t(1-f) + W_t f(1+v) = W_t(1 + fv)$
- Lose: $W_{t+1} = W_t(1-f) + W_t f(0) = W_t(1 - f)$
Growth Rate: \(G(f) = p \log(1 + fv) + q \log(1 - f)\)
Taking the derivative w.r.t. $f$ and setting it to 0: \(\frac{pv}{1+fv} - \frac{q}{1-f} = 0 \implies pv(1-f) = q(1+fv)\) \(pv - pvf = q + qfv \implies fv(p+q) = pv - q\) Since $p + q = 1$: \(f^\star = \frac{pv - q}{v} = p - \frac{q}{v}\)
If $v=1$ (even money bet), $f^\star = p - q = 2p - 1$. This gives us the rule: Bet your edge.
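A quick grid search (with illustrative values $p = 0.6$, $v = 2$, not from the lecture) confirms that the maximizer of $G(f)$ satisfies the first-order condition $pv(1-f) = q(1+fv)$ above:

```python
import numpy as np

# Grid search for the maximizer of G(f) = p*log(1 + f*v) + q*log(1 - f).
# p and v are illustrative; the argmax should satisfy pv(1-f) = q(1+fv).
p, v = 0.6, 2.0
q = 1 - p

fs = np.linspace(0.0, 0.99, 100001)
G = p * np.log1p(fs * v) + q * np.log1p(-fs)
f_star = fs[np.argmax(G)]

# First-order condition holds at the grid argmax (up to grid resolution).
assert abs(p * v * (1 - f_star) - q * (1 + f_star * v)) < 1e-3
print(f"f* = {f_star:.3f}")
```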
3. The Coin-Betting Game and KT Estimator
The Kelly criterion tells us what to do if we know the probability $p$. But in online learning, we don’t! We face an adversarial sequence of outcomes.
The Game
- Start with wealth $\epsilon > 0$.
- In each round $t$, we choose a signed fraction $\beta_t \in [-1, 1]$.
- $\beta_t > 0$: Bet $\beta_t W_{t-1}$ on Heads (+1).
- $\beta_t < 0$: Bet $\lvert\beta_t\rvert W_{t-1}$ on Tails (-1).
- Adversary reveals outcome $c_t \in \{-1, 1\}$.
- Update: $W_t = W_{t-1}(1 + \beta_t c_t)$.
Goal: Maximize $\ln W_T$. We want to compete with the best constant betting fraction $\beta^\star$ in hindsight: \(\text{Regret}_T = \max_{\beta \in [-1, 1]} \sum_{t=1}^T \ln(1 + \beta c_t) - \sum_{t=1}^T \ln(1 + \beta_t c_t)\)
Optimal Wealth Lower Bound
Before analyzing the algorithm, let’s establish a lower bound on how much money the optimal constant betting strategy makes. This will serve as our benchmark.
Lemma: For any sequence of outcomes $c_t \in \{-1, 1\}$, \(\max_{\beta \in [-1, 1]} \exp\left(\sum_{t=1}^T \ln(1 + \beta c_t)\right) \ge \exp\left( \frac{(\sum_{t=1}^T c_t)^2}{4T} \right)\)
Proof: Let $S_T = \sum_{t=1}^T c_t$, and restrict the search to $\beta \in [-1/2, 1/2]$. Using the inequality $\ln(1+x) \ge x - x^2$, valid for $\lvert x \rvert \le 1/2$, and the fact that $c_t^2 = 1$:

\[\sum_{t=1}^T \ln(1 + \beta c_t) \ge \sum_{t=1}^T (\beta c_t - \beta^2 c_t^2) = \beta S_T - T \beta^2\]

This quadratic in $\beta$ is maximized at $\beta = \frac{S_T}{2T}$; since $\lvert S_T \rvert \le T$, this $\beta$ lies in $[-1/2, 1/2]$. Substituting back:

\[\frac{S_T}{2T} S_T - T \left(\frac{S_T}{2T}\right)^2 = \frac{S_T^2}{2T} - \frac{S_T^2}{4T} = \frac{S_T^2}{4T}\]

Exponentiating both sides gives the result.
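The lemma is easy to check numerically on random $\pm 1$ sequences (a sanity check, not a substitute for the proof):

```python
import numpy as np

# Sanity check of the lemma: for random ±1 sequences, the best constant
# bettor's log-wealth is at least S_T^2 / (4T).
rng = np.random.default_rng(1)
T = 50
for trial in range(100):
    c = rng.choice([-1, 1], size=T)
    S = c.sum()
    # Brute-force the best constant beta on a fine grid of (-1, 1).
    betas = np.linspace(-0.999, 0.999, 2001)
    log_wealth = np.array([np.sum(np.log1p(b * c)) for b in betas])
    assert log_wealth.max() >= S**2 / (4 * T) - 1e-6
print("lemma holds on all trials")
```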
The Krichevsky-Trofimov (KT) Bettor
How do we choose $\beta_t$? A natural idea is to estimate the bias of the coin based on past data. The KT Estimator suggests:
\[\beta_t = \frac{\sum_{i=1}^{t-1} c_i}{t}\]

This is essentially the empirical mean of the past coin flips, shrunk slightly towards 0 (the denominator is $t$ rather than $t-1$).
Why this works:
- No-Regret: The KT algorithm guarantees that the wealth $W_T$ is close to the wealth of the best constant $\beta^\star$.
- Wealth Lower Bound (Theorem 10.5): The KT bettor guarantees a wealth of: \(\ln \text{Wealth}_T(KT) \ge \frac{(\sum_{t=1}^T c_t)^2}{4T} - \frac{1}{2} \ln T - O(1)\) Comparing this to the Lemma above, we see that the regret is $O(\ln T)$.
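A short simulation illustrates the guarantee. The coin bias ($0.7$) is illustrative, and the constant $5$ below is a stand-in for the unspecified $O(1)$ term in the theorem:

```python
import numpy as np

# Simulate the KT bettor beta_t = (sum of past outcomes) / t on a biased coin
# and compare its log-wealth to the benchmark S_T^2/(4T) - (1/2) ln T - O(1).
rng = np.random.default_rng(2)
T = 1000
c = rng.choice([-1, 1], size=T, p=[0.3, 0.7])  # illustrative coin bias

log_wealth = 0.0
running_sum = 0
for t in range(1, T + 1):
    beta = running_sum / t            # KT estimator, shrunk toward 0
    log_wealth += np.log1p(beta * c[t - 1])
    running_sum += c[t - 1]

S = c.sum()
benchmark = S**2 / (4 * T) - 0.5 * np.log(T) - 5   # 5 stands in for O(1)
print(f"KT log-wealth: {log_wealth:.2f}, benchmark: {benchmark:.2f}")
assert log_wealth >= benchmark
```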
This connects perfectly to parameter-free online learning. The “sum of gradients” in OCO behaves like the “sum of coin flips” here.
4. Universal Portfolios
We now scale this up to the general portfolio selection problem, introduced by Thomas Cover (1991).
The Universal Algorithm
Cover proposed an algorithm that competes with the Best Constant Rebalanced Portfolio (BCRP) without knowing the market sequence. Just as the KT bettor can be seen as integrating over possible biases of a coin, Cover’s Universal Portfolio algorithm integrates over all possible portfolio vectors $b \in \Delta_m$.
Algorithm: At time $t$, play the performance-weighted average of all portfolios: \(\hat{b}_t = \frac{\int_{\Delta_m} b \cdot S_{t-1}(b) \, d\mu(b)}{\int_{\Delta_m} S_{t-1}(b) \, d\mu(b)}\) where $S_{t-1}(b)$ is the wealth a constant portfolio $b$ would have achieved up to time $t-1$.
Connection to KT:
- The KT bettor corresponds to a Universal Portfolio on 2 assets (Cash vs Asset) using a specific prior (Dirichlet-1/2).
- Cover’s algorithm generalizes this to $m$ assets.
Regret Bound: The Universal Portfolio achieves regret logarithmic in $T$: \(\text{Regret}_T \le \frac{m-1}{2} \log T + \text{const}\) with the Dirichlet-1/2 prior (which is minimax optimal); the simpler uniform-prior volume argument below gives $(m-1) \ln T + 1$. Either way, the algorithm loses almost nothing compared to the best constant portfolio known only in hindsight.
Proof Sketch (Volume Argument): We want to bound the ratio of the wealth of the best CRP $b^\star$ to the wealth of the Universal Portfolio (UCRP). The UCRP wealth is the average wealth over the simplex $\Delta_m$: \(\text{Wealth}_T(\text{UCRP}) = \frac{1}{\text{Vol}(\Delta_m)} \int_{b \in \Delta_m} \text{Wealth}_T(b) \, db\)
Define a small ball around the optimal portfolio $b^\star$: \(\text{Ball}_\epsilon(b^\star) = \{ b \in \Delta_m : b = (1-\epsilon)b^\star + \epsilon v, v \in \Delta_m \}\)
We rely on two key properties:
- Volume: The volume of this ball shrinks polynomially with $\epsilon$: $\text{Vol}(\text{Ball}_\epsilon(b^\star)) = \epsilon^{m-1} \text{Vol}(\Delta_m)$.
- Wealth: For any $b \in \text{Ball}_\epsilon(b^\star)$, the wealth is close to optimal: $\text{Wealth}_T(b) \ge (1-\epsilon)^T \text{Wealth}_T(b^\star)$.
Now we can lower bound the UCRP wealth by integrating just over this ball: \(\begin{aligned} \text{Wealth}_T(\text{UCRP}) &\ge \frac{1}{\text{Vol}(\Delta_m)} \int_{b \in \text{Ball}_\epsilon(b^\star)} \text{Wealth}_T(b) \, db \\ &\ge \frac{1}{\text{Vol}(\Delta_m)} \int_{b \in \text{Ball}_\epsilon(b^\star)} (1-\epsilon)^T \text{Wealth}_T(b^\star) \, db \\ &= \frac{\text{Vol}(\text{Ball}_\epsilon(b^\star))}{\text{Vol}(\Delta_m)} (1-\epsilon)^T \text{Wealth}_T(b^\star) \\ &= \epsilon^{m-1} (1-\epsilon)^T \text{Wealth}_T(b^\star) \end{aligned}\)
Taking logarithms: \(\ln \text{Wealth}_T(\text{UCRP}) \ge (m-1) \ln \epsilon + T \ln(1-\epsilon) + \ln \text{Wealth}_T(b^\star)\)
We choose $\epsilon = 1/T$ to balance the terms. Using $\ln(1 - 1/T) \approx -1/T$: \(\ln \text{Wealth}_T(\text{UCRP}) \ge -(m-1) \ln T - 1 + \ln \text{Wealth}_T(b^\star)\)
Rearranging gives the regret bound: \(\text{Regret}_T \le (m-1) \ln T + 1\)
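For $m = 2$, the whole argument can be checked numerically by discretizing Cover's integral on a grid (synthetic price relatives and a uniform prior; this is a sketch, not a production implementation):

```python
import numpy as np

# Discretized Cover universal portfolio for m = 2 assets: approximate the
# integral over the simplex by a uniform grid of K constant portfolios.
# Price relatives are synthetic, for illustration only.
rng = np.random.default_rng(3)
T, K = 200, 501
x = np.exp(rng.normal(0.0, 0.05, size=(T, 2)))  # per-round price relatives

grid = np.linspace(0.0, 1.0, K)   # weight on asset 0 for each grid portfolio
wealth = np.ones(K)               # S_{t-1}(b) for each grid point
log_W_up = 0.0
for t in range(T):
    # Performance-weighted average portfolio (uniform prior over the grid).
    w0 = np.sum(wealth * grid) / np.sum(wealth)
    log_W_up += np.log(w0 * x[t, 0] + (1 - w0) * x[t, 1])
    wealth *= grid * x[t, 0] + (1 - grid) * x[t, 1]

regret = np.log(wealth).max() - log_W_up   # vs best CRP on the grid
print(f"regret = {regret:.3f}, bound (m-1) ln T + 1 = {np.log(T) + 1:.3f}")
assert regret <= np.log(T) + 1
```

The key telescoping fact the code exploits: the wealth of the performance-weighted average portfolio equals the average of the individual CRP wealths, so its regret against the best grid portfolio is at most $\ln K$.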
Implementation
Calculating the integral over the simplex is computationally expensive ($O(T^{m-1})$).
- Sampling: We can approximate the integral by sampling portfolios.
- Special Cases: For $m=2$, it can be computed efficiently.
- Online Newton Step (ONS): As we saw in previous lectures, ONS can solve this OCO problem (using log-loss) much more efficiently ($O(m^3)$ per step) while achieving a similar $O(m \log T)$ regret, provided the price relatives are bounded away from 0 (i.e., the market does not crash).
5. Summary
- Kelly Criterion: In a known probabilistic setting, maximizing log-wealth is the optimal long-term strategy (Proportional Betting).
- KT Bettor: In an adversarial setting, betting based on the empirical mean (KT) allows us to compete with the best Kelly strategy in hindsight.
- Universal Portfolios: This concept generalizes to multi-asset markets. By averaging over all strategies weighted by their performance, we achieve “Universal” performance—doing as well as the best stock picker in the long run.