level 4 calculus

Thu Apr 09 2026 20:00:00 GMT-0400 (Eastern Daylight Time) ·experiment-writeup ·status: complete

Level 4 — Calculus & Optimization Drills

Two drills from the quant-from-scratch roadmap Level 4 homework. Calculus is the language of change, and nearly every optimizer underneath the models in this project is doing some form of gradient-based search. This level is about understanding what the machinery is actually doing so we can reason about when it breaks.

Drill 1 — Gradient descent from scratch on Rosenbrock

Script: ../scripts/level_4_gradient_descent.py

Setup: The Rosenbrock function f(x,y) = (1-x)² + 100(y - x²)² has a known minimum at (1, 1) with f = 0. It’s smooth but has a narrow curved valley, which means naive gradient descent bounces against the walls instead of sliding along the floor. We implement three optimizers by hand — vanilla GD, momentum GD, and Adam — all using the analytical gradient. No torch, no scipy.

Analytical gradient (derived from the chain rule):

∂f/∂x = -2(1-x) - 400x(y - x²)
∂f/∂y = 200(y - x²)
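
A quick way to sanity-check a hand-derived gradient is to compare it against central finite differences. The sketch below is not taken from level_4_gradient_descent.py; the function names and the step size h are illustrative assumptions.

```python
def rosenbrock(x, y):
    """f(x, y) = (1 - x)^2 + 100 (y - x^2)^2."""
    return (1.0 - x) ** 2 + 100.0 * (y - x * x) ** 2


def rosenbrock_grad(x, y):
    """Analytical gradient, matching the two partial derivatives above."""
    dfdx = -2.0 * (1.0 - x) - 400.0 * x * (y - x * x)
    dfdy = 200.0 * (y - x * x)
    return dfdx, dfdy


def numerical_grad(f, x, y, h=1e-6):
    """Central finite differences; h is an assumed, illustrative step size."""
    dfdx = (f(x + h, y) - f(x - h, y)) / (2.0 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2.0 * h)
    return dfdx, dfdy


if __name__ == "__main__":
    # Starting point of the drill, the origin, and the known minimum.
    for point in [(-1.5, 2.5), (0.0, 0.0), (1.0, 1.0)]:
        exact = rosenbrock_grad(*point)
        approx = numerical_grad(rosenbrock, *point)
        print(point, exact, approx)
```

If the two gradients disagree by more than finite-difference noise, the derivation (or a sign in the code) is wrong, which is worth catching before blaming the optimizer.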

Results (5,000 steps from starting point (-1.5, 2.5)):

    Vanilla GD  final=(0.91407, 0.83517)  f=0.007396  dist_to_min=0.18588
   Momentum GD  final=(0.98932, 0.97870)  f=0.000114  dist_to_min=0.02383
          Adam  final=(0.99622, 0.99245)  f=0.000014  dist_to_min=0.00844
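
For reference, the three update rules can be sketched in pure Python as below. This is a minimal reconstruction, not the actual script: every learning rate, the momentum coefficient, and the Adam constants are assumed values, so the output will not reproduce the table above digit for digit.

```python
import math


def rosenbrock(x, y):
    return (1.0 - x) ** 2 + 100.0 * (y - x * x) ** 2


def grad(x, y):
    return (-2.0 * (1.0 - x) - 400.0 * x * (y - x * x), 200.0 * (y - x * x))


def vanilla_gd(start, lr=5e-4, steps=5000):
    """Plain gradient step. lr is an assumed value."""
    x, y = start
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - lr * gx, y - lr * gy
    return x, y


def momentum_gd(start, lr=2e-4, beta=0.9, steps=5000):
    """Heavy-ball momentum. lr and beta are assumed values."""
    x, y = start
    vx = vy = 0.0
    for _ in range(steps):
        gx, gy = grad(x, y)
        # Velocity is a decaying sum of past gradients.
        vx, vy = beta * vx + gx, beta * vy + gy
        x, y = x - lr * vx, y - lr * vy
    return x, y


def adam(start, lr=0.02, beta1=0.9, beta2=0.999, eps=1e-8, steps=5000):
    """Adam with bias correction. All hyperparameters are assumed values."""
    p = list(start)
    m = [0.0, 0.0]  # first-moment (mean) estimate
    v = [0.0, 0.0]  # second-moment (uncentered variance) estimate
    for t in range(1, steps + 1):
        g = grad(*p)
        for i in range(2):
            m[i] = beta1 * m[i] + (1.0 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1.0 - beta2) * g[i] ** 2
            m_hat = m[i] / (1.0 - beta1 ** t)
            v_hat = v[i] / (1.0 - beta2 ** t)
            p[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return p[0], p[1]


if __name__ == "__main__":
    start = (-1.5, 2.5)
    for name, opt in [("Vanilla GD", vanilla_gd), ("Momentum GD", momentum_gd), ("Adam", adam)]:
        x, y = opt(start)
        print(f"{name:>12}  final=({x:.5f}, {y:.5f})  "
              f"f={rosenbrock(x, y):.6f}  dist_to_min={math.hypot(x - 1.0, y - 1.0):.5f}")
```

The per-coordinate scaling in Adam is what lets it take comparable steps along the flat valley floor and across the steep walls, whereas the two fixed-rate methods have to pick one learning rate small enough for the steep direction.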

Interpretation:

The punchline that matters for this project: when an optimizer “converges,” it’s worth asking what that actually means numerically. Vanilla GD ended at a function value of 0.0074, which sounds tiny, but the iterate is still 0.19 away from the true minimum in parameter space, roughly 22 times farther than Adam’s 0.008. In a portfolio optimization context, “close in objective value” and “close in weight space” can be wildly different things, and the difference matters when the weights are being traded for real money.
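
The gap between objective value and parameter distance can be recomputed directly from the final iterates in the results table. A small sketch, assuming only the rounded coordinates printed above (helper names are illustrative):

```python
import math

# Final iterates copied from the results table (rounded to 5 decimals).
reported = {
    "Vanilla GD": (0.91407, 0.83517),
    "Momentum GD": (0.98932, 0.97870),
    "Adam": (0.99622, 0.99245),
}


def rosenbrock(x, y):
    return (1.0 - x) ** 2 + 100.0 * (y - x * x) ** 2


def summarize(point):
    """Return (objective value, Euclidean distance to the minimum at (1, 1))."""
    x, y = point
    return rosenbrock(x, y), math.hypot(x - 1.0, y - 1.0)


for name, point in reported.items():
    f_val, dist = summarize(point)
    print(f"{name:>12}  f={f_val:.6f}  dist_to_min={dist:.5f}")
```

All three iterates sit almost exactly on the valley floor y ≈ x², where f ≈ (1 - x)². Distance therefore scales like the square root of f: cutting the distance in half requires cutting f by about four, which is why a small-looking f can hide a not-so-small parameter error.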

Plot: outputs/level_4_rosenbrock.png — Rosenbrock contour plot (log-spaced level curves) with all three descent paths overlaid.

Drill 2 — Portfolio optimization with transaction cost constraints

Script: ../scripts/level_4_portfolio_with_costs.py

Setup: Extends the L3 Markowitz optimizer with an L1 transaction cost penalty. Start from an equal-weight prior portfolio w_prev, minimize variance + λ · ||w - w_prev||_1 subject to the usual constraints (sum = 1, long/short bounds, minimum return target of 15%). Sweep λ from 0 (pure Markowitz) to 0.5 (essentially frozen).

Why L1 (not L2) turnover penalty: brokerage commissions, bid-ask spreads, and market impact for moderate sizes are approximately linear in trade size, which maps to the L1 norm. A squared L2 penalty would charge trades quadratically, which overstates the cost of large trades and understates the cost of small ones.
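
To make the linear-versus-quadratic point concrete, here is a toy comparison. The cost level and the trade size at which the two models are calibrated to agree are hypothetical numbers chosen only for illustration.

```python
def linear_cost(trade, cost_per_unit):
    """Cost proportional to trade size (the L1 view)."""
    return cost_per_unit * abs(trade)


def quadratic_cost(trade, cost_per_unit, calibration_trade):
    """Quadratic cost, scaled to match the linear model at calibration_trade."""
    return (cost_per_unit / calibration_trade) * trade ** 2


COST_PER_UNIT = 0.001      # hypothetical: 10 bps per unit of weight traded
CALIBRATION_TRADE = 0.10   # hypothetical: both models agree on a 10% weight change

for trade in (0.01, 0.10, 0.20):
    lin = linear_cost(trade, COST_PER_UNIT)
    quad = quadratic_cost(trade, COST_PER_UNIT, CALIBRATION_TRADE)
    print(f"trade={trade:.0%}  linear={lin:.6f}  quadratic={quad:.6f}  ratio={quad / lin:.2f}")
```

The ratio of quadratic to linear cost is just |trade| divided by the calibration size, so under these assumptions the quadratic model charges a tenth of the linear cost on a 1% trade and twice the linear cost on a 20% trade.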

Results (10-asset universe, 2022-2026 sample, target 15% return):

   lambda   turnover    return    vol    largest change
    0.000     87.1%     15.00%   12.19%  GOOGL +20.0%
    0.001     81.9%     15.00%   12.20%  GOOGL +19.4%
    0.010     51.9%     15.00%   12.81%  GOOGL +16.0%
    0.050      7.4%     16.69%   16.35%  JNJ -3.7%
    0.100      0.0%     18.27%   17.58%  MSFT +0.0%
    0.500      0.0%     18.27%   17.58%  AAPL +0.0%

Interpretation:

Why this matters operationally: Naive Markowitz says “rebalance to these new weights.” A cost-aware optimizer says “rebalance partway toward these new weights, stopping where the marginal trade cost equals the marginal variance-reduction benefit.” The latter is what a real portfolio manager does. The λ parameter is where your estimate of turnover cost goes; because it is added directly to the variance, a cost quoted in basis points has to be converted into variance units first.
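
The "rebalance partway" behavior, and the way turnover in the table drops to exactly 0.0% once λ is large enough, both show up in a one-asset toy version of the problem: minimize a·(w - w_star)² + λ·|w - w_prev|. That problem has a closed-form soft-threshold solution with a no-trade band of half-width λ/(2a). The sketch below is a simplification I'm assuming for intuition (one weight, no constraints, made-up numbers), not the multi-asset optimizer.

```python
import math


def objective(w, w_prev, w_star, curvature, lam):
    """Toy objective: quadratic risk term plus L1 trade cost."""
    return curvature * (w - w_star) ** 2 + lam * abs(w - w_prev)


def cost_aware_weight(w_prev, w_star, curvature, lam):
    """Closed-form minimizer of the toy objective (soft thresholding).

    Trades toward w_star but stops short by lam / (2 * curvature);
    if the gap is inside that band, no trade happens at all.
    """
    gap = w_star - w_prev
    band = lam / (2.0 * curvature)
    trade = math.copysign(max(abs(gap) - band, 0.0), gap)
    return w_prev + trade


if __name__ == "__main__":
    # Hypothetical numbers: prior 10%, frictionless optimum 30%, curvature 0.05.
    for lam in (0.0, 0.001, 0.01, 0.05, 0.1):
        w = cost_aware_weight(0.10, 0.30, 0.05, lam)
        print(f"lambda={lam:<6} new weight={w:.4f}  trade={w - 0.10:+.4f}")
```

A squared L2 penalty in the same toy problem would shrink the trade proportionally and never reach exactly zero, which is one reason the L1 form produces the clean "frozen" rows at the bottom of the sweep.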

Connection to the articles: this is directly the kind of “more complex constraint” mentioned in the L4 homework. The L1 norm is also what you’d use in the ||w||_1 ≤ k relaxation of a cardinality constraint (the LASSO idea): it is the standard convex surrogate for the non-convex count of nonzero weights. We’ll see this again at PM4 when we optimize prediction-market positions with turnover-aware execution.

Plot: outputs/level_4_portfolio_with_costs.png — left panel: turnover vs λ (symlog scale). Right panel: weight bars comparing prior, λ=0 (free rebalance), and λ=0.1 (cost-aware) allocations.

Cross-drill observations

  1. Both drills are about understanding what the optimizer does when it succeeds vs when it fails. Drill 1 shows that “convergence” is a spectrum — vanilla GD, momentum GD, and Adam all “converge” in the sense of reducing f, but they end up in very different places. Drill 2 shows that the “optimal” answer depends entirely on the cost function you hand the optimizer: change the cost, get a completely different answer.

  2. Convex optimization is an abstraction layer you can trust exactly as much as you trust your constraints. cvxpy made drill 2 trivial — adding the turnover penalty was one line. But the choice of L1 vs L2 penalty, the λ value, the prior portfolio, and the return target all live outside the solver and are judgment calls. The optimizer is only as smart as the modeler.

  3. This is where the project starts to look operational rather than academic. Everything in L1-L3 was a vibe check. L4 drill 2 is an actual tool I’d run on a real portfolio tomorrow if we had a prior position and a cost estimate. The gap between “drill” and “production” is narrow here — mostly data plumbing and the shrinkage-estimator upgrade from the L3 notes.

Gate check → proceed to Level 5?

Yes. All L4 drills pass cleanly:

Next action (Level 5): stochastic calculus. This is the hardest level per the article — 6-8 weeks of study recommended. Drills: Black-Scholes from scratch, Monte Carlo convergence check, and all five Greeks. The key insight to internalize is that (dW_t)² = dt, which is why Itô’s lemma has the second-order term that ordinary calculus drops. Black-Scholes is derived from Itô’s lemma + a delta-hedging argument that cancels the dW terms, so the option price ends up independent of the stock’s drift μ. That’s the mind-bending risk-neutral pricing result.
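
As a warm-up for L5, the (dW_t)² = dt heuristic can be checked numerically: summing squared Brownian increments over [0, T] should land close to T, while the increments themselves sum to a random N(0, T) draw. A minimal sketch using only the standard library; the step count and seed are arbitrary choices.

```python
import math
import random


def brownian_sums(T=1.0, n_steps=100_000, seed=0):
    """Return (W_T, sum of squared increments) for one simulated path."""
    rng = random.Random(seed)
    dt = T / n_steps
    sd = math.sqrt(dt)          # each increment dW ~ N(0, dt)
    w_T = 0.0
    quad_var = 0.0
    for _ in range(n_steps):
        dW = rng.gauss(0.0, sd)
        w_T += dW
        quad_var += dW * dW
    return w_T, quad_var


if __name__ == "__main__":
    w_T, quad_var = brownian_sums()
    print(f"W_T = {w_T:.4f} (random), sum of dW^2 = {quad_var:.4f} (should be near T = 1)")
```

The sum of squares has standard deviation T·sqrt(2 / n_steps), so it tightens around T as the grid is refined. That non-vanishing second-order contribution is what Itô's lemma keeps and ordinary calculus discards.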

After L5, we have the math foundation. The next track is the Prediction Markets / simulation guide: Monte Carlo binary contracts, importance sampling for tail events, particle filters for live updating, copulas for correlated portfolios, and the 5-layer production stack. That’s where this project starts to look like a real system.