Coordinate Descent

Greedy expansion per axis, with restart on multimodal landscapes

Coordinate Descent minimises a black-box objective by sweeping one coordinate at a time. Each sweep tries a step of size step in the ± direction on each axis; on the first improving move, the algorithm greedily expands — it keeps stepping in the same direction with the same step size while the objective keeps improving. After a full sweep with no improvement the step is halved.

HumpDay's coordinate descent at a glance

  • Greedy expansion per axis: on first improvement at a coordinate, keep stepping in that direction with the same step until f stops improving.
  • Step halving: when a full coordinate sweep finds no improvement, halve the step.
  • Restart on multimodal landscapes: when step ≤ 1e-6 with f > 1e-8 the run is presumed trapped in a local basin and reinitialises from a random point. On smooth basins (sphere, Rosenbrock) the run exits naturally before the restart fires.
  • Initial step 0.1, floor 1e-12, on the unit cube [0, 1]n.

Interactive 3D Visualization

See Coordinate Descent in action on 3D optimization surfaces:

Loading 3D visualization...

Requires WebGL support

Instructions: Choose a test function and algorithm, then click Start to watch the step-by-step optimization process.

Implementation Details

Component Details Links
Algorithm Foundation Coordinate descent (cyclic)
One of the oldest derivative-free methods; predates the optimization literature.
HumpDay's variant uses Brent-style greedy expansion per axis (no inner line-search subroutine) and step halving on failed sweeps.
Reference: Wright (2015), "Coordinate descent algorithms", Math. Programming 151(1)
Wikipedia
Reference Implementation Custom Implementation
No single standard package (widely implemented)
Various coordinate selection strategies
Self-reference: Algorithm-specific variants
Boyd et al.
HumpDay Python HumpDay CoordinateDescent
Cyclic sweeps, greedy expansion on the first improving axis step, halve step on failed sweeps, restart trigger on multimodal trapping.
Pure Python; no required dependencies.
File: humpday/optimizers/search_algorithms.py
Source
HumpDay JavaScript Browser CoordinateDescent
Same algorithm as the Python port (greedy expansion + restart). Used in the contest and the in-browser visualizer.
Class: CoordinateDescent
JS Implementation

Performance characteristics

  • Best for: Smooth, near-axis-aligned objectives where greedy expansion gets traction.
  • Worst for: Curved valleys and highly multimodal landscapes — coordinate moves zigzag down a curved valley (Rosenbrock) and trap easily in Ackley-style basins.
  • Dimensions: Each sweep is O(n_dim) function evaluations; the algorithm scales cheaply with dimension but precision suffers as the basin shape becomes less axis-aligned.
  • Convergence: Geometric reduction of the step size; a single run halts when step falls to the floor (or the restart trigger fires).
  • Relative to a textbook greedy coordinate descent reference at n_trials=200, n_dim=2: matches on Rosenbrock; the restart trigger leaves a precision gap on sphere and Ackley (see SOTA status for the live numbers).

Educational Resources

Algorithm Variants

  • Cyclic Coordinate Descent: Fixed order through coordinates
  • Random Coordinate Descent: Random coordinate selection
  • Gauss-Southwell: Choose coordinate with largest gradient component
  • Block Coordinate Descent: Update blocks of coordinates simultaneously
  • Accelerated Coordinate Descent: Momentum-based improvements

Mathematical Foundation

  • Update Rule: x_i^(k+1) = arg min_z f(x_1^(k+1), ..., x_{i-1}^(k+1), z, x_{i+1}^k, ..., x_n^k)
  • Separability: f(x) = Σᵢ fᵢ(xᵢ) + interaction terms
  • Convergence Rate: O(1/k) for convex, linear for strongly convex
  • Line Search: Exact or Armijo backtracking along coordinate

️ Coordinate Selection Strategies

  • Cyclic (Round-Robin): i = (k mod n) + 1
  • Random: i ~ Uniform{1, 2, ..., n}
  • Gauss-Southwell: i = arg max_j |∂f/∂x_j|
  • Lipschitz-Based: Probability ∝ Lipschitz constants
  • Greedy: Choose coordinate with maximum expected decrease

Algorithm Pseudocode

function coordinateDescent(f, x0, maxIter, tolerance):
    x = x0.copy()
    n = length(x)

    for k in 1 to maxIter:
        x_old = x.copy()

        for i in 1 to n:  // Cyclic coordinate selection
            // Optimize f along coordinate i
            x[i] = lineSearch(lambda t: f(..., x[i-1], t, x[i+1], ...))

        if ||x - x_old|| < tolerance:
            break

    return x
                    

When Coordinate Descent Excels

  • Separable Functions: f(x) = Σᵢ fᵢ(xᵢ) with minimal coupling
  • Sparse Problems: Most coordinates have little impact
  • Large Scale: Memory efficient, good for high dimensions
  • Non-smooth Functions: Handles piecewise linear objectives
  • Constrained Problems: Easy to handle box constraints

️ Limitations

  • Coupling: Slow on strongly coupled variables
  • Conditioning: Poor performance on ill-conditioned problems
  • Non-Convex: Can get trapped in local minima
  • Step Size: May require careful line search tuning
  • Coordinate System: Performance depends on variable scaling

Applications

  • Lasso Regression: L1-regularized least squares
  • SVM Training: Sequential minimal optimization (SMO)
  • Matrix Factorization: Alternating least squares
  • Deep Learning: Per-layer parameter updates
  • Game Theory: Best response dynamics
  • Signal Processing: Sparse coding and denoising

Modern Extensions

  • Proximal Coordinate Descent: Handle non-smooth regularizers
  • Stochastic Coordinate Descent: Random sampling with convergence guarantees
  • Parallel Coordinate Descent: Update multiple coordinates simultaneously
  • Adaptive Methods: Automatically adjust step sizes
  • Accelerated Variants: Nesterov-style momentum acceleration