Rechenberg's (1+1)-Evolution Strategy

One-parent, one-offspring per generation, with the 1/5-success-rule step adaptation

Rechenberg's (1+1)-ES is the canonical Evolution Strategy from Rechenberg (1973). A single parent proposes one Gaussian-perturbed offspring per generation; the offspring is accepted if it beats the parent. The step size σ is adapted by the 1/5-success-rule: grow σ by 1.5× when more than 1/5 of recent trials improved, shrink by 1.5⁻¹ otherwise. Was formerly named AdaptiveRandomSearch in HumpDay; the new name matches the canonical literature.

Interactive 3D Visualization

See Rechenberg (1+1)-ES in action on 3D optimization surfaces:

Loading 3D visualization...

Requires WebGL support

Instructions: Choose a test function and algorithm, then click Start to watch the step-by-step optimization process.

Implementation Details

Component Details Links
Algorithm Concept Enhanced Random Search
Random sampling with adaptive parameter adjustment
Learning from optimization history
Developed: Various researchers, 1970s-80s
ARS Survey
Reference Implementation Research Literature Implementations
Various academic implementations and variants
No single standard package due to adaptation diversity
Reference: Optimization literature and custom implementations
GitHub Examples
Humpday Python Implementation HumpDay Rechenberg
Custom implementation with success-rate adaptation
Dynamic step size and region focusing
File: humpday/optimizers/search_algorithms.py
Source
Humpday JavaScript Browser Implementation
Pure JavaScript adaptive random search
Success tracking and parameter adaptation
Class: Rechenberg
JS Implementation

Performance characteristics

  • Best for: Smooth basins where the 1/5 rule keeps σ near the right exploration scale.
  • Worst for: Multimodal landscapes (Ackley). Rechenberg has no escape from a local basin; once trapped, it refines into the wrong optimum.
  • Step distribution: componentwise Gaussian. Each component of the offspring is drawn from N(0, σ2), so the step magnitude varies stochastically (mean ≈ σ·√n with heavier tails). The previous HumpDay variant projected to a unit direction, which damped the stochastic spread; the current implementation matches the canonical (1+1)-ES.
  • Dimensions: Inexpensive per iteration; reasonable into the dozens.
  • Relative to a canonical (1+1)-ES 1/5-rule reference at n_trials=200, n_dim=2: algorithmically identical to the reference. Per-cell ratios drift with the seed (both implementations trap on Ackley about half the time at 4-seed samples). See SOTA status for the live numbers.

Educational Resources

Adaptation Mechanisms

  • Step Size Adaptation: σ(t+1) = σ(t) × factor based on success rate
  • Success Rate Tracking: Monitor fraction of improvements over recent history
  • Region Focusing: Bias sampling toward successful regions
  • Direction Learning: Learn promising search directions
  • Population Size: Adapt number of samples per iteration
  • Distribution Shape: Modify sampling distribution parameters

Mathematical Foundation

  • Success Rate: p_s(t) = (# improvements) / (# recent samples)
  • Step Size Update: σ(t+1) = σ(t) × exp(α × (p_s(t) - p_target))
  • Sampling: x(t+1) ~ N(x_best, σ(t)²I) or other adaptive distributions
  • Region Weight: w_i ∝ exp(β × improvement_i) for region focusing

️ Key Parameters

  • Initial Step Size: σ₀ typically 0.1-0.5 of search range
  • Target Success Rate: p_target usually 0.1-0.3
  • Adaptation Rate: α controls speed of step size changes
  • Memory Window: How many recent samples to consider
  • Min/Max Step Size: Bounds to prevent degenerate behavior

ARS Algorithm Pseudocode

function adaptiveRandomSearch(f, x0, maxEvals, targetSuccessRate):
    x_best = x0
    f_best = f(x0)
    stepSize = 0.3  // Initial step size
    successCount = 0
    windowSize = 50
    evaluations = []

    for eval in 1 to maxEvals:
        // Generate candidate with current step size
        x_candidate = x_best + stepSize * randomGaussian(n)
        x_candidate = clip(x_candidate, [0,1])  // Bounds
        f_candidate = f(x_candidate)

        // Track success/failure
        improved = (f_candidate < f_best)
        evaluations.append(improved)

        if improved:
            x_best = x_candidate
            f_best = f_candidate
            successCount += 1

        // Adapt step size every windowSize evaluations
        if eval % windowSize == 0:
            successRate = successCount / windowSize

            if successRate > targetSuccessRate:
                stepSize *= 1.2  // Increase exploration
            else:
                stepSize *= 0.8  // Focus search

            stepSize = clip(stepSize, [0.01, 1.0])
            successCount = 0

    return x_best, f_best
                    

ARS Variants

  • 1/5-th Rule ARS: Adapt step size based on success rate ≈ 1/5
  • Multi-Point ARS: Generate multiple candidates per iteration
  • Directional ARS: Learn and exploit successful search directions
  • Region-Based ARS: Partition search space and adapt per region
  • Covariance ARS: Adapt full covariance matrix like CMA-ES
  • Hybrid ARS: Combine with local search or other methods

Adaptation Strategies

  • Success-Based: Adapt based on improvement frequency
  • Progress-Based: Adapt based on magnitude of improvements
  • Diversity-Based: Maintain population diversity
  • Distance-Based: Adapt based on distance between improvements
  • Gradient-Free: Estimate search direction without gradients

Applications

  • Hyperparameter Optimization: Machine learning model tuning
  • Black-Box Optimization: When gradient information unavailable
  • Noisy Functions: Robust to evaluation noise
  • Expensive Functions: Efficient use of limited evaluations
  • Multi-Modal Functions: Balance exploration and exploitation
  • Simulation Optimization: Parameter tuning for simulations

Advantages of ARS

  • Simplicity: Easy to implement and understand
  • Robustness: Inherits random search reliability
  • Efficiency: Better than pure random search
  • Parameter-Light: Few hyperparameters to tune
  • Parallelizable: Easy to distribute evaluations
  • No Assumptions: Works on any objective function

️ Limitations

  • Slower than Specialized Methods: Less efficient than problem-specific algorithms
  • Parameter Sensitivity: Performance depends on adaptation parameters
  • High Dimensions: Still suffers from curse of dimensionality
  • Local Structure: Doesn't exploit smooth function structure
  • Memory Requirements: Needs to track success history

Relationship to Other Methods

  • vs Random Search: Adaptive step size and region focusing
  • vs Evolution Strategies: Simpler adaptation, no population
  • vs Simulated Annealing: No temperature schedule, success-based adaptation
  • vs CMA-ES: Simpler covariance adaptation
  • vs Bayesian Optimization: No surrogate model, direct adaptation