Algorithm Complexity and Performance
This document gives a practical complexity map for EDAF algorithms and a reproducible runtime snapshot.
1) Scope and methodology
- Complexity is expressed for one iteration unless noted otherwise.
- Symbols:
N: population sizen: genotype dimension (bits, real dims, permutation length)m: selected (elite) sample size (m = N * selectionRatio)k: mixture components (GMM)E: evaluations per runI: iterations per run- Runtime measurements below were produced with:
./edaf run -c <config> --verbosity quiet- wall-clock captured via
/usr/bin/time -p - one run per config, same machine/session
flowchart TD
A["Iteration t"] --> B["Selection O(N log N) + O(N)"]
B --> C["Model Fit O(model-specific)"]
C --> D["Sampling O(N * sample-cost)"]
D --> E["Fitness O(N * eval-cost)"]
E --> F["Replacement/Niching/Restart O(N log N)"]
F --> G["Telemetry + Persistence O(metrics + sinks)"]
2) Cost decomposition inside EDAF
AbstractEdaAlgorithm<G> executes a fixed template. In most workloads, total runtime per iteration is:
T_iter ~= T_select + T_model_fit + T_sample + T_eval + T_replace + T_telemetry
Where:
T_evaldominates when fitness function is expensive.T_model_fitdominates on dependency-heavy models (BOA/hBOA/full-covariance).T_telemetrycan dominate if many sinks are synchronous; EDAF mitigates this withAsyncEventSink.
sequenceDiagram
participant Runner as ExperimentRunner
participant Algo as AbstractEdaAlgorithm
participant Model as Model<G>
participant Prob as Problem<G>
participant Sink as Event Sinks
Runner->>Algo: iterate(context)
Algo->>Algo: select(population)
Algo->>Model: fit(selected)
Algo->>Model: sample(N)
Algo->>Prob: evaluate(batch) [parallel lanes]
Algo->>Algo: replacement + niching + restart
Algo->>Sink: IterationCompletedEvent
3) Theoretical complexity by algorithm family
3.1 Binary/discrete
| Algorithm | Model class | Time per iteration (typical) | Space | Notes |
|---|---|---|---|---|
| UMDA / UMDAd | Bernoulli factorized | O(N*n + m*n) |
O(n) |
fastest baseline, no dependencies |
| PBIL | Frequency vector | O(N*n + m*n) |
O(n) |
similar to UMDA; extra smoothing cost is linear |
| cGA | Compact frequency | O(N*n) |
O(n) |
minimal memory footprint |
| BMDA | pairwise approximation | O(N*n + m*n^2) (implementation-dependent) |
O(n^2) |
dependency modeling overhead |
| BOA / EBNA | Bayesian network | O(m*n^2) to O(m*n^3) |
O(n^2) to O(n^3) |
structure learning is expensive |
| hBOA | hierarchical BN | O(m*n^2) to O(m*n^3) |
O(n^2)+ |
strongest dependency modeling in discrete set |
| MIMIC / Chow-Liu variants | tree dependencies | O(m*n^2) |
O(n^2) |
good compromise speed vs dependency capture |
3.2 Continuous
| Algorithm | Model | Time per iteration (typical) | Space | Notes |
|---|---|---|---|---|
| UMDAc | diagonal Gaussian | O(m*n + N*n) |
O(n) |
very scalable |
| Gaussian EDA (diag) | diagonal Gaussian | O(m*n + N*n) |
O(n) |
robust baseline |
| EMNA / EGNA alias | full covariance Gaussian | O(m*n^2 + n^3 + N*n^2) |
O(n^2) |
covariance factorization dominates |
| GMM-EDA / SPEDA alias | mixture model | O(k*m*n*emIter + N*k*n) |
O(k*n) to O(k*n^2) |
sensitive to k and EM loops |
| KDE-EDA / UnivariateKEDA | KDE | O(m*n + N*m*n) (naive) |
O(m*n) |
expensive sampling in high m |
| CMA-ES / xNES / sNES | strategy models | O(n^2) to O(n^3) |
O(n^2) |
stateful second-order adaptation |
3.3 Permutation
| Algorithm | Model | Time per iteration (typical) | Space | Notes |
|---|---|---|---|---|
| EHM-EDA | edge histogram | O(m*n + N*n) |
O(n^2) |
strong TSP baseline |
| EHBSA | edge histogram sampling | O(m*n + N*n) |
O(n^2) |
similar complexity, different sampler |
| Mallows EDA | consensus + dispersion | O(m*n log n + N*n log n) |
O(n) to O(n^2) |
robust ranking model |
| Plackett-Luce EDA | sequential weights | O(m*n + N*n^2) |
O(n) |
sampling can be quadratic |
classDiagram
class AbstractEdaAlgorithm {
+initialize()
+iterate()
+evaluateFitnessBatch()
+complete()
}
class Model {
+fit(selected)
+sample(count)
+diagnostics()
}
class Problem {
+evaluate(genotype)
}
AbstractEdaAlgorithm --> Model : fit/sample
AbstractEdaAlgorithm --> Problem : evaluate
4) Reproducible runtime snapshot (one representative problem per representation)
Configs used:
- Binary (
OneMax):configs/umda-onemax-v3.yml,configs/benchmarks/onemax-pbil-v3.yml,configs/benchmarks/onemax-hboa-v3.yml - Continuous (
Sphere):configs/gaussian-sphere-v3.yml,configs/literature-eda/umdac-sphere-v3.yml,configs/literature-eda/emna-sphere-v3.yml - Permutation (
small-tsp):configs/ehm-tsp-v3.yml,configs/benchmarks/small-tsp-ehbsa-v3.yml,configs/latent-insights/permutation-smalltsp-mallows-latent.yml
Measured run summary:
| Family | Problem | Algorithm | Best fitness | Wall time real (s) |
|---|---|---|---|---|
| binary | onemax (64) | UMDA | 64.0 |
29.88 |
| binary | onemax (64) | PBIL | 64.0 |
6.98 |
| binary | onemax (64) | hBOA | 64.0 |
6.51 |
| real | sphere (20D) | gaussian-eda | 5.09e-16 |
6.35 |
| real | sphere (20D) | umdac | 4.52e-16 |
7.27 |
| real | sphere (20D) | emna | 6.33e-7 |
6.49 |
| permutation | small-tsp | ehm-eda | 33.159 |
5.57 |
| permutation | small-tsp | ehbsa | 33.159 |
7.00 |
| permutation | small-tsp | mallows-eda | 35.406 |
6.73 |
Important interpretation notes:
- These are single-run snapshots; use batch campaigns (e.g., 30 runs) for publication.
- Runtime can shift significantly with sink configuration (
db,jsonl,csv) and telemetry frequency. - For expensive fitness functions, algorithm overhead differences are smaller relative to evaluation cost.
stateDiagram-v2
[*] --> Prototype
Prototype --> Baseline: choose UMDA/gaussian-eda/ehm-eda
Baseline --> DependencyModel: if interactions are strong
DependencyModel --> HighOrder: BOA/hBOA/full-covariance
DependencyModel --> TreeModel: Chow-Liu/MIMIC/EHM
Baseline --> StrategyModels: if smooth continuous & second-order gains expected
StrategyModels --> CMA_NES: CMA-ES/xNES/sNES
HighOrder --> [*]
TreeModel --> [*]
CMA_NES --> [*]
5) Practical selection guide (when each class is worth it)
| Scenario | Prefer | Why |
|---|---|---|
| Fast baseline / high-dimensional binary | umda, pbil, cga |
linear memory and fit cost |
| Binary with strong dependencies | chow-liu-eda, boa, hboa |
captures interactions absent in UMDA |
| Smooth continuous objective, moderate dim | gaussian-eda, umdac |
stable and cheap |
| Strongly correlated continuous dimensions | full-covariance-eda, emna, egna |
models covariance explicitly |
| Multimodal continuous objectives | gmm-eda, speda, kde-eda |
multimodal density support |
| TSP/permutation routing | ehm-eda, ehbsa |
edge-frequency modeling aligns with route structure |
6) Scaling advice and bottleneck controls
- Use batch-level parallelism + in-run evaluation parallelism together (EDAF does both).
- Keep sink modes minimal for speed-critical sweeps (
console+csvis cheapest). - For SQLite-heavy runs, reduce event frequency or switch to PostgreSQL for higher write concurrency.
- Use
observability.metricsEveryIterations> 1 in very long runs to cut telemetry overhead.
7) Reproduce this document's measurements
cd <repo-root>
mkdir -p results/release-bench
/usr/bin/time -p ./edaf run -c configs/umda-onemax-v3.yml --verbosity quiet
/usr/bin/time -p ./edaf run -c configs/benchmarks/onemax-pbil-v3.yml --verbosity quiet
/usr/bin/time -p ./edaf run -c configs/benchmarks/onemax-hboa-v3.yml --verbosity quiet
/usr/bin/time -p ./edaf run -c configs/gaussian-sphere-v3.yml --verbosity quiet
/usr/bin/time -p ./edaf run -c configs/literature-eda/umdac-sphere-v3.yml --verbosity quiet
/usr/bin/time -p ./edaf run -c configs/literature-eda/emna-sphere-v3.yml --verbosity quiet
/usr/bin/time -p ./edaf run -c configs/ehm-tsp-v3.yml --verbosity quiet
/usr/bin/time -p ./edaf run -c configs/benchmarks/small-tsp-ehbsa-v3.yml --verbosity quiet
/usr/bin/time -p ./edaf run -c configs/latent-insights/permutation-smalltsp-mallows-latent.yml --verbosity quiet
Estimation of Distribution Algorithms Framework
Copyright (c) 2026 Dr. Karlo Knezevic
Licensed under the Apache License, Version 2.0.