Benchmark Comparisons

This document records comparable experiment results produced directly from EDAF v3 campaign runs.

Core Non-COCO Benchmark Set (Cross-Domain)

Reproducibility Inputs

  • Batch config: <repo-root>/configs/batch-benchmark-core-v3.yml
  • Experiment configs:
      • <repo-root>/configs/umda-onemax-v3.yml
      • <repo-root>/configs/benchmarks/knapsack-umda-v3.yml
      • <repo-root>/configs/benchmarks/maxsat-umda-v3.yml
      • <repo-root>/configs/benchmarks/tsplib-berlin52-ehm-v3.yml
      • <repo-root>/configs/benchmarks/cec2014-f10-cma-v3.yml
      • <repo-root>/configs/benchmarks/zdt1-mo-v3.yml
      • <repo-root>/configs/benchmarks/dtlz2-mo-v3.yml
      • <repo-root>/configs/benchmarks/nguyen1-tree-eda-v3.yml

Run Summary

Run ID                            Algorithm        Model              Problem     Best Fitness
umda-onemax-v3                    umda             umda-bernoulli     onemax      64.0
benchmark-knapsack-umda-v3        umda             umda-bernoulli     knapsack    159.0
benchmark-maxsat-umda-v3          umda             umda-bernoulli     maxsat      60.0
benchmark-tsplib-berlin52-ehm-v3  ehm-eda          ehm                tsplib-tsp  14146.0
benchmark-cec2014-f10-cma-v3      cma-es           cma-es             cec2014     4.115598e-4
benchmark-zdt1-mo-v3              mo-eda-skeleton  gaussian-diag      zdt         0.375729
benchmark-dtlz2-mo-v3             mo-eda-skeleton  gaussian-diag      dtlz        0.33
benchmark-nguyen1-tree-eda-v3     tree-eda         token-categorical  nguyen-sr   0.024960

Multiobjective Objective Snapshots

  • benchmark-zdt1-mo-v3: best_obj_0=0.2895337996, best_obj_1=0.4619246723
  • benchmark-dtlz2-mo-v3: best_obj_0≈0, best_obj_1≈0, best_obj_2=1.0

Generated Artifacts

  • DB summary CSV:
      • <repo-root>/results/benchmarks/benchmark-core-v3-summary.csv
  • MO objectives CSV:
      • <repo-root>/results/benchmarks/benchmark-mo-objectives-v3.csv
  • Per-run HTML reports:
      • <repo-root>/reports/benchmarks/report-benchmark-cec2014-f10-cma-v3.html
      • <repo-root>/reports/benchmarks/report-benchmark-knapsack-umda-v3.html
      • <repo-root>/reports/benchmarks/report-benchmark-maxsat-umda-v3.html
      • <repo-root>/reports/benchmarks/report-benchmark-tsplib-berlin52-ehm-v3.html
      • <repo-root>/reports/benchmarks/report-benchmark-zdt1-mo-v3.html
      • <repo-root>/reports/benchmarks/report-benchmark-dtlz2-mo-v3.html
      • <repo-root>/reports/benchmarks/report-benchmark-nguyen1-tree-eda-v3.html

Regeneration Commands

cd <repo-root>
./edaf batch -c configs/batch-benchmark-core-v3.yml

COCO BBOB D=2 Comparison

Reproducibility Inputs

  • Campaign config: <repo-root>/configs/coco/bbob-compare-d2-v3.yml
  • Optimizers:
      • <repo-root>/configs/coco/optimizers/gaussian-baseline-v3.yml
      • <repo-root>/configs/coco/optimizers/gaussian-aggressive-v3.yml
  • Reference CSV imported: <repo-root>/configs/coco/reference/coco-reference-template.csv
  • DB: jdbc:sqlite:edaf-v3.db
  • Campaign id: coco-bbob-compare-d2-v3

Scope

  • Functions: 1,2,3,8,15
  • Dimension: 2
  • Instances: 1,2
  • Repetitions: 3
  • Total trials: 60
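
The trial total follows from the scope values if the campaign is a full grid: every optimizer runs on every function, dimension, and instance once per repetition. A minimal Python sketch of that arithmetic, assuming a complete factorial design (the helper name `total_trials` is illustrative and not part of EDAF):

```python
from math import prod

def total_trials(functions, dimensions, instances, repetitions, optimizers):
    """Count trials in a full-factorial campaign grid (illustrative helper)."""
    return prod([len(functions), len(dimensions), len(instances),
                 repetitions, len(optimizers)])

# Scope of the D=2 comparison: 5 functions x 1 dimension x 2 instances
# x 3 repetitions x 2 optimizers.
d2_trials = total_trials(
    functions=[1, 2, 3, 8, 15],
    dimensions=[2],
    instances=[1, 2],
    repetitions=3,
    optimizers=["gaussian-baseline", "gaussian-aggressive"],
)
print(d2_trials)  # 60
```

The same product reproduces the totals reported for the later campaigns in this document (400, 360, and 960 trials).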

Overall Comparable Metrics

Optimizer            Success Rate  Mean Evals to Target  EDAF ERT  Reference ERT  ERT Ratio
gaussian-aggressive  76.67%        1276.52               2189.57   716.00         3.058
gaussian-baseline    46.67%        1877.14               5305.71   716.00         7.410

Interpretation:

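  • lower ERT Ratio is better (ERT Ratio = EDAF ERT / Reference ERT)
  • for this campaign, gaussian-aggressive outperformed gaussian-baseline on every aggregate target-reaching metric: success rate (76.67% vs 46.67%), mean evals to target (1276.52 vs 1877.14), and ERT Ratio (3.058 vs 7.410)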

Function-Level Detail

Function  Optimizer            Trials  Successes  Avg Evals To Target  Avg Best Fitness
1         gaussian-aggressive  6       6          706.67               1.36782e-19
1         gaussian-baseline    6       6          1360.00              2.10961e-18
2         gaussian-aggressive  6       6          1013.33              5.90171e-15
2         gaussian-baseline    6       6          2000.00              1.48257e-13
3         gaussian-aggressive  6       5          1776.00              0.165827
3         gaussian-baseline    6       1          3120.00              0.000826579
8         gaussian-aggressive  6       0          -                    0.0646566
8         gaussian-baseline    6       0          -                    0.061192
15        gaussian-aggressive  6       6          1693.33              0
15        gaussian-baseline    6       1          3000.00              0.0324831
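
The aggregate figures in Overall Comparable Metrics can be cross-checked against these function-level rows. The Python sketch below assumes ERT follows the usual COCO definition (all evaluations spent, divided by the number of successful trials) and that each unsuccessful trial is charged 3000 evaluations. That figure is an assumption inferred from the reported aggregates, not read from the campaign config; the table itself lists one successful trial at 3120 evaluations, so the actual stopping rule may differ.

```python
# Rows copied from the Function-Level Detail table:
# (function, optimizer, trials, successes, avg evals to target or None).
ROWS = [
    (1, "gaussian-aggressive", 6, 6, 706.67),
    (1, "gaussian-baseline", 6, 6, 1360.00),
    (2, "gaussian-aggressive", 6, 6, 1013.33),
    (2, "gaussian-baseline", 6, 6, 2000.00),
    (3, "gaussian-aggressive", 6, 5, 1776.00),
    (3, "gaussian-baseline", 6, 1, 3120.00),
    (8, "gaussian-aggressive", 6, 0, None),
    (8, "gaussian-baseline", 6, 0, None),
    (15, "gaussian-aggressive", 6, 6, 1693.33),
    (15, "gaussian-baseline", 6, 1, 3000.00),
]

# ASSUMPTION: evaluations charged to each unsuccessful trial (inferred value).
ASSUMED_FAILED_EVALS = 3000
REFERENCE_ERT = 716.00  # from the Overall Comparable Metrics table


def aggregate(optimizer):
    """Recompute the overall metrics for one optimizer from ROWS."""
    trials = successes = success_evals = 0
    for _, name, n_trials, n_success, avg_evals in ROWS:
        if name != optimizer:
            continue
        trials += n_trials
        successes += n_success
        if n_success:
            # Evaluation counts are integers, so rounding undoes the
            # two-decimal rounding of the published averages.
            success_evals += round(avg_evals * n_success)
    failures = trials - successes
    ert = (success_evals + failures * ASSUMED_FAILED_EVALS) / successes
    return {
        "success_rate": successes / trials,
        "mean_evals_to_target": success_evals / successes,
        "ert": ert,
        "ert_ratio": ert / REFERENCE_ERT,
    }


for name in ("gaussian-aggressive", "gaussian-baseline"):
    stats = aggregate(name)
    print(name, {key: round(value, 4) for key, value in stats.items()})
```

Under that assumption the recomputed success rate, mean evals to target, ERT, and ERT ratio agree with the overall table to the printed precision for both optimizers.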

Generated Artifacts

  • Campaign HTML report:
      • <repo-root>/reports/coco/coco-campaign-coco-bbob-compare-d2-v3.html
  • Comparable CSV exports:
      • <repo-root>/results/coco/coco-bbob-compare-d2-v3-aggregates.csv
      • <repo-root>/results/coco/coco-bbob-compare-d2-v3-by-function.csv

Regeneration Commands

cd <repo-root>
./edaf coco import-reference --csv configs/coco/reference/coco-reference-template.csv --suite bbob --db-url jdbc:sqlite:edaf-v3.db
./edaf coco run -c configs/coco/bbob-compare-d2-v3.yml
./edaf coco report --campaign-id coco-bbob-compare-d2-v3 --out reports/coco --db-url jdbc:sqlite:edaf-v3.db

COCO BBOB Publishable Comparison (D=2,5,10,20)

Reproducibility Inputs

  • Campaign config: <repo-root>/configs/coco/bbob-publishable-v3.yml
  • Optimizers:
      • <repo-root>/configs/coco/optimizers/gaussian-baseline-v3.yml
      • <repo-root>/configs/coco/optimizers/gaussian-aggressive-v3.yml
  • Reference extractor script:
      • <repo-root>/scripts/coco/build_reference_from_ppdata.py
  • Imported reference CSV:
      • <repo-root>/configs/coco/reference/coco-reference-bbob-ppdata-2009-2023-f1-2-3-8-15-d2-5-10-20-t1e-7.csv
  • Reference source URL: https://numbbo.github.io/ppdata-archive/
  • Imported rows: 4665
  • DB: jdbc:sqlite:edaf-v3.db
  • Campaign id: coco-bbob-publishable-v3

Scope

  • Functions: 1,2,3,8,15
  • Dimensions: 2,5,10,20
  • Instances: 1,2
  • Repetitions: 5
  • Total trials: 400
  • Successful trials: 156

Aggregate Metrics by Dimension

Optimizer            Dimension  Success Rate  Mean Evals to Target  EDAF ERT   Reference ERT  ERT Ratio
gaussian-aggressive  2          0.72          1195.56               2128.89    205.97         10.3358
gaussian-aggressive  5          0.52          2092.31               7630.77    2076.36        3.6751
gaussian-aggressive  10         0.22          2552.73               45098.18   6583.80        6.8499
gaussian-aggressive  20         0.08          3820.00               279820.00  18923.74       14.7867
gaussian-baseline    2          0.42          1560.00               4874.29    205.97         23.6647
gaussian-baseline    5          0.46          3553.04               10596.52   2076.36        5.1034
gaussian-baseline    10         0.40          5616.00               23616.00   6583.80        3.5870
gaussian-baseline    20         0.30          7864.00               63864.00   18923.74       3.3748

Interpretation:

  • lower ERT Ratio is better (EDAF ERT / reference ERT)
  • gaussian-aggressive was stronger in lower dimensions (2, 5)
  • gaussian-baseline was stronger in higher dimensions (10, 20) for this campaign setup
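
The per-dimension ranking above follows mechanically from the ERT columns. A small Python sketch, with the values copied from the Aggregate Metrics by Dimension table and helper names (`ert_ratio`, `best_by_dimension`) that are illustrative rather than EDAF API:

```python
# (optimizer, dimension) -> (EDAF ERT, reference ERT), copied from the table.
ERT = {
    ("gaussian-aggressive", 2): (2128.89, 205.97),
    ("gaussian-aggressive", 5): (7630.77, 2076.36),
    ("gaussian-aggressive", 10): (45098.18, 6583.80),
    ("gaussian-aggressive", 20): (279820.00, 18923.74),
    ("gaussian-baseline", 2): (4874.29, 205.97),
    ("gaussian-baseline", 5): (10596.52, 2076.36),
    ("gaussian-baseline", 10): (23616.00, 6583.80),
    ("gaussian-baseline", 20): (63864.00, 18923.74),
}


def ert_ratio(edaf_ert, reference_ert):
    """ERT Ratio as defined in the interpretation: EDAF ERT / reference ERT."""
    return edaf_ert / reference_ert


def best_by_dimension(table):
    """Return {dimension: (optimizer, ratio)} for the lowest ratio per dimension."""
    best = {}
    for (optimizer, dim), (edaf, ref) in table.items():
        ratio = ert_ratio(edaf, ref)
        if dim not in best or ratio < best[dim][1]:
            best[dim] = (optimizer, ratio)
    return best


winners = best_by_dimension(ERT)
for dim in sorted(winners):
    optimizer, ratio = winners[dim]
    print(f"D={dim}: {optimizer} (ERT ratio {ratio:.4f})")
```

Because the inputs are already rounded to two decimals, a recomputed ratio can differ from the table in the last printed digit; the ranking per dimension is unaffected.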

Publishable Bundle

  • Bundle directory:
      • <repo-root>/reports/coco/bundles/coco-bbob-publishable-v3/
  • Campaign HTML report:
      • <repo-root>/reports/coco/coco-campaign-coco-bbob-publishable-v3.html
  • CSV exports:
      • <repo-root>/results/coco/coco-bbob-publishable-v3-aggregates.csv
      • <repo-root>/results/coco/coco-bbob-publishable-v3-by-function.csv
      • <repo-root>/results/coco/coco-bbob-publishable-v3-campaign.csv

Regeneration Commands

cd <repo-root>
./scripts/coco/build_reference_from_ppdata.py \
  --functions 1,2,3,8,15 \
  --dimensions 2,5,10,20 \
  --target-label 1e-7 \
  --target-value 1e-7 \
  --out configs/coco/reference/coco-reference-bbob-ppdata-2009-2023-f1-2-3-8-15-d2-5-10-20-t1e-7.csv

sqlite3 edaf-v3.db "DELETE FROM coco_reference_results WHERE suite='bbob';"
./edaf coco import-reference \
  --csv configs/coco/reference/coco-reference-bbob-ppdata-2009-2023-f1-2-3-8-15-d2-5-10-20-t1e-7.csv \
  --suite bbob \
  --source-url https://numbbo.github.io/ppdata-archive/ \
  --db-url jdbc:sqlite:edaf-v3.db

./edaf coco run -c configs/coco/bbob-publishable-v3.yml
./edaf coco report --campaign-id coco-bbob-publishable-v3 --out reports/coco --db-url jdbc:sqlite:edaf-v3.db

COCO BBOB CMA-ES Upgrade Comparison

Reproducibility Inputs

  • Campaign config: <repo-root>/configs/coco/bbob-cma-compare-v3.yml
  • Optimizer templates:
      • <repo-root>/configs/coco/optimizers/gaussian-baseline-v3.yml
      • <repo-root>/configs/coco/optimizers/gaussian-aggressive-v3.yml
      • <repo-root>/configs/coco/optimizers/cma-es-v3.yml
  • Reference CSV:
      • <repo-root>/configs/coco/reference/coco-reference-bbob-ppdata-2009-2023-f1-2-3-8-15-d2-5-10-20-t1e-7.csv
  • Campaign id: coco-bbob-cma-compare-v3

Scope

  • Functions: 1,2,3,8,15
  • Dimensions: 2,5,10,20
  • Instances: 1,2
  • Repetitions: 3
  • Total trials: 360
  • Successful trials: 155

Aggregate ERT Ratio by Dimension

Optimizer            D2       D5      D10     D20
cma-es               7.6241   3.5142  3.5066  5.5842
gaussian-aggressive  9.3047   3.8323  7.6693  17.9626
gaussian-baseline    24.8576  5.4593  4.4017  3.3679

Interpretation:

  • lower is better (EDAF ERT / reference ERT)
  • upgraded cma-es is strongest on D2, D5, and D10
  • gaussian-baseline remains strongest on D20 in this setup

Overall Success Rates

Optimizer            Trials  Successes  Success Rate
cma-es               120     65         0.541667
gaussian-aggressive  120     46         0.383333
gaussian-baseline    120     44         0.366667

Generated Artifacts

  • Campaign HTML report:
      • <repo-root>/reports/coco/coco-campaign-coco-bbob-cma-compare-v3.html
  • CSV exports:
      • <repo-root>/results/coco/coco-bbob-cma-compare-v3-aggregates.csv
      • <repo-root>/results/coco/coco-bbob-cma-compare-v3-overall.csv
      • <repo-root>/results/coco/coco-bbob-cma-compare-v3-by-function.csv
  • Publishable bundle:
      • <repo-root>/reports/coco/bundles/coco-bbob-cma-compare-v3/
      • <repo-root>/reports/coco/bundles/coco-bbob-cma-compare-v3.tar.gz

COCO BBOB Publishable Comparison v4 (CMA + Restart + Larger Repetitions)

Reproducibility Inputs

  • Campaign config: <repo-root>/configs/coco/bbob-publishable-v4.yml
  • Optimizer templates:
      • <repo-root>/configs/coco/optimizers/gaussian-baseline-v3.yml
      • <repo-root>/configs/coco/optimizers/gaussian-aggressive-v3.yml
      • <repo-root>/configs/coco/optimizers/cma-es-v3.yml
      • <repo-root>/configs/coco/optimizers/cma-es-restart-v3.yml
  • Reference CSV:
      • <repo-root>/configs/coco/reference/coco-reference-bbob-ppdata-2009-2023-f1-2-3-8-15-d2-5-10-20-t1e-7.csv
  • Campaign id: coco-bbob-publishable-v4

Scope

  • Functions: 1,2,3,8,15
  • Dimensions: 2,5,10,20
  • Instances: 1,2
  • Repetitions: 6
  • Total trials: 960
  • Successful trials: 459

Aggregate ERT Ratio by Dimension

Optimizer            D2       D5      D10     D20
cma-es               5.3650   2.8518  2.1290  2.0031
cma-es-restart       7.6135   3.6076  2.7038  2.4021
gaussian-aggressive  11.0836  4.4594  6.3697  24.3039
gaussian-baseline    22.8558  4.7082  3.6400  3.6082

Interpretation:

  • lower is better (EDAF ERT / reference ERT)
  • cma-es is strongest across all tested dimensions in this campaign
  • cma-es-restart improves evals-to-target on several slices but underperforms the non-restart cma-es in aggregate ERT ratio due to its reduced success rate (52.92% vs 62.50% overall)
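
The trade-off in the last bullet can be made explicit. The derivation below is a sketch that assumes ERT follows the usual COCO definition (evaluations summed over all trials, divided by the number of successful trials) and that every unsuccessful trial spends the same budget $B$. The symbols $p$, $B$, and $\bar{e}_{\text{succ}}$ are introduced here for illustration and do not appear in the EDAF output.

```latex
\mathrm{ERT}
  = \frac{n_{\text{succ}}\,\bar{e}_{\text{succ}} + n_{\text{fail}}\,B}{n_{\text{succ}}}
  = \bar{e}_{\text{succ}} + \frac{1-p}{p}\,B,
\qquad
p = \frac{n_{\text{succ}}}{n_{\text{succ}} + n_{\text{fail}}}
```

With the counts listed under Overall Success Rates, the penalty factor $(1-p)/p$ is $90/150 = 0.60$ for cma-es and $113/127 \approx 0.89$ for cma-es-restart, so a lower mean evals-to-target can be outweighed by the larger budget term. The ERT ratios are aggregated per dimension, so these overall factors are indicative only.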

Overall Success Rates

Optimizer            Trials  Successes  Success Rate
cma-es               240     150        62.50%
cma-es-restart       240     127        52.92%
gaussian-baseline    240     97         40.42%
gaussian-aggressive  240     85         35.42%

Generated Artifacts

  • Campaign HTML report:
      • <repo-root>/reports/coco/coco-campaign-coco-bbob-publishable-v4.html
  • CSV exports:
      • <repo-root>/results/coco/coco-bbob-publishable-v4-aggregates.csv
      • <repo-root>/results/coco/coco-bbob-publishable-v4-overall.csv
      • <repo-root>/results/coco/coco-bbob-publishable-v4-by-function.csv
  • Publishable bundle:
      • <repo-root>/reports/coco/bundles/coco-bbob-publishable-v4/
      • <repo-root>/reports/coco/bundles/coco-bbob-publishable-v4.tar.gz

Regeneration Commands

cd <repo-root>

./scripts/coco/build_reference_from_ppdata.py \
  --functions 1,2,3,8,15 \
  --dimensions 2,5,10,20 \
  --target-label 1e-7 \
  --target-value 1e-7 \
  --out configs/coco/reference/coco-reference-bbob-ppdata-2009-2023-f1-2-3-8-15-d2-5-10-20-t1e-7.csv

./edaf coco import-reference \
  --csv configs/coco/reference/coco-reference-bbob-ppdata-2009-2023-f1-2-3-8-15-d2-5-10-20-t1e-7.csv \
  --suite bbob \
  --source-url https://numbbo.github.io/ppdata-archive/bbob/ \
  --db-url jdbc:sqlite:edaf-v3.db

./edaf coco run -c configs/coco/bbob-publishable-v4.yml
./edaf coco report --campaign-id coco-bbob-publishable-v4 --out reports/coco --db-url jdbc:sqlite:edaf-v3.db

Visual Summary

flowchart LR
    A["EDAF"] --> B["benchmark comparisons"]
    B --> C["Configure"]
    B --> D["Execute"]
    B --> E["Inspect"]
    E --> F["Iterate"]

Estimation of Distribution Algorithms Framework
Copyright (c) 2026 Dr. Karlo Knezevic
Licensed under the Apache License, Version 2.0.