Run Fair Benchmarks Every Time
Identical environments, interference-free execution, reproducible results. Compare tools, libraries, or configurations with confidence.
Why Most Benchmarks Are Unreliable
Running benchmarks on your local machine or shared servers produces inconsistent results:
- ✗ Background processes compete for CPU and memory
- ✗ Filesystem caches warm up after the first run
- ✗ Different environments have different package versions
- ✗ CPU throttling and thermal issues affect results
With Hopx Sandboxes
- ✓ Fresh, isolated micro-VM for each benchmark run
- ✓ Cold-start from identical snapshots every time
- ✓ No noisy neighbors — dedicated resources per sandbox
- ✓ Consistent CPU/memory allocation you control
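A minimal sketch of what this isolation means in practice, using only the Sandbox.create, run_code, and kill calls from the full example further down the page: state written in one sandbox never reaches the next, because every sandbox cold-starts from the same snapshot.

from hopx_ai import Sandbox

# First sandbox: write a file and warm up any caches
first = Sandbox.create(template="code-interpreter")
first.run_code("open('/workspace/marker.txt', 'w').write('warm')")
first.kill()

# Second sandbox: cold-started from the same snapshot, so the file is gone
second = Sandbox.create(template="code-interpreter")
result = second.run_code(
    "import os; print(os.path.exists('/workspace/marker.txt'))"
)
print(result.stdout)  # expected: False, no state carried over between runs
second.kill()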
What You Can Benchmark
Library Comparison
Compare pandas vs polars, requests vs httpx, PIL vs opencv
Algorithm Performance
Test different implementations of the same algorithm
Language Runtimes
Python vs Node vs Go for the same task
Database Queries
Compare query performance across databases or ORMs
ML Model Inference
Benchmark different models or optimization techniques
Configuration Tuning
Test performance impact of different settings
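Most of these cases reduce to the same pattern as the library comparison in the full example at the bottom of this page: hand each candidate snippet to the benchmark helper defined there. As an illustration (the snippets themselves are arbitrary), comparing two implementations of the same computation looks like this:

# Uses the benchmark() helper from the full example below.
results_loop = benchmark("""
total = 0
for i in range(1_000_000):
    total += i * i
""")

results_sum = benchmark("""
total = sum(i * i for i in range(1_000_000))
""")

print(f"Explicit loop:   {results_loop['mean']:.3f}s")
print(f"sum() generator: {results_sum['mean']:.3f}s")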
Why Hopx for Benchmarking
Identical Environments
Every sandbox starts from the same snapshot. Same OS, same packages, same state — guaranteed reproducibility.
Interference-Free
Dedicated micro-VM per benchmark. No noisy neighbors, no shared resources, no background processes affecting results.
Precise Metrics
Capture CPU time, memory usage, disk I/O, and execution time. Compare with confidence.
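The code example below measures wall-clock time with time.perf_counter, and the same print-and-parse pattern extends to other metrics. Here is a sketch, not a built-in Hopx metrics API, that also reports peak Python heap usage from inside the sandboxed snippet via tracemalloc; the MEM: marker is an illustrative convention, like TIME: in the main example.

from hopx_ai import Sandbox

def benchmark_with_memory(code: str):
    """Time a snippet and report its peak Python-level allocations."""
    sandbox = Sandbox.create(template="code-interpreter")
    result = sandbox.run_code(f"""
import time, tracemalloc
tracemalloc.start()
start = time.perf_counter()

{code}

elapsed = time.perf_counter() - start
_, peak = tracemalloc.get_traced_memory()
print(f"TIME:{{elapsed}}")
print(f"MEM:{{peak}}")
""")
    sandbox.kill()

    # Parse the markers printed by the sandboxed snippet
    metrics = {}
    for line in result.stdout.split('\n'):
        if line.startswith('TIME:'):
            metrics['seconds'] = float(line.split(':', 1)[1])
        elif line.startswith('MEM:'):
            metrics['peak_bytes'] = int(line.split(':', 1)[1])
    return metrics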
Controlled Resources
Allocate exact CPU and memory per sandbox. Test how code performs under different resource constraints.
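The exact resource options depend on the SDK; assuming, purely for illustration, that Sandbox.create accepts cpu and memory_mb keyword arguments (check the Hopx reference for the real parameter names), sweeping the same workload across different allocations could look like this:

from hopx_ai import Sandbox

# NOTE: `cpu` and `memory_mb` are hypothetical parameter names used for
# illustration; consult the Hopx SDK reference for the actual options.
for cpu, memory_mb in [(1, 512), (2, 1024), (4, 2048)]:
    sandbox = Sandbox.create(template="code-interpreter", cpu=cpu, memory_mb=memory_mb)
    result = sandbox.run_code("""
import time
start = time.perf_counter()
total = sum(i * i for i in range(5_000_000))
print(f"TIME:{time.perf_counter() - start:.3f}")
""")
    print(f"{cpu} vCPU / {memory_mb} MB -> {result.stdout.strip()}")
    sandbox.kill()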
Benchmark in Minutes, Not Hours
Spin up fresh sandboxes for each run, capture precise timing, aggregate results. No setup, no cleanup, no infrastructure management.
Fresh Environment Per Run
No cache warming, no state carryover between runs
Statistical Analysis
Run multiple iterations, get mean, median, and standard deviation
Parallel Execution
Run hundreds of benchmarks in parallel for faster results (see the parallel sketch after the code example below)
from hopx_ai import Sandbox
import statistics

def benchmark(code: str, runs: int = 10):
    """Run benchmark in isolated sandbox"""
    times = []

    for i in range(runs):
        # Fresh sandbox for each run
        sandbox = Sandbox.create(template="code-interpreter")

        result = sandbox.run_code(f"""
import time
start = time.perf_counter()

{code}

elapsed = time.perf_counter() - start
print(f"TIME:{{elapsed}}")
""")

        # Parse execution time
        for line in result.stdout.split('\n'):
            if line.startswith('TIME:'):
                times.append(float(line.split(':')[1]))

        sandbox.kill()

    return {
        'mean': statistics.mean(times),
        'median': statistics.median(times),
        'stdev': statistics.stdev(times) if len(times) > 1 else 0,
        'min': min(times),
        'max': max(times)
    }

# Compare two approaches
results_pandas = benchmark("""
import pandas as pd
df = pd.read_csv('/workspace/data.csv')
result = df.groupby('category').sum()
""")

results_polars = benchmark("""
import polars as pl
df = pl.read_csv('/workspace/data.csv')
result = df.group_by('category').sum()
""")

print(f"Pandas: {results_pandas['mean']:.3f}s")
print(f"Polars: {results_polars['mean']:.3f}s")