Run Fair Benchmarks Every Time
Identical environments, interference-free execution, reproducible results. Compare tools, libraries, or configurations with confidence.
Why Most Benchmarks Are Unreliable
Running benchmarks on your local machine or shared servers produces inconsistent results:
- ✗ Background processes compete for CPU and memory
- ✗ Filesystem caches warm up after the first run
- ✗ Different environments have different package versions
- ✗ CPU throttling and thermal issues affect results
With Hopx Sandboxes
- ✓ Fresh, isolated micro-VM for each benchmark run
- ✓ Cold-start from identical snapshots every time
- ✓ No noisy neighbors — dedicated resources per sandbox
- ✓ Consistent CPU/memory allocation you control
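A minimal sketch of what this isolation means in practice, using only the Sandbox.create, run_code, and kill calls from the full example further down the page: state written in one sandbox never reaches the next, because every sandbox cold-starts from the same snapshot.

from hopx_ai import Sandbox

# First sandbox: write a file and warm up any caches
first = Sandbox.create(template="code-interpreter")
first.run_code("open('/workspace/marker.txt', 'w').write('warm')")
first.kill()

# Second sandbox: cold-started from the same snapshot, so the file is gone
second = Sandbox.create(template="code-interpreter")
result = second.run_code(
    "import os; print(os.path.exists('/workspace/marker.txt'))"
)
print(result.stdout)  # expected: False, no state carried over between runs
second.kill()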
What You Can Benchmark
Library Comparison
Compare pandas vs polars, requests vs httpx, PIL vs opencv
Algorithm Performance
Test different implementations of the same algorithm
Language Runtimes
Python vs Node vs Go for the same task
Database Queries
Compare query performance across databases or ORMs
ML Model Inference
Benchmark different models or optimization techniques
Configuration Tuning
Test performance impact of different settings
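Most of these cases reduce to the same pattern as the library comparison in the full example at the bottom of this page: hand each candidate snippet to the benchmark helper defined there. As an illustration (the snippets themselves are arbitrary), comparing two implementations of the same computation looks like this:

# Uses the benchmark() helper from the full example below.
results_loop = benchmark("""
total = 0
for i in range(1_000_000):
    total += i * i
""")

results_sum = benchmark("""
total = sum(i * i for i in range(1_000_000))
""")

print(f"Explicit loop:   {results_loop['mean']:.3f}s")
print(f"sum() generator: {results_sum['mean']:.3f}s")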
Why Hopx for Benchmarking
Identical Environments
Every sandbox starts from the same snapshot. Same OS, same packages, same state — guaranteed reproducibility.
Interference-Free
Dedicated micro-VM per benchmark. No noisy neighbors, no shared resources, no background processes affecting results.
Precise Metrics
Capture CPU time, memory usage, disk I/O, and execution time. Compare with confidence.
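The code example below measures wall-clock time with time.perf_counter, and the same print-and-parse pattern extends to other metrics. Here is a sketch, not a built-in Hopx metrics API, that also reports peak Python heap usage from inside the sandboxed snippet via tracemalloc; the MEM: marker is an illustrative convention, like TIME: in the main example.

from hopx_ai import Sandbox

def benchmark_with_memory(code: str):
    """Time a snippet and report its peak Python-level allocations."""
    sandbox = Sandbox.create(template="code-interpreter")
    result = sandbox.run_code(f"""
import time, tracemalloc
tracemalloc.start()
start = time.perf_counter()

{code}

elapsed = time.perf_counter() - start
_, peak = tracemalloc.get_traced_memory()
print(f"TIME:{{elapsed}}")
print(f"MEM:{{peak}}")
""")
    sandbox.kill()

    # Parse the markers printed by the sandboxed snippet
    metrics = {}
    for line in result.stdout.split('\n'):
        if line.startswith('TIME:'):
            metrics['seconds'] = float(line.split(':', 1)[1])
        elif line.startswith('MEM:'):
            metrics['peak_bytes'] = int(line.split(':', 1)[1])
    return metrics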
Controlled Resources
Allocate exact CPU and memory per sandbox. Test how code performs under different resource constraints.
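The exact resource options depend on the SDK; assuming, purely for illustration, that Sandbox.create accepts cpu and memory_mb keyword arguments (check the Hopx reference for the real parameter names), sweeping the same workload across different allocations could look like this:

from hopx_ai import Sandbox

# NOTE: `cpu` and `memory_mb` are hypothetical parameter names used for
# illustration; consult the Hopx SDK reference for the actual options.
for cpu, memory_mb in [(1, 512), (2, 1024), (4, 2048)]:
    sandbox = Sandbox.create(template="code-interpreter", cpu=cpu, memory_mb=memory_mb)
    result = sandbox.run_code("""
import time
start = time.perf_counter()
total = sum(i * i for i in range(5_000_000))
print(f"TIME:{time.perf_counter() - start:.3f}")
""")
    print(f"{cpu} vCPU / {memory_mb} MB -> {result.stdout.strip()}")
    sandbox.kill()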
Benchmark in Minutes, Not Hours
Spin up fresh sandboxes for each run, capture precise timing, aggregate results. No setup, no cleanup, no infrastructure management.
Fresh Environment Per Run
No cache warming, no state carryover between runs
Statistical Analysis
Run multiple iterations, get mean, median, and standard deviation
Parallel Execution
Run hundreds of benchmarks in parallel for faster results (see the parallel sketch after the code example below)
from hopx_ai import Sandbox
import statistics

def benchmark(code: str, runs: int = 10):
    """Run benchmark in isolated sandbox"""
    times = []

    for i in range(runs):
        # Fresh sandbox for each run
        sandbox = Sandbox.create(template="code-interpreter")

        result = sandbox.run_code(f"""
import time
start = time.perf_counter()

{code}

elapsed = time.perf_counter() - start
print(f"TIME:{{elapsed}}")
""")

        # Parse execution time
        for line in result.stdout.split('\n'):
            if line.startswith('TIME:'):
                times.append(float(line.split(':')[1]))

        sandbox.kill()

    return {
        'mean': statistics.mean(times),
        'median': statistics.median(times),
        'stdev': statistics.stdev(times) if len(times) > 1 else 0,
        'min': min(times),
        'max': max(times)
    }

# Compare two approaches
results_pandas = benchmark("""
import pandas as pd
df = pd.read_csv('/workspace/data.csv')
result = df.groupby('category').sum()
""")

results_polars = benchmark("""
import polars as pl
df = pl.read_csv('/workspace/data.csv')
result = df.group_by('category').sum()
""")

print(f"Pandas: {results_pandas['mean']:.3f}s")
print(f"Polars: {results_polars['mean']:.3f}s")