Data Processing Without the Infrastructure
ETL pipelines, transformations, analytics — with pandas, numpy, and ML libraries preinstalled. Upload data, process it, download results. No servers to manage.
Data Processing Shouldn't Require DevOps
You just want to process data. But traditional approaches come with overhead:
- ✗ Spinning up Spark clusters for simple transformations
- ✗ Managing Airflow DAGs and dependencies
- ✗ Installing libraries in production environments
- ✗ Local machine running out of memory on large files
With Hopx
- ✓ Sandboxes with data science libraries pre-installed
- ✓ Upload files, run code, download results — that's it
- ✓ Scale up memory for large datasets on demand
- ✓ Run jobs in parallel for faster processing
Common Data Processing Tasks
ETL Pipelines
Extract, transform, load data between systems
Data Cleaning
Handle missing values, normalize, deduplicate
Feature Engineering
Create features for ML models
Aggregations
Group by, pivot, window functions at scale
Format Conversion
CSV to Parquet, JSON to SQL, any format
Report Generation
Generate charts, tables, PDF reports
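Most of the tasks above reduce to a few pandas calls. A minimal local sketch of cleaning and aggregation (the column names `order_id`, `region`, and `revenue` are made up for illustration):

import pandas as pd

# Toy orders data; column names are illustrative only
df = pd.DataFrame({
    "order_id": [1, 2, 2, 3, 4],
    "region":   ["east", "west", "west", "east", None],
    "revenue":  [100.0, 250.0, 250.0, 80.0, 40.0],
})

# Data cleaning: drop rows with missing values, deduplicate on order_id
clean = df.dropna().drop_duplicates(subset="order_id")

# Aggregation: total revenue and order count per region
summary = (clean.groupby("region")
                .agg(revenue=("revenue", "sum"), orders=("order_id", "count"))
                .reset_index())
print(summary)

The same code runs unchanged inside a sandbox, where the input would come from an uploaded file instead of an inline DataFrame.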
Pre-installed & Ready to Use
No waiting for pip install. These libraries are ready from the moment your sandbox starts.
Why Hopx for Data Processing
Libraries Pre-installed
pandas, numpy, scipy, matplotlib, scikit-learn ready to use. No pip install waiting — just start processing.
Full Filesystem Access
Upload datasets, process them, download results. Work with files just like on your local machine.
Rich Output Capture
Capture matplotlib plots, pandas DataFrames, and plotly charts. Perfect for data exploration and reporting.
Scale Resources
Need more RAM for large datasets? Spin up sandboxes with the exact CPU and memory you need.
Simple Workflow for Complex Data
Upload your data, run pandas/numpy code, download results. Works with CSVs, Parquet, Excel, JSON — any format pandas supports.
Instant Start
~100ms cold start, libraries already loaded
File I/O
Upload inputs, download outputs — full filesystem access
Rich Outputs
Capture charts, tables, and visualizations
from hopx_ai import Sandbox

# Create sandbox with data science template
sandbox = Sandbox.create(template="code-interpreter")

# Upload your dataset
with open("local_sales.csv") as f:
    sandbox.files.write("/workspace/sales.csv", f.read())

# Process data with pandas
result = sandbox.run_code("""
import pandas as pd
import matplotlib.pyplot as plt

# Load and clean data
df = pd.read_csv('/workspace/sales.csv')
df['date'] = pd.to_datetime(df['date'])
df = df.dropna()

# Aggregate by month
monthly = df.groupby(df['date'].dt.to_period('M')).agg({
    'revenue': 'sum',
    'orders': 'count'
}).reset_index()

monthly['date'] = monthly['date'].astype(str)

# Create visualization
fig, ax = plt.subplots(figsize=(12, 6))
ax.bar(monthly['date'], monthly['revenue'])
ax.set_title('Monthly Revenue')
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig('/workspace/revenue_chart.png')

# Save processed data
monthly.to_csv('/workspace/monthly_summary.csv', index=False)
print(monthly.to_string())
""")

print(result.stdout)

# Download results
chart = sandbox.files.read("/workspace/revenue_chart.png")
summary = sandbox.files.read("/workspace/monthly_summary.csv")

sandbox.kill()
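The walkthrough above processes one file in one sandbox; for "run jobs in parallel," the same pattern fans out with a thread pool, one job per input. Below is a local stand-in that parallelizes a chunked pandas aggregation with `concurrent.futures` — `process_chunk` is a hypothetical worker; in a real pipeline each call would create a sandbox, upload its chunk, and run code as shown above.

from concurrent.futures import ThreadPoolExecutor

import pandas as pd

def process_chunk(chunk: pd.DataFrame) -> float:
    # Stand-in for "upload chunk, run code in a sandbox, download result":
    # here we just sum the chunk's revenue locally.
    return chunk["revenue"].sum()

df = pd.DataFrame({"revenue": range(100)})  # toy data
chunks = [df.iloc[i:i + 25] for i in range(0, len(df), 25)]

# Fan the chunks out across workers, then combine the partial results
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(process_chunk, chunks))

total = sum(partials)
print(total)  # sum of 0..99 → 4950

Threads suffice here because each worker spends its time waiting on the sandbox, not on local CPU; the fan-out/combine shape stays the same regardless of chunk count.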