
Data Processing Without the Infrastructure

ETL pipelines, transformations, analytics — with pandas, numpy, and ML libraries preinstalled. Upload data, process it, download results. No servers to manage.

No credit card required · pandas, numpy pre-installed · ~100ms startup

Data Processing Shouldn't Require DevOps

You just want to process data. But traditional approaches come with overhead:

  • Spinning up Spark clusters for simple transformations
  • Managing Airflow DAGs and dependencies
  • Installing libraries in production environments
  • Local machine running out of memory on large files

With Hopx

  • Sandboxes with data science libraries pre-installed
  • Upload files, run code, download results — that's it
  • Scale up memory for large datasets on demand
  • Run jobs in parallel for faster processing
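The parallel-jobs point can be sketched in plain Python. In practice each worker would drive its own sandbox; here a local `process_chunk` function (a hypothetical stand-in, not part of the Hopx SDK) plays that role:

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(rows):
    # Stand-in for the work one sandbox would do on its slice of the data,
    # e.g. summing a numeric column.
    return sum(r["value"] for r in rows)

# Split a dataset into four chunks and process them concurrently
data = [{"value": v} for v in range(100)]
chunks = [data[i:i + 25] for i in range(0, len(data), 25)]

with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(process_chunk, chunks))

total = sum(partials)
print(total)  # 4950
```

Each chunk is independent, so adding workers (or sandboxes) scales throughput roughly linearly.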

Common Data Processing Tasks

ETL Pipelines

Extract, transform, load data between systems

Data Cleaning

Handle missing values, normalize, deduplicate

Feature Engineering

Create features for ML models

Aggregations

Group by, pivot, window functions at scale

Format Conversion

CSV to Parquet, JSON to SQL, any format

Report Generation

Generate charts, tables, PDF reports
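As an illustration, the data-cleaning task above can be sketched in a few lines of pandas. The frame here is made-up sample data; your real input would come from an uploaded file:

```python
import pandas as pd

# Tiny frame with the usual problems: missing values and duplicate rows
df = pd.DataFrame({
    "customer": ["Ann", "Ann", "Bob", None],
    "spend": [100.0, 100.0, None, 40.0],
})

clean = (
    df.dropna(subset=["customer"])  # drop rows missing a key field
      .drop_duplicates()            # deduplicate exact repeats
      .fillna({"spend": 0.0})       # normalize missing spend to zero
)
print(clean)
```

The same chained style extends to the other tasks: swap in `groupby` for aggregations or `pivot_table` for reshaping.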

Pre-installed & Ready to Use

No waiting for pip install. These libraries are ready from the moment your sandbox starts.

pandas · numpy · scipy · matplotlib · seaborn · plotly · scikit-learn · polars · pyarrow · openpyxl · requests · beautifulsoup4
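As a sketch of the "JSON to SQL" conversion mentioned earlier, here is a minimal version using only the Python standard library; the records are hypothetical, and in a sandbox they might come from an uploaded file:

```python
import json
import sqlite3

# Hypothetical input records
records = json.loads('[{"sku": "A1", "qty": 3}, {"sku": "B2", "qty": 5}]')

# Use a file path instead of ":memory:" to keep the database on disk
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (sku TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO inventory (sku, qty) VALUES (:sku, :qty)", records
)

total_qty = conn.execute("SELECT SUM(qty) FROM inventory").fetchone()[0]
print(total_qty)  # 8
```

For columnar formats, the pre-installed pandas and pyarrow cover the same pattern, e.g. `pd.read_csv(...)` followed by `df.to_parquet(...)`.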

Why Hopx for Data Processing

Libraries Pre-installed

pandas, numpy, scipy, matplotlib, scikit-learn ready to use. No pip install waiting — just start processing.

Full Filesystem Access

Upload datasets, process them, download results. Work with files just like on your local machine.

Rich Output Capture

Capture matplotlib plots, pandas DataFrames, and plotly charts. Perfect for data exploration and reporting.

Scale Resources

Need more RAM for large datasets? Spin up sandboxes with the exact CPU and memory you need.

Simple Workflow for Complex Data

Upload your data, run pandas/numpy code, download results. Works with CSVs, Parquet, Excel, JSON — any format pandas supports.

Instant Start

~100ms cold start, libraries already loaded

File I/O

Upload inputs, download outputs — full filesystem access

Rich Outputs

Capture charts, tables, and visualizations

process_data.py
from hopx_ai import Sandbox

# Create sandbox with data science template
sandbox = Sandbox.create(template="code-interpreter")

# Upload your dataset
with open("local_sales.csv") as f:
    sandbox.files.write("/workspace/sales.csv", f.read())

# Process data with pandas
result = sandbox.run_code("""
import pandas as pd
import matplotlib.pyplot as plt

# Load and clean data
df = pd.read_csv('/workspace/sales.csv')
df['date'] = pd.to_datetime(df['date'])
df = df.dropna()

# Aggregate by month
monthly = df.groupby(df['date'].dt.to_period('M')).agg({
    'revenue': 'sum',
    'orders': 'count'
}).reset_index()

monthly['date'] = monthly['date'].astype(str)

# Create visualization
fig, ax = plt.subplots(figsize=(12, 6))
ax.bar(monthly['date'], monthly['revenue'])
ax.set_title('Monthly Revenue')
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig('/workspace/revenue_chart.png')

# Save processed data
monthly.to_csv('/workspace/monthly_summary.csv', index=False)
print(monthly.to_string())
""")

print(result.stdout)

# Download results
chart = sandbox.files.read("/workspace/revenue_chart.png")
summary = sandbox.files.read("/workspace/monthly_summary.csv")

sandbox.kill()

Start Processing Data in Minutes

Get $200 in free credits. No infrastructure setup, no DevOps required. Just upload, process, download.