Quickstart

Unibox gives you one API for local files, S3, and Hugging Face datasets. This page is a 5-minute tour.

Tip

You can copy and paste these snippets directly; just replace the paths, buckets, and repo IDs with your own.

Install

pip install unibox
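
To confirm the install worked, a quick standard-library check can report the installed version. This is a minimal sketch: the helper name `installed_version` is hypothetical and not part of Unibox.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the version string, or a hint if the package is missing.
    print(installed_version("unibox") or "unibox is not installed")
```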

Load and save in 3 tabs

import unibox as ub

# Load a local file
df = ub.loads("data/sample.parquet")
print(df.head(3))

# Save to a new local file
ub.saves(df, "data/processed.parquet")

import unibox as ub

# Load a file from S3
df = ub.loads("s3://my-bucket/data/sample.csv")

# Save back to S3
ub.saves(df, "s3://my-bucket/data/processed.parquet")

Note

You need AWS credentials set up. See the Credentials page.
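
If the S3 snippet fails with an authentication error, it can help to check which credential variables are visible to Python. The sketch below assumes credentials are supplied through the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables; the helper `missing_aws_env` is hypothetical and not part of Unibox. Credentials may also come from a shared credentials file or an IAM role, so the Credentials page remains the authoritative guide.

```python
import os
from typing import List, Mapping, Optional

# Environment variables commonly read by AWS SDKs for static credentials.
REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_env(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of required AWS variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aws_env()
    if missing:
        print("Not set in the environment:", ", ".join(missing))
    else:
        print("AWS credential variables are set.")
```

A non-empty result does not necessarily mean S3 access will fail; it only means the environment-variable route is not configured.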

import unibox as ub

# Load a dataset
ds = ub.loads("hf://my-org/my-dataset")

# Save a DataFrame or Dataset to HF
ub.saves(ds, "hf://my-org/my-new-dataset")

Note

Hugging Face dataset URIs use hf://owner/repo.
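
To make the `hf://owner/repo` shape concrete, here is a small validator that splits such a URI into its two parts. It is an illustrative sketch only: `split_hf_uri` is a hypothetical helper, not a Unibox function, and it handles only the two-part form described in the note above.

```python
from typing import Tuple


def split_hf_uri(uri: str) -> Tuple[str, str]:
    """Split 'hf://owner/repo' into (owner, repo); raise ValueError otherwise."""
    prefix = "hf://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an hf:// URI: {uri!r}")
    # Drop the scheme and any trailing slash, then require exactly two parts.
    parts = uri[len(prefix):].strip("/").split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected hf://owner/repo, got {uri!r}")
    return parts[0], parts[1]


if __name__ == "__main__":
    owner, repo = split_hf_uri("hf://my-org/my-dataset")
    print(owner, repo)
```

Validating the URI before calling `ub.loads` or `ub.saves` can surface a typo in the repo ID early, before any network request is made.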

Peek and list

import unibox as ub

# Peek into a dataset or DataFrame (load one first so `ds` is defined)
ds = ub.loads("hf://my-org/my-dataset")
ub.peeks(ds)

# List files in a folder or bucket prefix
files = ub.ls("s3://my-bucket/data", exts=[".parquet", ".csv"])
print(files[:5])
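
Conceptually, the `exts` argument narrows a listing to files with the given suffixes. The sketch below shows that idea on a plain list of paths; `filter_by_ext` is a hypothetical helper, and the case-insensitive matching is an assumption that may not mirror how `ub.ls` behaves.

```python
from typing import Iterable, List


def filter_by_ext(paths: Iterable[str], exts: Iterable[str]) -> List[str]:
    """Keep paths whose name ends with one of `exts` (case-insensitive)."""
    suffixes = tuple(e.lower() for e in exts)
    return [p for p in paths if p.lower().endswith(suffixes)]


if __name__ == "__main__":
    listing = [
        "s3://my-bucket/data/a.parquet",
        "s3://my-bucket/data/b.CSV",
        "s3://my-bucket/data/notes.txt",
    ]
    print(filter_by_ext(listing, [".parquet", ".csv"]))
```

The same helper can post-filter any list of paths, for example to split a mixed listing into Parquet and CSV groups before loading them.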

Next steps