Sample Transformers

AIStore provides a set of sample transformers as Docker images. They are intended for use in ETL workflows on AIStore and are initialized through the init spec functionality.

Transformer	Language	Communication Mechanisms	Description
`echo`	`python:3.13`	`hpull`, `hpush`	Returns the original data, with an `MD5` sum in the response headers.
`go_echo`	`golang:1.24`	`hpull`, `hpush`	Returns the original data, with an `MD5` sum in the response headers.
`hello_world`	`python:3.13`	`hpull`, `hpush`	Returns `Hello World!` string on any request.
`go_hello_world`	`golang:1.24`	`hpull`, `hpush`	Returns `Hello World!` string on any request (Go implementation).
`md5`	`python:3.13`	`hpull`, `hpush`	Returns the `MD5` sum of the original data as the response.
`hash_with_args`	`python:3.13`	`hpull`, `hpush`	Returns the `XXHash64` digest of the original data with customizable seed arguments.
`tar2tf`	`golang:1.21`	`hpull`, `hpush`	Returns the transformed TensorFlow compatible data for the input `TAR` files.
`compress`	`python:3.11`	`hpull`, `hpush`	Returns the compressed or decompressed data using `gzip` or `bz2`.
`FFmpeg`	`python:3.13`	`hpull`, `hpush`	Returns audio files in `WAV` format with control over Audio Channels (`AC`) and Audio Rate (`AR`).
`go_FFmpeg`	`golang:1.24`	`hpull`, `hpush`	Returns audio files in `WAV` format with control over Audio Channels (`AC`) and Audio Rate (`AR`) (Go implementation).
`NeMo/audio_split_consolidate`	`python:3.13`	`hpull`, `hpush`	Splits and consolidates audio files using JSONL manifests with distributed processing architecture.
`parquet-parser`	`golang:1.24`	`hpush`	Converts Parquet files to JSON, CSV, or TXT formats with concurrent processing and dynamic schema extraction.
`batch_rename`	`python:3.13`	`hpull`, `hpush`	Renames objects matching regex patterns and copies them to destination buckets with modified paths.
`face_detection`	`python:3.8-slim`	`hpull`, `hpush`	Detects faces in images using Single Shot MultiBox Detector (`SSD`) model and returns images with bounding boxes.
`keras`	`python:3.9-slim`	`hpull`, `hpush`	Returns the transformed images using `Keras` pre-processing.
`torchvision`	`python:3.9-slim`	`hpull`, `hpush`	Returns the transformed images using `Torchvision` pre-processing.
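
To make the table descriptions concrete, here is a minimal Python sketch of the behavior described for `echo`, `md5`, and `compress`. It is not the source of the published images. The function names, the `X-MD5` header name, and the `mode`/`fmt` parameters are assumptions chosen for illustration, and the real transformers run behind a web server rather than as plain functions.

```python
import bz2
import gzip
import hashlib


def echo(data: bytes) -> tuple[bytes, dict[str, str]]:
    """Return the payload unchanged, plus an MD5 digest in a headers dict."""
    # "X-MD5" is a placeholder header name, not taken from the real image.
    return data, {"X-MD5": hashlib.md5(data).hexdigest()}


def md5(data: bytes) -> bytes:
    """Return the hex MD5 digest of the payload as the response body."""
    return hashlib.md5(data).hexdigest().encode()


def compress(data: bytes, mode: str = "compress", fmt: str = "gzip") -> bytes:
    """Compress or decompress the payload with gzip or bz2."""
    # Hypothetical parameter names; see each transformer's README for the real ones.
    codec = {"gzip": gzip, "bz2": bz2}[fmt]
    if mode == "compress":
        return codec.compress(data)
    return codec.decompress(data)


if __name__ == "__main__":
    body, headers = echo(b"hello")
    print(body, headers)
    print(md5(b"hello"))
    packed = compress(b"hello" * 100, fmt="bz2")
    print(len(packed), compress(packed, mode="decompress", fmt="bz2")[:5])
```

In each case the transformer receives the object bytes and returns the transformed bytes; only the body of the function changes from one transformer to the next.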

General Usage

The following sections demonstrate initializing ETLs on AIStore using the provided sample transformers.

For detailed usage information and optional parameters for any transformer, please refer to the README documents located in their respective sub-directories.

Pre-Requisites

ETLs on AIStore require the installation and use of Kubernetes.

For more information on AIStore Kubernetes deployment options, refer to the AIStore deployment documentation.

Usage w/ AIStore CLI

There are two ways to initialize transformers:

1. Runtime-spec (Recommended)

The modern approach uses a compact `etl_spec.yaml` that lists only the image and command, plus optional settings such as communication type, environment variables, and timeouts.

# Change Directory (to Desired Sample Transformer)
cd ais-etl/transformers/md5

# Initialize ETL directly from runtime spec
ais etl init spec --from-file etl_spec.yaml md5-etl

# Transform objects (inline)
ais etl object md5-etl ais://<src-bck>/<obj> -

# Transform bucket-to-bucket
ais etl bucket md5-etl ais://<src-bck> ais://<dst-bck>

2. Legacy Pod-spec (Still Supported)

The original method uses a full Kubernetes Pod specification with environment variable substitution:

# Change Directory (to Desired Sample Transformer)
cd ais-etl/transformers/md5

# Export Environment Variables for Communication Mechanism (& Any Additional Arguments)
export COMMUNICATION_TYPE="hpull://"

# Substitute Environment Variables in YAML Specification
envsubst < pod.yaml > init_spec.yaml

# Initialize ETL on AIStore via CLI
ais etl init spec --from-file init_spec.yaml --name md5-etl-legacy

# Transform objects (inline)
ais etl object md5-etl-legacy ais://<bck-name>/<obj-name>.<ext> -

# Transform bucket-to-bucket
ais etl bucket md5-etl-legacy ais://<src-bck> ais://<dst-bck>

Note: Most transformers now provide both `etl_spec.yaml` (runtime-spec) and `pod.yaml` (legacy pod-spec) files. The runtime-spec approach is recommended for new deployments.
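
The `envsubst` step in the legacy flow is plain placeholder substitution, which can be sketched in Python with `string.Template`. The Pod-spec fragment below is a hypothetical stand-in for `pod.yaml` (the real file has more fields and may use a different layout), and `render` is an illustrative helper, not part of any AIStore tooling.

```python
from string import Template

# Hypothetical Pod-spec fragment; the real pod.yaml contains more fields.
POD_TEMPLATE = """\
metadata:
  name: transformer-md5
  annotations:
    communication_type: "${COMMUNICATION_TYPE}"
"""


def render(template: str, env: dict[str, str]) -> str:
    """Fill ${VAR} placeholders from env, similar in spirit to envsubst."""
    # Unlike envsubst, safe_substitute leaves unknown placeholders untouched
    # instead of replacing them with an empty string.
    return Template(template).safe_substitute(env)


init_spec = render(POD_TEMPLATE, {"COMMUNICATION_TYPE": "hpull://"})
print(init_spec)
```

The rendered text corresponds to what the CLI flow writes to `init_spec.yaml` before passing it to `ais etl init spec`.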

Usage w/ AIStore Python SDK

The YAML specification files for the sample transformers are provided as templates.

Contribution

The sample transformer images on DockerHub are maintained through the ais-etl GitHub repository.

To contribute, push any changes to sample transformers to the GitHub repository. The existing GitHub workflows will build the updated sample transformers and push them to the DockerHub repository.

For more information, refer to the GitHub workflow files in the ais-etl repository.