s5cmd
Introduction to s5cmd
s5cmd is a high-performance command-line tool for managing S3 and S3-compatible object storage systems. It is designed for speed and efficiency, supporting bulk operations, parallel processing, and advanced filtering. s5cmd is ideal for users who need to manage large-scale data transfers and automate workflows involving object storage.
Typical tasks include uploading, downloading, syncing, and deleting objects in S3 buckets. For more details, visit the official s5cmd documentation.
Using s5cmd with Modules
To use s5cmd on the terrabyte HPC system, load the s5cmd module with the following command:
# consider adding the module use line to your ~/.bashrc to always make terrabyte modules available
module use /dss/dsstbyfs01/pn56su/pn56su-dss-0020/usr/share/modules/files/
module load s5cmd
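To confirm that the module was loaded, it can help to check that the binary is on your PATH. The following is a minimal sketch; the variable name s5cmd_available is purely illustrative and not part of s5cmd or the module system.

```shell
# Check whether the s5cmd binary is reachable after "module load s5cmd".
if command -v s5cmd >/dev/null 2>&1; then
    # Print the installed s5cmd version.
    s5cmd version
    s5cmd_available=yes
else
    echo "s5cmd not found; check that 'module load s5cmd' succeeded" >&2
    s5cmd_available=no
fi
```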
Usage Examples
Once loaded, you can start using s5cmd to interact with S3-compatible storage systems. Below are some examples of common s5cmd operations:
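For S3-compatible storage other than AWS, s5cmd usually needs to be told which endpoint to contact and which credentials to use. This page does not name an endpoint, so the sketch below uses a hypothetical URL (https://s3.example.org) and placeholder keys; replace them with the values issued for your storage system. s5cmd reads the standard AWS credential variables and the S3_ENDPOINT_URL variable; the endpoint can alternatively be passed per call with the --endpoint-url flag.

```shell
# Placeholder credentials and a hypothetical endpoint; substitute your own values.
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export S3_ENDPOINT_URL="https://s3.example.org"

if command -v s5cmd >/dev/null 2>&1; then
    # With the variables exported, list the buckets visible to these credentials.
    s5cmd ls
    # Equivalent call with the endpoint given explicitly instead of via the environment:
    # s5cmd --endpoint-url "https://s3.example.org" ls
fi
```

Credentials stored in ~/.aws/credentials are picked up as well, which avoids exporting secrets in every shell session.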
Example 1: List Objects in a Bucket
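To list the objects and prefixes at the top level of an S3 bucket (use a quoted wildcard such as "s3://your-bucket-name/*" to list every object in the bucket):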
s5cmd ls s3://your-bucket-name/
Example 2: Upload a File to a Bucket
To upload a file to an S3 bucket:
s5cmd cp localfile.txt s3://your-bucket-name/
Example 3: Download a File from a Bucket
To download a file from an S3 bucket:
s5cmd cp s3://your-bucket-name/remotefile.txt .
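When many objects need to be transferred, one option is to write the individual commands to a file and hand that file to s5cmd run, which executes them in parallel. The sketch below assumes three hypothetical object keys (data/a.txt, data/b.txt, data/c.txt); the variable names BUCKET and CMD_FILE are illustrative only.

```shell
# Hypothetical object keys; replace with keys that exist in your bucket.
BUCKET="s3://your-bucket-name"
CMD_FILE="$(mktemp)"

# Write one command per line, without the leading "s5cmd".
for key in data/a.txt data/b.txt data/c.txt; do
    echo "cp ${BUCKET}/${key} ." >> "$CMD_FILE"
done

# Show the generated command list for inspection.
cat "$CMD_FILE"

# Execute all listed commands if s5cmd is available.
if command -v s5cmd >/dev/null 2>&1; then
    s5cmd run "$CMD_FILE"
fi
```

For objects that share a prefix, a single quoted wildcard in the source path of cp is often simpler than a command file.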
Example 4: Sync a Local Directory with a Bucket
To synchronize a local directory to an S3 bucket (one-way: files that are missing or have changed at the destination are uploaded):
s5cmd sync ./local-directory/ s3://your-bucket-name/
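By default, sync does not remove anything from the destination. If the bucket should mirror the local directory exactly, the --delete flag of sync can be added; combining it with the global --dry-run flag first prints the planned operations without executing them. This is a sketch using the placeholder paths from the example above; SRC and DST are illustrative variable names.

```shell
# Placeholder source and destination; replace with your own paths.
SRC="./local-directory/"
DST="s3://your-bucket-name/"

if command -v s5cmd >/dev/null 2>&1; then
    # Preview only: print the copy and delete operations that a mirroring sync would perform.
    s5cmd --dry-run sync --delete "$SRC" "$DST"
    # After reviewing the preview, run the same command without --dry-run:
    # s5cmd sync --delete "$SRC" "$DST"
fi
```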
Example 5: Delete Objects in a Bucket
To delete all objects in an S3 bucket (quote the wildcard so that the shell passes it to s5cmd unexpanded):
s5cmd rm "s3://your-bucket-name/*"
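Because a wildcard delete can remove a large number of objects at once, it may be worth restricting the pattern to a prefix and previewing the result first. The sketch below assumes a hypothetical prefix old-results/ and uses the global --dry-run flag, which prints the operations without carrying them out; TARGET is an illustrative variable name.

```shell
# Hypothetical prefix; replace with the prefix you actually want to clear.
TARGET="s3://your-bucket-name/old-results/*"

if command -v s5cmd >/dev/null 2>&1; then
    # Preview only: print the rm operations that match the pattern.
    s5cmd --dry-run rm "$TARGET"
    # Once the preview lists only the intended objects, delete them:
    # s5cmd rm "$TARGET"
else
    echo "s5cmd not found; run 'module load s5cmd' first" >&2
fi
```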
For additional usage instructions and configuration details, refer to the s5cmd documentation.