S3 commandline client s5cmd

This page documents the s5cmd S3 client.


s5cmd

The tool s5cmd allows you to parallelise workloads such as data transfers. This is convenient when you want to copy a whole directory with its contents to an S3 bucket, or the other way around. More information can be found at https://github.com/peak/s5cmd, https://joshua-robinson.medium.com/s5cmd-for-high-performance-object-storage-7071352cc09d and https://aws.amazon.com/blogs/opensource/parallelizing-s3-workloads-s5cmd/.

The key benefit of s5cmd is its much higher transfer performance compared with clients such as s3cmd and the AWS CLI.

Installation

Binaries for Windows, Mac and Linux can be downloaded from: https://github.com/peak/s5cmd/releases

Authentication

You can populate the environment with the proper values, after which you do not need to pass anything on the command line:

export S3_ENDPOINT_URL=https://objectstore.surf.nl
export AWS_REGION=default
export AWS_ACCESS_KEY_ID=<access key>
export AWS_SECRET_ACCESS_KEY=<secret key>
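
Before running s5cmd it can help to confirm that the variables above are actually set in the current shell. The following is a minimal sketch of such a check; the function name check_s3_env is hypothetical and not part of s5cmd.

```shell
# Hypothetical helper (not part of s5cmd): verify that the variables
# listed above are set and non-empty in the current shell.
check_s3_env() {
  missing=0
  for name in S3_ENDPOINT_URL AWS_REGION AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example use: only list buckets when everything is set.
# check_s3_env && s5cmd ls
```

The function returns a non-zero status and names each missing variable on standard error, so it can be chained in front of any s5cmd call.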

Alternatively, you can create a credentials file like the one used by the awscli client. The file must be located at ~/.aws/credentials:

[default]
aws_access_key_id = <access key>
aws_secret_access_key = <secret key>
region = default

In the latter case, you will need to specify the service endpoint like so:

s5cmd --endpoint-url https://objectstore.surf.nl ls

Upload/Download an object to/from a bucket

An object can be uploaded to a bucket by the following command:

s5cmd cp <filename> s3://mybucket/myobject

It can be downloaded by:

s5cmd cp s3://mybucket/myobject <filename>
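
When many individual objects need to be transferred, s5cmd can also read commands from a file through its run command and execute them in parallel. The sketch below only generates such a command file. The helper name make_upload_commands, the file name commands.txt and the example paths are assumptions for illustration, and the paths are assumed to contain no whitespace.

```shell
# Hypothetical helper: print one "cp" command per local file, in the
# form read by "s5cmd run". Assumes paths contain no whitespace.
make_upload_commands() {
  bucket=$1
  shift
  for f in "$@"; do
    # Each object is named after the base name of the local file.
    printf 'cp %s s3://%s/%s\n' "$f" "$bucket" "$(basename "$f")"
  done
}

# Example use (file names are placeholders):
# make_upload_commands mybucket /data/a.dat /data/b.dat > commands.txt
# s5cmd run commands.txt
```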

Upload a folder with contents to a bucket

s5cmd cp /path/to/my/folder s3://mybucket

Download a bucket with contents to a directory

s5cmd cp 's3://mybucket/*' /path/to/my/folder/

Large files

Important

By default s5cmd spawns 256 workers to run its tasks in parallel, which makes it well suited to transferring a large number of small files. For larger files (>= 1 GB) we have found it beneficial to reduce the number of workers, for example to 20, in order to reduce the load on the client side. To do so, use the command-line flag --numworkers <value>. An example is shown below:

s5cmd --numworkers 20 cp /path/to/my/folder/with/big/files s3://mybucket
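
The choice of worker count can also be scripted. The following sketch applies the rule of thumb above: it prints 20 when the folder contains at least one file of 1 GB or more, and the default of 256 otherwise. The function name pick_numworkers and its optional threshold argument are hypothetical, and 1 GB is taken here as 1073741824 bytes, which is an assumption.

```shell
# Hypothetical helper: suggest a value for --numworkers based on
# whether a folder holds any file at or above a size threshold.
pick_numworkers() {
  dir=$1
  threshold_bytes=${2:-1073741824}   # assumed meaning of "1 GB"
  # "-size +Nc" matches files larger than N bytes, so N = threshold - 1
  # selects files of at least threshold_bytes.
  big=$(find "$dir" -type f -size +"$((threshold_bytes - 1))"c | head -n 1)
  if [ -n "$big" ]; then
    echo 20
  else
    echo 256
  fi
}

# Example use:
# s5cmd --numworkers "$(pick_numworkers /path/to/my/folder/with/big/files)" \
#   cp /path/to/my/folder/with/big/files s3://mybucket
```

The value 20 is only the example figure given above. The best setting depends on the client machine and is worth measuring.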