Often I hear people claiming S3 is slow, when in fact S3 is very fast; it's the user-space program you are using that is slow!
This gist provides a tool: rapid_s3_upload.sh
which will upload the directory you run the tool in to a provided S3 bucket, as fast as humanly possible.
This tool will saturate your NICs, will saturate your CPUs, and will upload data to S3 faster than basically anything else.
rapid_s3_upload.sh s3://some-path/in-s3
Furthermore, if you want to use a custom compressor you can run it with the COMPRESS parameter.

COMPRESS="$(which lz4)" rapid_s3_upload.sh s3://some-path/in-s3
Under the hood this tool uses xargs to get maximum CPU parallelism, find to locate all files in the current working directory, and either zstd or lz4 to do the compression. Generally speaking the default zstd compressor is the right choice, but lz4 is useful when you have less compressible data and want to use less CPU. Currently the script doesn't support turning off compression, but I think it's pretty straightforward to refactor as you wish.
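To make the find/xargs/compressor plumbing concrete, here is a minimal sketch of what such a pipeline could look like. This is not the gist's exact code: the function name, the streaming `aws s3 cp -` destination layout, and the COMPRESS default shown here are my assumptions for illustration.

```shell
# Sketch only: assumed internals, not the actual rapid_s3_upload.sh source.
rapid_s3_upload() {
  DEST="${1:?usage: rapid_s3_upload s3://bucket/prefix}"
  COMPRESS="${COMPRESS:-$(command -v zstd)}"   # COMPRESS=... swaps in lz4 etc.
  export DEST COMPRESS

  upload_one() {
    # Compress to stdout and stream straight into S3; "-" tells the AWS CLI
    # to read the object body from stdin, so nothing is staged on local disk.
    "$COMPRESS" -c "$1" | aws s3 cp - "$DEST/${1#./}.${COMPRESS##*/}"
  }
  export -f upload_one

  # -print0/-0 survive spaces in filenames; -P "$(nproc)" runs one worker
  # per core, which is what saturates the CPUs and the NIC.
  find . -type f -print0 | xargs -0 -P "$(nproc)" -I{} bash -c 'upload_one "$1"' _ {}
}
```

Streaming through `aws s3 cp -` is the key design choice: each worker compresses and uploads in one pass, so throughput is bounded by cores and network rather than disk.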