@deinarson
Last active October 11, 2016 20:28
Dedup rsync transfers. This probably won't work, but might help if the hash has already been transferred.
#!/bin/bash
#
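# This script will probably only work with fixed-size DBs (an insert in the middle of
# the file shifts every later chunk boundary, so every later hash changes), and it
# probably won't be ideal on any DB with compression turned on :\
#
# BUT!
# Here we take a large file, split it into several chunks, and then sync the chunks
# by their hash.
# On repeat runs this should usually be quicker than rsyncing the whole file by itself,
# since a chunk whose hash is already on the remote does not need to be sent again.
#
# Feel free to run this several times; each time only the new SHAs should be copied.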
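RSYNC_ROOT=/Users/deinarso/hh
ENV=dev
PREFIX=rsync-dedup
DB_FILE=/Users/deinarso/hh/DB
# CLOUD_SERVER and WORKING_DIR are not set anywhere in this script;
# set them in the environment before running, otherwise stop here.
: "${CLOUD_SERVER:?set CLOUD_SERVER to the rsync destination host}"
: "${WORKING_DIR:?set WORKING_DIR to the destination directory}"
mkdir -p "${RSYNC_ROOT}/${ENV}"
cd "${RSYNC_ROOT}/${ENV}" || exit 1
# cut the DB into 1 MB chunks named ${PREFIX}-${ENV}.aa, .ab, ...
split -b 1m "${DB_FILE}" "${PREFIX}-${ENV}."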
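# change name to sha, recording "sha  chunk-name" for the rebuild on the cloud side
: > dedupped_rename.list
for file in "${PREFIX}-${ENV}".*
do
    # hash once and reuse the output line instead of grepping it back out of the list
    LINE=$(shasum -a 256 "$file")
    echo "$LINE" >> dedupped_rename.list
    SHA=${LINE%% *}
    # identical chunks share a SHA, so they collapse into a single file here
    mv "$file" "$SHA"
done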
# send to the CLOUD server
rsync -avzP "${RSYNC_ROOT}/${ENV}/" "${CLOUD_SERVER}:${WORKING_DIR}/${ENV}/"
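#!/bin/bash
# run this to reconstruct the DB file
# CLOUD SIDE, from the directory the chunks were synced to
# Walk the list in order and append each chunk by its SHA name. Reading the chunk
# instead of renaming it back keeps this working when several chunks share one SHA,
# and avoids the old rsync-dedup.* glob, which never matched the rsync-dedup-${ENV}.* names.
: > DB
while read -r SHA file
do
    cat "${SHA}" >> DB || exit 1
done < dedupped_rename.list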