Last active
September 18, 2019 18:35
Split CSV files by megabyte and retain head row (row 1)
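#!/bin/bash
# Split every CSV in the current directory into chunks of N lines,
# repeating the header row (row 1) at the top of each chunk.
mkdir -p backup
lines=2000  # default lines per chunk; override with the first argument
for i in *.csv; do
  [ -e "$i" ] || continue  # no CSV files present: the glob stays literal
  # -l splits on line count (-b would split on bytes instead).
  split -l "${1:-$lines}" "$i" "${i%.csv}-"
  # Match every two-letter suffix, so chunks past -az are handled too.
  for j in "${i%.csv}"-[a-z][a-z]; do
    # The first chunk (-aa) already starts with the header row.
    if [[ "$j" != *-aa ]]; then
      head -n 1 "$i" | cat - "$j" > "$j.tmp" && mv "$j.tmp" "$j"
    fi
    mv "$j" "$j.csv"
  done
  mv "$i" "backup/$i"
done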
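#!/bin/bash
# Split every CSV in the current directory into chunks of N megabytes,
# repeating the header row (row 1) at the top of each chunk.
mkdir -p backup
megabyte=20  # default chunk size in MB; override with the first argument
for i in *.csv; do
  [ -e "$i" ] || continue  # no CSV files present: the glob stays literal
  # Note: -b cuts at exact byte offsets, so a chunk can end mid-row.
  split -b "${1:-$megabyte}m" "$i" "${i%.csv}-"
  # Match every two-letter suffix, so chunks past -az are handled too.
  for j in "${i%.csv}"-[a-z][a-z]; do
    # The first chunk (-aa) already starts with the header row.
    if [[ "$j" != *-aa ]]; then
      head -n 1 "$i" | cat - "$j" > "$j.tmp" && mv "$j.tmp" "$j"
    fi
    mv "$j" "$j.csv"
  done
  mv "$i" "backup/$i"
done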
Caveat: if your CSV contains newline characters inside the actual records, the point where the file is split may fall in the middle of a record rather than cleanly between two records.
If you're only dealing with a few split files, I found it easiest to open each one, copy the correct data across from the neighbouring file, and delete the faulty rows.
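For larger jobs, one way to avoid the manual cleanup is to split on record boundaries instead of raw lines or bytes. The following is only a sketch, not part of the original scripts: `split_csv_records` is a hypothetical helper, and it assumes the header fits on one line and that fields use standard double-quote quoting (embedded quotes written as `""`). It tracks whether a line ends inside an open quoted field and starts a new chunk only when it does not.

```shell
#!/bin/bash
# Hypothetical helper (not in the original gist):
#   split_csv_records INPUT RECORDS_PER_CHUNK OUTPUT_PREFIX
split_csv_records() {
  awk -v max="$2" -v prefix="$3" '
    NR == 1 { header = $0; next }   # assumes a single-line header row
    {
      # Start a new chunk only on a record boundary.
      if (!open_quote && records % max == 0) {
        if (out != "") close(out)
        out = sprintf("%s%03d.csv", prefix, ++chunk)
        print header > out
      }
      print > out
      # An odd number of double quotes on a line flips the
      # inside-a-quoted-field state; escaped pairs ("") stay even.
      n = gsub(/"/, "&")
      if (n % 2 == 1) open_quote = !open_quote
      if (!open_quote) records++
    }
  ' "$1"
}
```

Calling `split_csv_records data.csv 2000 data-` would write `data-001.csv`, `data-002.csv`, and so on, each starting with the header row. Unlike the scripts above, this sketch counts records rather than megabytes and leaves the original file in place.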
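How to use
Put the script in the same folder as the .csv files you want to split. It processes every .csv file in that folder and moves the originals into a backup folder.
Run the script as
bash csv-split.sh 20
where 20 is an optional parameter giving the chunk size in megabytes (the default is 20). The script uses Bash-only syntax such as [[ ]], so run it with bash rather than a plain sh.
Alternatively, if you use the line-based script, pass the number of lines per chunk instead of megabytes (the default is 2000).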
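The optional parameter works through Bash's default-value expansion, `${1:-default}`, which the scripts use when calling `split`. A minimal illustration follows; the `chunk_size` function is made up for this example and does not appear in the scripts.

```shell
#!/bin/bash
megabyte=20

# Hypothetical wrapper: echoes the size argument that would be passed to split -b.
chunk_size() {
  echo "${1:-$megabyte}m"
}

chunk_size      # no argument given: falls back to the default, prints 20m
chunk_size 5    # argument given: prints 5m
```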