###Hive Default Delimiters to CSV #####Reasoning Hive's default delimiters are
Row Delimiter => Control-A ('\001')
Collection Item Delimiter => Control-B ('\002')
Map Key Delimiter => Control-C ('\003')
There are ways to change these delimiters when exporting tables but sometimes you might still get stuck needing to convert this to csv.
Here's a quick bash script that can handle a DB export that's segmented in multiple files and has the default delimiters. It will output a single CSV file.
It is assumed that the segments all have the naming convention 000*_0
INDIRECTORY="path/to/input/directory"
for f in $INDIRECTORY/000*_0; do
echo "Processing $f file..";
cat -v $f |
LC_ALL=C sed -e "s/^/\"/g" |
LC_ALL=C sed -e "s/\^A/\",\"/g" |
LC_ALL=C sed -e "s/\^C\^B/\"\":\"\"\"\",\"\"/g" |
LC_ALL=C sed -e "s/\^B/\"\",\"\"/g" |
LC_ALL=C sed -e "s/\^C/\"\":\"\"/g" |
LC_ALL=C sed -e "s/$/\"/g" > $f-temp
done
echo "you,can,echo,your,header,here,if,you,like" > $INDIRECTORY/final_output.csv
cat $INDIRECTORY/*-temp >> $INDIRECTORY/final_output.csv
rm $INDIRECTORY/*-temp
#####Example
Suppose our hive table with 1 row looks like this:
id | date | event-map | col4 |
---|---|---|---|
ec329063 | 1/21/16 | {"hash":"","currenturl":"google","adblock":"true"} | dxbr5b0v87 |
The raw hive output with default delimiters would then be
ec329063^A2016-01-21^Ahash^C^Bcurrenturl^Cgoogle^Badblock^Ctrue^Adxbr5b0v87
Once the the script runs, you'll have a csv output:
"ec329063","2016-01-21","hash"":"""",""currenturl"":""google"",""adblock"":""true","dxbr5b0v87"