- clone the files into a directory:
git clone https://gist.github.com/cc7c8cec1188fd387cc2e3ec0f4fed7a.git wordcount
and then:
cd wordcount
- see the input files:
cat *.txt
- make sure the mapper and reducer scripts are executable:
chmod +x *.scala
- see how mapper works:
cat baa.txt | ./mapper.scala
- see how reducer works:
cat baa.txt | ./mapper.scala | ./reducer.scala
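Hadoop Streaming talks to the mapper and reducer purely through stdin and stdout: the mapper emits one word-TAB-1 line per token, and the reducer sums the counts after the keys have been grouped. The gist's Scala scripts aren't reproduced here, but the same data flow can be sketched with ordinary Unix tools (the awk stand-ins and the sample line below are illustrative, not the gist's code or data):

```shell
# Illustrative stand-ins for mapper.scala / reducer.scala (not the gist's code),
# wired together the way Hadoop Streaming does: map | sort (shuffle) | reduce.
#   mapper:  one word per line, each tagged with a count of 1 ("word<TAB>1")
#   shuffle: sort brings equal keys together
#   reducer: sum the counts per key
printf 'baa baa black sheep\n' |
  tr -s '[:space:]' '\n' | sed '/^$/d' |
  awk '{print $0 "\t1"}' |
  sort |
  awk -F'\t' '{c[$1] += $2} END {for (w in c) print w "\t" c[w]}' |
  sort   # awk's for-in traversal order is unspecified, so sort the result
# prints:
# baa     2
# black   1
# sheep   1
```

When Hadoop runs the job, it inserts the sort (shuffle) step between the two scripts itself, which is why the cluster-side reducer can rely on equal keys arriving together.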
- clone the files into a directory (skip if already done above):
git clone https://gist.github.com/cc7c8cec1188fd387cc2e3ec0f4fed7a.git wordcount
and then:
cd wordcount
- create a directory on HDFS:
hadoop fs -mkdir -p /wc/in
- copy input files into HDFS:
hadoop fs -put *.txt /wc/in
- make sure the files are transferred:
hadoop fs -ls /wc/in
You can also read their content using -cat:
hadoop fs -cat /wc/in/baa.txt
- make sure the mapper and reducer scripts are executable:
chmod +x *.scala
- make sure the output directory does NOT exist (the following should report "No such file or directory"):
hadoop fs -ls /wc/out
- issue the Hadoop Streaming job:
hadoop jar /home/user/hadoop-2.7.3/share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar -mapper mapper.scala -reducer reducer.scala -input /wc/in/* -output /wc/out
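The command above works when the tasks run on the same machine as the scripts, but on a multi-node cluster the task JVMs won't find mapper.scala and reducer.scala in their working directory. The usual fix is to ship the scripts with the job via the generic -files option, which must come before the streaming-specific options. A variant of the same command, assuming the same paths (shown as a sketch, not a tested invocation):

```shell
# Same job, but shipping the scripts to the task nodes via -files.
# Generic options (-files, -D ...) must precede the streaming options.
# The input glob is quoted so the local shell does not expand it;
# Hadoop resolves the pattern against HDFS itself.
hadoop jar /home/user/hadoop-2.7.3/share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar \
  -files mapper.scala,reducer.scala \
  -mapper mapper.scala \
  -reducer reducer.scala \
  -input '/wc/in/*' \
  -output /wc/out
```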
- make sure the above job ran successfully:
hadoop fs -ls /wc/out
You should see a zero-byte file called _SUCCESS
- read the output:
hadoop fs -cat /wc/out/part-00000