Diksha Vanvari (vanvaridiksha), Columbia University, New York
vanvaridiksha / csds.md
Last active March 6, 2016 04:52
CSDS Assignment 2

# CSDS Hive Assignment 2

In the previous assignment you worked with the Hadoop, HDFS and Hive environments to run simple MapReduce jobs and perform basic operations such as loading data into HDFS and querying a small Hive database. This assignment moves on to working with genuinely large data. You will write your own MapReduce programs to perform more sophisticated tasks, then create your own Hive database from the dataset in its raw form and run a few queries on it.

This assignment can be done on the same Cloudera virtual machine used for the previous assignment; no further setup or installation is needed. You may write your programs in any language you choose.
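One common way to write MapReduce programs in a language other than Java is Hadoop Streaming: any executable that reads input lines on stdin and writes tab-separated `key<TAB>value` lines to stdout can serve as a mapper or reducer, and the framework sorts mapper output by key before the reduce phase. The sketch below is a hypothetical word count in Python, shown only to illustrate that contract; the file name `wordcount.py` and the functions `mapper`, `reducer` and `simulate` are illustrative assumptions, not part of the assignment.

```python
#!/usr/bin/env python3
"""Hypothetical Hadoop Streaming-style word count (illustrative sketch only).

Streaming contract: read lines from stdin, write "key\tvalue" lines to stdout.
Hadoop sorts mapper output by key, so the reducer sees each key's values
on consecutive lines.
"""
import sys
from itertools import groupby


def mapper(lines):
    """Emit "word\t1" for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines):
    """Sum the counts per key; assumes input is already sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


def simulate(lines):
    """Local stand-in for map -> sort by key -> reduce, useful for testing."""
    mapped = sorted(mapper(lines), key=lambda kv: kv.split("\t", 1)[0])
    return list(reducer(mapped))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Run as "wordcount.py map" or "wordcount.py reduce".
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

Before submitting to the cluster, a script like this can be checked locally with a shell pipeline such as `cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce`, where `sort` plays the role of Hadoop's shuffle. The streaming jar's `-mapper` and `-reducer` options then accept the same commands; the jar's location depends on the installation.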

## About the Dataset (technical details may still be revised)

File Name - server-logz.gz (300 MB)