- CPU Utilization
- Throughput
- Waiting Time
- Turnaround Time
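The metrics above are the standard CPU-scheduling measures. A minimal sketch of how they are computed, assuming an FCFS (first-come, first-served) order and that all processes arrive at time 0 (both assumptions and the burst times are made up for illustration):

```python
def fcfs_metrics(bursts):
    """bursts: list of CPU burst times for processes all arriving at t=0."""
    waiting = []
    t = 0
    for b in bursts:
        waiting.append(t)   # each process waits until all earlier ones finish
        t += b
    turnaround = [w + b for w, b in zip(waiting, bursts)]
    total_time = t
    return {
        "throughput": len(bursts) / total_time,       # jobs completed per time unit
        "avg_waiting": sum(waiting) / len(waiting),
        "avg_turnaround": sum(turnaround) / len(turnaround),
        "cpu_utilization": sum(bursts) / total_time,  # 1.0 here, since there is no idle time
    }

print(fcfs_metrics([5, 3, 8]))
```

With bursts `[5, 3, 8]`, the waiting times are `[0, 5, 8]` and the turnaround times `[5, 8, 16]`, so the averages follow directly from the definitions above.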
This will be a quick guide to introduce you to one of the most popular and effective tools for working with big data. Apache Spark is a cluster computing platform designed to be fast and general-purpose. On the speed side, Spark extends the popular MapReduce model to efficiently support more types of computations, including interactive queries and stream processing.
- Unlike Hadoop, it is very easy to get Spark installed and running locally on your computer. Even so, we have provided a pre-configured VM to get Spark and the IPython notebook running quickly on your machine.
- A Vagrantfile is provided in the repository which will instantiate an Ubuntu virtual machine for you. The steps for running a Vagrant VM have been explained in the previous assignment.
- Once the machine is up and running and you have SSH-ed into it, you will see a file `spark-notebook.py` in the `/home/vagrant` directory.
- Simply run this script with the command `python spark-notebook.py`. This will start the IPython notebook server.
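To make the MapReduce model that Spark extends concrete, here is a hedged sketch of a word count in plain Python (not Spark itself, and not the assignment code), with an explicit map phase emitting `(word, 1)` pairs and a reduce-by-key phase summing them; the toy corpus is made up:

```python
from collections import defaultdict

lines = ["big data tools", "big data big compute"]

# Map phase: emit a (word, 1) pair for every word occurrence.
pairs = [(word, 1) for line in lines for word in line.split()]

# Shuffle + reduce phase: group pairs by key and sum the counts.
counts = defaultdict(int)
for word, n in pairs:
    counts[word] += n

print(dict(counts))  # {'big': 3, 'data': 2, 'tools': 1, 'compute': 1}
```

In Spark the same two steps would be RDD transformations, distributed across the cluster instead of running in a single loop.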
# Data Mining
## Knowledge Discovery in Databases
- Types:
  - Association Rules
- Causality (Interestingness, Conviction)
- Clustering
- Classification
- Sequential Patterns
- Association Rules
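A minimal sketch of the core association-rule measures (support, confidence, and the conviction measure named above), computed for a hypothetical rule {bread} → {butter} over made-up transactions:

```python
# Toy transaction database; each transaction is a set of items (illustration data).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    """Support of the whole rule divided by support of its left-hand side."""
    return support(lhs | rhs) / support(lhs)

def conviction(lhs, rhs):
    """(1 - supp(rhs)) / (1 - conf(lhs -> rhs)); higher means a stronger rule."""
    return (1 - support(rhs)) / (1 - confidence(lhs, rhs))

print(support({"bread", "butter"}))       # 0.5
print(confidence({"bread"}, {"butter"}))  # 2/3
print(conviction({"bread"}, {"butter"}))  # 1.5
```

Here {bread} appears in 3 of 4 transactions and {bread, butter} in 2 of 4, so the rule's confidence is (2/4) / (3/4) = 2/3.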