Building Spark for PySpark use on top of YARN
Build Spark on your local machine (this is necessary only if you are using PySpark; otherwise, building on a remote machine works). See http://spark.apache.org/docs/latest/building-with-maven.html for the full Maven build instructions.
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package
Copy the generated assembly jar (assembly/target/scala-2.10/...jar) to the corresponding directory on the cluster nodes, and also upload a copy to a location in HDFS.
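A minimal sketch of that copy step, assuming a hypothetical node named node1, a Spark install directory of /opt/spark/lib, and an HDFS staging path of /user/spark/share/lib (substitute your own host and paths):
# Hypothetical host and paths -- adjust to your cluster layout.
scp assembly/target/scala-2.10/spark-assembly-*.jar node1:/opt/spark/lib/
# Stage a copy in HDFS so YARN containers can localize it.
hdfs dfs -mkdir -p /user/spark/share/lib
hdfs dfs -put assembly/target/scala-2.10/spark-assembly-*.jar /user/spark/share/lib/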
Launch the PySpark shell against YARN. Note that interactive shells cannot run in cluster deploy mode, so use yarn-client rather than yarn-cluster:
pyspark --master yarn-client --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1
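With the assembly staged in HDFS, you can also point submissions at it so the jar is not re-uploaded on every run; in Spark 1.x this is the spark.yarn.jar property. A sketch, reusing the hypothetical HDFS path from the copy step above (the jar name will depend on your Spark and Hadoop versions):
pyspark --master yarn-client --conf spark.yarn.jar=hdfs:///user/spark/share/lib/spark-assembly-1.4.0-hadoop2.4.0.jar --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1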
If you just want to use the Scala spark-shell, you can build Spark on the cluster too.
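Either way, a quick smoke test of the YARN setup is to submit the Pi example that ships in the Spark source tree; a sketch, assuming you run it from the build directory:
# Compute Pi with 10 partitions as a sanity check of the YARN deployment.
spark-submit --master yarn-client --num-executors 3 --executor-memory 512m examples/src/main/python/pi.py 10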