Building Spark for PySpark use on top of YARN
Build Spark on your local machine (this is necessary only if you are using PySpark; otherwise, building on a remote machine works). See http://spark.apache.org/docs/latest/building-with-maven.html for the full Maven build instructions.
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package
Copy the generated assembly jar (assembly/target/scala-2.10/...jar) to the corresponding directory on the cluster nodes, and also upload a copy to a location in HDFS.
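A minimal sketch of that copy step, assuming a hypothetical node named node1, a Spark install directory of /opt/spark/lib, and an HDFS staging path of /user/spark/share/lib (substitute your own host and paths):
# Hypothetical host and paths -- adjust to your cluster layout.
scp assembly/target/scala-2.10/spark-assembly-*.jar node1:/opt/spark/lib/
# Stage a copy in HDFS so YARN containers can localize it.
hdfs dfs -mkdir -p /user/spark/share/lib
hdfs dfs -put assembly/target/scala-2.10/spark-assembly-*.jar /user/spark/share/lib/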
Launch the PySpark shell against YARN. Note that interactive shells cannot run in cluster deploy mode, so use yarn-client rather than yarn-cluster:
pyspark --master yarn-client --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1
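With the assembly staged in HDFS, you can also point submissions at it so the jar is not re-uploaded on every run; in Spark 1.x this is the spark.yarn.jar property. A sketch, reusing the hypothetical HDFS path from the copy step above (the jar name will depend on your Spark and Hadoop versions):
pyspark --master yarn-client --conf spark.yarn.jar=hdfs:///user/spark/share/lib/spark-assembly-1.4.0-hadoop2.4.0.jar --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1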
If you just want to use the Scala spark-shell, you can build Spark on the cluster too.
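Either way, a quick smoke test of the YARN setup is to submit the Pi example that ships in the Spark source tree; a sketch, assuming you run it from the build directory:
# Compute Pi with 10 partitions as a sanity check of the YARN deployment.
spark-submit --master yarn-client --num-executors 3 --executor-memory 512m examples/src/main/python/pi.py 10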