Last active April 20, 2017 00:17
Got a Mesos Docker agent to run PySpark

I ran these Docker commands:

docker network create --attachable mesos

docker run -it -p 5050:5050 --hostname mesos-master --name mesos-master --network mesos -e MESOS_IP= nexusjpl/spark-mesos-master

docker run -it --network mesos --name mesos-agent --hostname mesos-agent -p 5051:5051 -p 4040:4040 nexusjpl/spark-mesos-agent ./bin/ --master=mesos-master:5050 --ip= --port=5051 --work_dir=/var/lib/mesos --no-systemd_enable_support --launcher=posix --no-switch_user

docker exec -it mesos-agent /bin/bash

Then in bash on mesos-agent I ran this command:

[root@mesos-agent build]# MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/ \
${SPARK_HOME}/bin/pyspark --conf spark.master=mesos://mesos-master:5050 --driver-class-path $(${HADOOP_HOME}/bin/hadoop classpath) --conf spark.executor.extraClassPath=$(${HADOOP_HOME}/bin/hadoop classpath)

The key was providing both --driver-class-path $(${HADOOP_HOME}/bin/hadoop classpath) and --conf spark.executor.extraClassPath=$(${HADOOP_HOME}/bin/hadoop classpath), so that the Hadoop jars are on the classpath of both the driver and the executors.
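
The same pair of settings can be assembled by a small Python helper, which makes it harder to forget one of them. This is only a sketch: `build_pyspark_command` and its parameters are hypothetical names, not part of Spark, and the classpath string is passed in rather than computed so the helper does not depend on a Hadoop install.

```python
import os


def build_pyspark_command(master, hadoop_classpath, spark_home=None):
    """Return a pyspark argv with the Hadoop classpath set for driver and executors.

    Hypothetical helper for illustration; it is not part of Spark.
    """
    # Fall back to $SPARK_HOME, as the command above does.
    spark_home = spark_home or os.environ.get("SPARK_HOME", "")
    return [
        os.path.join(spark_home, "bin", "pyspark"),
        "--conf", "spark.master=" + master,
        # Driver side: the JVM started by pyspark on this machine.
        "--driver-class-path", hadoop_classpath,
        # Executor side: the JVMs launched on the Mesos agents.
        "--conf", "spark.executor.extraClassPath=" + hadoop_classpath,
    ]
```

In practice `hadoop_classpath` would be the output of `${HADOOP_HOME}/bin/hadoop classpath`, and the returned list would be run with `MESOS_NATIVE_JAVA_LIBRARY` set in the environment, as in the command above.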

With that, I was able to run Joe's example code:

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.1.0

Using Python version 2.7.5 (default, Nov  6 2016 00:28:07)
SparkSession available as 'spark'.
>>> from operator import add
>>> x = [(1.5,100.),(1.5,200.),(1.5,300.),(2.5,150.)]
>>> rdd = sc.parallelize(x,1)
>>> print rdd.foldByKey(0,add).collect()
[(1.5, 600.0), (2.5, 150.0)]
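
For anyone without a cluster handy, the result can be checked with a plain-Python stand-in for `foldByKey`. `fold_by_key` is a hypothetical name; it mimics only the single-partition case used here (`parallelize(x, 1)`), where the zero value is applied once per key, and it sorts the keys for a deterministic result, which Spark itself does not promise.

```python
from operator import add


def fold_by_key(pairs, zero_value, func):
    """Fold the values of each key, starting from zero_value (single-partition sketch)."""
    folded = {}
    for key, value in pairs:
        # Start each key from zero_value the first time it is seen.
        folded[key] = func(folded.get(key, zero_value), value)
    # Sort by key so the output order is deterministic.
    return sorted(folded.items())


x = [(1.5, 100.), (1.5, 200.), (1.5, 300.), (2.5, 150.)]
print(fold_by_key(x, 0, add))  # [(1.5, 600.0), (2.5, 150.0)]
```

The three values for key 1.5 are summed to 600.0 and the single value for key 2.5 is left at 150.0, matching the output from the Mesos-backed session.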