I ran these Docker commands:
docker network create --attachable mesos
docker run -it -p 5050:5050 --hostname mesos-master --name mesos-master \
    --network mesos -e MESOS_IP=0.0.0.0 nexusjpl/spark-mesos-master
docker run -it --network mesos --name mesos-agent --hostname mesos-agent \
    -p 5051:5051 -p 4040:4040 nexusjpl/spark-mesos-agent \
    ./bin/mesos-agent.sh --master=mesos-master:5050 --ip=0.0.0.0 --port=5051 \
    --work_dir=/var/lib/mesos --no-systemd_enable_support --launcher=posix --no-switch_user
docker exec -it mesos-agent /bin/bash
Then, in a bash shell inside the mesos-agent container, I ran this command:
[root@mesos-agent build]# MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so \
    ${SPARK_HOME}/bin/pyspark --conf spark.master=mesos://mesos-master:5050 \
    --driver-class-path $(${HADOOP_HOME}/bin/hadoop classpath) \
    --conf spark.executor.extraClassPath=$(${HADOOP_HOME}/bin/hadoop classpath)
The key was providing both --driver-class-path $(${HADOOP_HOME}/bin/hadoop classpath)
and --conf spark.executor.extraClassPath=$(${HADOOP_HOME}/bin/hadoop classpath),
so that the driver and the executors both pick up the Hadoop jars on their classpaths.
With that in place I was able to run Joe's example code.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/
Using Python version 2.7.5 (default, Nov 6 2016 00:28:07)
SparkSession available as 'spark'.
>>> from operator import add
>>> x = [(1.5,100.),(1.5,200.),(1.5,300.),(2.5,150.)]
>>> rdd = sc.parallelize(x,1)
>>> print rdd.foldByKey(0,add).collect()
[(1.5, 600.0), (2.5, 150.0)]
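For reference, foldByKey(0, add) folds each key's values together starting from the zero value 0, i.e. it sums the values per key. A plain-Python sketch of that behavior (illustrative only, not Spark's actual implementation; fold_by_key is a hypothetical helper):

```python
from operator import add

def fold_by_key(pairs, zero, op):
    # Fold each key's values starting from the zero value,
    # mirroring what RDD.foldByKey does within a single partition.
    result = {}
    for key, value in pairs:
        result[key] = op(result.get(key, zero), value)
    return sorted(result.items())

x = [(1.5, 100.), (1.5, 200.), (1.5, 300.), (2.5, 150.)]
print(fold_by_key(x, 0, add))  # [(1.5, 600.0), (2.5, 150.0)]
```

With multiple partitions, Spark additionally merges the per-partition results with the same operator, which is why the single-partition output above matches the cluster run.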