This document describes one means of running a simple Apache Spark cluster on Cloud Foundry. It makes heavy use of Cloud Foundry's container networking features.
You can see an example running at http://spark-ui-proxy.184.73.108.92.xip.io.
This cluster was deployed using BOSH-Lite on AWS. Note, this Director cannot be targeted with the new BOSH CLI (see cloudfoundry-attic/bosh-lite#424), but the "old" Ruby CLI works just fine. You can use the new CLI for local workflows like manifest interpolation, and the "old" CLI for remote workflows like deploying and SSH.
My BOSH CLIs are as follows:
$ which bosh
/Users/amitgupta/.rubies/ruby-2.4.0/bin/bosh
$ bosh --version
BOSH 1.3262.26.0
$ /usr/local/bin/bosh --version
version 2.0.1-74fad57-2017-02-15T20:17:00Z
Succeeded
So `bosh` commands refer to the Ruby CLI, and `/usr/local/bin/bosh` will be used for the Golang CLI.
Adapting from the cf-networking-release documentation on BOSH-Lite deploys, you must:
$ ssh -i <BOSH_LITE_SSH_KEY_PATH> ubuntu@<BOSH_LITE_ELASTIC_IP>
> sudo modprobe br_netfilter
$ bosh upload stemcell https://bosh.io/d/stemcells/bosh-warden-boshlite-ubuntu-trusty-go_agent?v=3363.12
$ bosh upload release https://bosh.io/d/github.com/cloudfoundry-incubator/cf-networking-release?v=0.18.0
We use cf-deployment with a few tweaks so that:

- it's tailored to BOSH-Lite but doesn't switch the control plane's database from MySQL to Postgres (see cloudfoundry/cf-deployment#96);
- it adds a `director_uuid` to the manifest so that it works with the old Ruby BOSH CLI; and
- it supports Container Networking.
$ cd ~/workspace/cf-deployment
$ git checkout bfa7cc261ac492924c4c6773b3dcfaf8bf2ab74c
$ /usr/local/bin/bosh interpolate \
-v system_domain=<BOSH_LITE_ELASTIC_IP>.xip.io \
-v director_uuid=$(bosh status --uuid) \
-o operations/bosh-lite.yml \
-o operations/director-uuid.yml \
-o operations/c2c-networking.yml \
--vars-store cf-vars-store.yml \
cf-deployment.yml > cf.yml
The ops files used here are included in this Gist.

- `bosh-lite.yml` is based on this, but with the parts that remove "unnecessary" NATS IPs and switch all the MySQL stuff to Postgres taken out.
- `c2c-networking.yml` is based on this, but removes the hardcoded NATS properties, since those aren't correct in the BOSH-Lite context and are handled implicitly by BOSH Links. That improvement is already covered by an existing PR.
- `director-uuid.yml` is new, and exists just to make things work with the old BOSH CLI.
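An ops file like `director-uuid.yml` can be very small. The actual file in this Gist is authoritative; a sketch of what such an ops file might contain, given the `-v director_uuid=$(bosh status --uuid)` flag used below:

```yaml
# Add the Director's UUID to the manifest so the old Ruby CLI accepts it.
# ((director_uuid)) is supplied on the command line via -v director_uuid=...
- type: replace
  path: /director_uuid?
  value: ((director_uuid))
```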
Note the use of xip.io for the system domain. This service is known to be very flaky, so expect a few bumps, especially when doing anything like fetching `cf logs`.
$ bosh -d cf.yml deploy
$ cf api api.<BOSH_LITE_ELASTIC_IP>.xip.io --skip-ssl-validation
$ cf auth admin $(/usr/local/bin/bosh int -l cf-vars-store.yml <(echo '((uaa_scim_users_admin_password))'))
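The `cf-vars-store.yml` generated by `bosh interpolate` above is plain YAML, so if the `bosh int` trick with process substitution feels opaque, the same credential can be read with a few lines of Ruby (the key name is taken from the command above):

```ruby
require "yaml"

# Read the CF admin password out of the vars store that
# `bosh interpolate --vars-store cf-vars-store.yml` generated.
def admin_password(vars_store_path)
  YAML.load_file(vars_store_path).fetch("uaa_scim_users_admin_password")
end
```

e.g. `cf auth admin "$(ruby -ryaml -e 'puts YAML.load_file("cf-vars-store.yml")["uaa_scim_users_admin_password"]')"`.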
$ cf enable-feature-flag diego_docker
$ cf create-space and-time
$ cf target -s and-time
This configuration is based on deploying Spark on Kubernetes.
$ cf push spark-master \
--docker-image gcr.io/google_containers/spark:1.5.2_v1 \
-i 1 \
-m 4G \
--no-manifest \
--health-check-type none \
-c 'sleep 20 && /bin/bash -c ". /start-common.sh; /opt/spark/bin/spark-class org.apache.spark.deploy.master.Master --ip ${CF_INSTANCE_INTERNAL_IP} --port 7077 --webui-port 8080"'
$ cf push spark-worker \
--docker-image gcr.io/google_containers/spark:1.5.2_v1 \
-i 3 \
-m 1G \
--no-manifest \
--health-check-type none \
-c "sleep 20 && /bin/bash -c \". /start-common.sh; /opt/spark/bin/spark-class org.apache.spark.deploy.worker.Worker spark://$(cf ssh spark-master -c 'echo ${CF_INSTANCE_INTERNAL_IP}'):7077 --port 7077 --webui-port 8080\"" \
--no-start
I don't understand the details of how Spark communication works, but it appears the Master and/or the Workers spin up many concurrent TCP listeners on random ports, and need to be able to connect to each other on those ports. The current Container Networking features are great for connecting applications via a small handful of ports, but, unlike rice, not great when you want 2000 of something.
Run the provided Ruby script to allow access. This may take 3-4 minutes:
$ ruby allow-access.rb <BOSH_LITE_ELASTIC_IP>
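The actual `allow-access.rb` is included in this Gist (and takes the BOSH-Lite IP, so it may talk to the policy API directly). Conceptually, though, it just opens a large range of ports between every pair of Spark apps. A rough, hypothetical sketch using the cf-networking plugin's `cf allow-access` command; the port range here is an illustrative guess, not the script's real one:

```ruby
# Hypothetical sketch (not the Gist's actual script) of opening many
# ports between the Spark apps via `cf allow-access`.

PAIRS = [
  %w[spark-master spark-worker],
  %w[spark-worker spark-master],
  %w[spark-worker spark-worker],
].freeze

# One `cf allow-access` invocation per source app, destination app, and port.
def allow_access_command(src, dst, port)
  "cf allow-access #{src} #{dst} --port #{port} --protocol tcp"
end

# Shell out for every app pair and every port in the (illustrative) range.
def open_spark_ports(ports = 7077..9077)
  ports.each do |port|
    PAIRS.each do |src, dst|
      system(allow_access_command(src, dst, port)) or
        abort "failed: #{src} -> #{dst} on port #{port}"
    end
  end
end
```

Issuing thousands of individual `cf allow-access` calls is why this takes a few minutes.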
$ cf start spark-worker
This part isn't strictly necessary, but it allows you to see Worker logs. This allowed me to figure out that there were a lot more ports I needed to allow access for between the Master and Workers, so it's helpful to have.
$ cd ~/workspace/spark-ui-proxy
$ ls
spark-ui-proxy.py
$ touch requirements.txt
$ cf push spark-ui-proxy \
-b python_buildpack \
-i 1 \
-m 1G \
--no-manifest \
--health-check-type none \
-c "python spark-ui-proxy.py $(cf ssh spark-master -c 'echo ${CF_INSTANCE_INTERNAL_IP}'):8080" \
--no-start
$ cf allow-access spark-ui-proxy spark-master --port 7077 --protocol tcp
$ cf allow-access spark-ui-proxy spark-master --port 8080 --protocol tcp
$ cf allow-access spark-ui-proxy spark-worker --port 7077 --protocol tcp
$ cf allow-access spark-ui-proxy spark-worker --port 8080 --protocol tcp
$ cf start spark-ui-proxy
$ cf ssh spark-master
~# /opt/spark-1.5.2-bin-hadoop2.6/bin/spark-shell \
--master spark://${CF_INSTANCE_INTERNAL_IP}:7077 \
--name calculate_pi
> val count = sc.parallelize(1 to 500000000).filter { _ =>
val x = math.random
val y = math.random
x*x + y*y < 1
}.count()
...
17/03/19 09:34:43 INFO DAGScheduler: Job 0 finished: count at <console>:25, took 85.931528 s
count: Long = 392707353
> println(s"Pi is roughly ${4.0 * count / 500000000}")
Pi is roughly 3.141658824
... make sure to re-push the Workers and the UI Proxy, since their start commands are dynamically computed at `cf push` time with some embedded bash that fetches the Master's internal IP, which changes every time it restarts.
Note how many layers of nesting the Spark executors sit under:

- A paravirtualization hypervisor (the AWS instance), which is running...
- a BOSH-Lite VM, which is running...
- Garden, which is running...
- A Linux container representing the Diego Cell, which is running...
- Garden (again), which is running...
- Linux containers representing the Spark Master and Workers, which are running...
- Spark task executors.
A single-threaded Golang calculation was much faster than this. Note that the 3 "Workers" are all really just running on the same VM, as are the Master and the UI Proxy.
$ time go run pi.go
Pi is roughly 3.1415864480
real 0m42.111s
user 0m41.764s
sys 0m0.270s
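The `pi.go` source isn't part of this Gist. For reference, the equivalent single-threaded Monte Carlo estimate is only a few lines; sketched here in Ruby rather than Go, and with a smaller sample count than the Spark job's 500,000,000:

```ruby
# Single-threaded Monte Carlo estimate of pi: sample random points in the
# unit square and count the fraction landing inside the quarter circle.
def estimate_pi(samples, rng = Random.new)
  inside = 0
  samples.times do
    x = rng.rand
    y = rng.rand
    inside += 1 if x * x + y * y < 1
  end
  4.0 * inside / samples
end

puts "Pi is roughly #{estimate_pi(5_000_000)}"
```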