I will have to think about a sensible place to put this.
But here’s how you can get the Spark UI for a Glue job:
```python
job = GlueJob(
    'my_dir/',
    bucket=bucket,
    job_role=my_role,
    job_arguments={
        '--test_arg': 'some_string',
        '--enable-spark-ui': 'true',
        '--spark-event-logs-path': 's3://alpha-data-linking/glue_test_delete/logsdelete',
    },
)
```
Then sync the files from s3://alpha-data-linking/glue_test_delete/logsdelete to a local folder called `events`.
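One way to do that sync, assuming the AWS CLI is installed and has credentials for the bucket (the path is the one passed in `--spark-event-logs-path` above):

```shell
# Create the local target directory, then mirror the S3 logs path into it.
# Assumes the AWS CLI is configured with read access to the bucket.
mkdir -p events
aws s3 sync s3://alpha-data-linking/glue_test_delete/logsdelete ./events
```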
Then build a history server image from this Dockerfile:

```dockerfile
ARG SPARK_IMAGE=gcr.io/spark-operator/spark:v2.4.0
FROM ${SPARK_IMAGE}

RUN apk --update add coreutils
RUN mkdir /tmp/spark-events

ENTRYPOINT ["/opt/spark/sbin/start-history-server.sh"]
```

```shell
docker build -t shs .
```
then run it:

```shell
docker run -v ${PWD}/events:/tmp/spark-events -p 18080:18080 shs
```

and open http://localhost:18080 in your browser.
What’s happening?
- The Glue job writes Spark event logs to the path given in `--spark-event-logs-path`.
- These event logs are all the Spark UI needs to let you analyse what went on in a job. Copy them to a directory on your local computer called `events`.
- We start the Spark history server in Docker. By default it looks for new logs in `/tmp/spark-events`.
- We map our local `events` directory onto that directory in the container using `-v`.
- By default the history server UI listens on port 18080, so we map that using `-p`.
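For context, the files landing in the logs path are Spark event logs: newline-delimited JSON, one record per Spark listener event, which is what the history server replays to rebuild the UI. A minimal sketch of reading one record (the sample line here is a hypothetical, abridged event, not real Glue output):

```python
import json

# Spark event logs are newline-delimited JSON; each line is one listener event.
# sample_line is a hypothetical, abridged record for illustration only.
sample_line = '{"Event": "SparkListenerApplicationStart", "App Name": "glue-job"}'

event = json.loads(sample_line)
print(event["Event"])  # SparkListenerApplicationStart
```

In a real log you would iterate over every line of the file the same way; the history server does this for you and renders the result at port 18080.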