I will have to think about a sensible place to put this.
But here’s how you can get the Spark UI for a Glue job:
```python
job = GlueJob(
    'my_dir/',
    bucket=bucket,
    job_role=my_role,
    job_arguments={
        '--test_arg': 'some_string',
        '--enable-spark-ui': 'true',
        '--spark-event-logs-path': 's3://alpha-data-linking/glue_test_delete/logsdelete',
    },
)
```
Then sync the files from s3://alpha-data-linking/glue_test_delete/logsdelete to a local folder called `events`.
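One way to do that sync, assuming the AWS CLI is installed and has credentials for the bucket (the path is the one passed in `--spark-event-logs-path` above):

```shell
# Create the local target directory, then mirror the S3 logs path into it.
# Assumes the AWS CLI is configured with read access to the bucket.
mkdir -p events
aws s3 sync s3://alpha-data-linking/glue_test_delete/logsdelete ./events
```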
Then build a history server image from this Dockerfile:

```dockerfile
ARG SPARK_IMAGE=gcr.io/spark-operator/spark:v2.4.0
FROM ${SPARK_IMAGE}

RUN apk --update add coreutils
RUN mkdir /tmp/spark-events

ENTRYPOINT ["/opt/spark/sbin/start-history-server.sh"]
```

```shell
docker build -t shs .
```
then run it:

```shell
docker run -v ${PWD}/events:/tmp/spark-events -p 18080:18080 shs
```

and open http://localhost:18080 in your browser.
What’s happening?
- The Glue job writes Spark event logs to the path given in `--spark-event-logs-path`.
- These event logs are all the Spark UI needs to let you analyse what went on in a job. Copy them to a directory on your local computer called `events`.
- We start the Spark history server in Docker. By default it looks for new logs in `/tmp/spark-events`.
- We map our local `events` directory onto that directory in the container using `-v`.
- By default the history server UI listens on port 18080, so we map that using `-p`.
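For context, the files landing in the logs path are Spark event logs: newline-delimited JSON, one record per Spark listener event, which is what the history server replays to rebuild the UI. A minimal sketch of reading one record (the sample line here is a hypothetical, abridged event, not real Glue output):

```python
import json

# Spark event logs are newline-delimited JSON; each line is one listener event.
# sample_line is a hypothetical, abridged record for illustration only.
sample_line = '{"Event": "SparkListenerApplicationStart", "App Name": "glue-job"}'

event = json.loads(sample_line)
print(event["Event"])  # SparkListenerApplicationStart
```

In a real log you would iterate over every line of the file the same way; the history server does this for you and renders the result at port 18080.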