Running a Spark job on local Kubernetes (minikube)
# Starting minikube with 8 GB of memory and 3 CPUs
minikube --memory 8192 --cpus 3 start

# Creating a separate namespace for Spark driver and executor pods
kubectl create namespace spark

# Creating a ServiceAccount and ClusterRoleBinding for Spark
kubectl create serviceaccount spark-serviceaccount --namespace spark
kubectl create clusterrolebinding spark-rolebinding --clusterrole=edit --serviceaccount=spark:spark-serviceaccount --namespace=spark
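
# Optional sanity check (a sketch): ask the API server whether the new
# service account is allowed to create pods in the spark namespace
kubectl auth can-i create pods --as=system:serviceaccount:spark:spark-serviceaccount --namespace spark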

# Spark home dir
cd $SPARK_HOME

# Pointing the local environment at the Docker daemon inside Minikube
eval $(minikube docker-env)

# Building the Docker image from the provided Dockerfile
docker build -t spark:latest -f kubernetes/dockerfiles/spark/Dockerfile .

# Submitting the SparkPi example job
# $KUBERNETES_MASTER can be taken from the output of kubectl cluster-info
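# For example, one way to set it from the current kubectl context
# (assuming the context points at the minikube cluster):
KUBERNETES_MASTER=$(kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}')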
bin/spark-submit \
  --master k8s://$KUBERNETES_MASTER \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.namespace=spark \
  --conf spark.kubernetes.driver.pod.name=spark-pi-driver \
  --conf spark.kubernetes.container.image=spark:latest \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-serviceaccount \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar

# Printing the Spark driver's log
kubectl logs spark-pi-driver --namespace spark

# When the application completes, the executor pods terminate and are cleaned up,
# but the driver pod persists logs and remains in "completed" state.

# Deleting the spark-pi-driver pod
kubectl delete pod spark-pi-driver --namespace spark
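
# Optionally, watch the driver and executor pods come and go while the
# job runs (run this in a second terminal; purely for observation):
kubectl get pods --namespace spark --watch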
How can I use SparkLauncher to programmatically submit a Spark job to minikube? Any example would be appreciated.
Hi,
this is great stuff, thanks.
I am using the following command to create the Python version of the Docker image.
The Docker image is created successfully. However, I want to use external packages like pyyaml etc.
I tried this spark-submit command:
This seems OK, and the Python packages need to be extracted from the archive given by:
--archives=hdfs://$HDFS_HOST:$HDFS_PORT/minikube/codes/${pyspark_venv}.tar.gz#${pyspark_venv} \
This is the unpacking statement:
Unpacking an archive hdfs://50.140.197.220:9000/minikube/codes/pyspark_venv.tar.gz#pyspark_venv from /tmp/spark-09e456aa-334e-4780-a780-80b21d09840a/pyspark_venv.tar.gz to /opt/spark/work-dir/./pyspark_venv
But it does not work, because it cannot find any external package like pyyaml or numpy:
Traceback (most recent call last):
  File "/tmp/spark-09e456aa-334e-4780-a780-80b21d09840a/testyml.py", line 25, in <module>
    main()
  File "/tmp/spark-09e456aa-334e-4780-a780-80b21d09840a/testyml.py", line 22, in main
    import yaml
ModuleNotFoundError: No module named 'yaml'
Do you have any ideas on how I can make these external packages available inside minikube?
Thanks
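
The usual missing piece with --archives virtualenvs is pointing Spark at the Python interpreter inside the unpacked archive rather than the image's system Python. A minimal sketch, assuming the archive is a relocatable virtualenv (e.g. built with venv-pack) that unpacks into ./pyspark_venv as in the log above; the HDFS path and testyml.py come from the messages above, and the remaining --conf options would be the same Kubernetes settings as in the gist:

# Tell driver and executors to use the interpreter from the unpacked archive
bin/spark-submit \
  --master k8s://$KUBERNETES_MASTER \
  --deploy-mode cluster \
  --conf spark.kubernetes.namespace=spark \
  --conf spark.pyspark.python=./pyspark_venv/bin/python \
  --archives hdfs://$HDFS_HOST:$HDFS_PORT/minikube/codes/pyspark_venv.tar.gz#pyspark_venv \
  testyml.py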