To submit a job via the Spark standalone REST API:

curl -X POST http://master-host:6066/v1/submissions/create \
  --header "Content-Type:application/json" \
  --data '{
    "action": "CreateSubmissionRequest",
    "appResource": "hdfs://localhost:9000/user/spark-examples_2.11-2.0.0.jar",
    "clientSparkVersion": "2.0.0",
    "appArgs": [ "10" ],
    "environmentVariables": {
      "SPARK_ENV_LOADED": "1"
    },
    "mainClass": "org.apache.spark.examples.SparkPi",
    "sparkProperties": {
      "spark.jars": "hdfs://localhost:9000/user/spark-examples_2.11-2.0.0.jar",
      "spark.driver.supervise": "false",
      "spark.executor.memory": "512m",
      "spark.driver.memory": "512m",
      "spark.submit.deployMode": "cluster",
      "spark.app.name": "SparkPi",
      "spark.master": "spark://master-host:6066"
    }
  }'
To kill a submitted app:
curl -X POST http://spark-cluster-ip:6066/v1/submissions/kill/driver-20170206232033-0003
To get the status of a submission:
curl http://spark-in-action:6066/v1/submissions/status/driver-20170206232033-0003
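The create and status endpoints return JSON bodies. A minimal sketch of pulling the useful fields out of them, in Python; the field names (submissionId, success, driverState) are assumptions about this undocumented REST API's response shape, so verify them against your cluster's actual responses:

```python
import json

def parse_create_response(body):
    """Extract the submission id from a CreateSubmissionResponse body.

    Field names (success, submissionId) are assumed, not documented."""
    resp = json.loads(body)
    if not resp.get("success"):
        raise RuntimeError("submission failed: %s" % resp)
    return resp["submissionId"]  # e.g. "driver-20170206232033-0003"

def parse_status_response(body):
    """Extract the driver state (e.g. RUNNING, FINISHED) from a
    SubmissionStatusResponse body (field name assumed)."""
    return json.loads(body).get("driverState")

# Sample bodies shaped like the responses the endpoints above return
create_body = '{"action":"CreateSubmissionResponse","submissionId":"driver-20170206232033-0003","success":true}'
status_body = '{"action":"SubmissionStatusResponse","driverState":"RUNNING","success":true}'
print(parse_create_response(create_body))
print(parse_status_response(status_body))
```

The id returned by the create call is the same driver-... id used in the kill and status URLs above.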
For this job to run successfully, the request must include both

"spark.jars": "hdfs://localhost:9000/user/spark-examples_2.11-2.0.0.jar",

and

"environmentVariables": {
  "SPARK_ENV_LOADED": "1"
},
spark-submit --verbose --master spark://spark-in-action:7077 \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_2.11-2.0.0.jar
How can I submit configuration through the API, such as --driver-java-options -Dconfig.file= ?
How can application arguments and conf be passed through these APIs, as in:
./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]
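With the REST API there are no command-line flags; each spark-submit option maps onto a field of the CreateSubmissionRequest body shown above. A sketch of that mapping in Python (the field names follow the request bodies in this gist; spark.driver.extraJavaOptions is the standard Spark property behind --driver-java-options, and the -Dconfig.file=application.conf value is only an illustrative placeholder):

```python
import json

def build_submission(app_resource, main_class, master, app_args=None,
                     conf=None, driver_java_options=None,
                     spark_version="2.0.0"):
    """Sketch: translate spark-submit-style options into a
    CreateSubmissionRequest body for POST /v1/submissions/create."""
    props = {
        "spark.master": master,                # --master
        "spark.submit.deployMode": "cluster",  # --deploy-mode
        "spark.app.name": main_class,
    }
    props.update(conf or {})                   # --conf <key>=<value> pairs
    if driver_java_options:
        # --driver-java-options maps to this Spark property
        props["spark.driver.extraJavaOptions"] = driver_java_options
    return {
        "action": "CreateSubmissionRequest",
        "appResource": app_resource,           # <application-jar>
        "mainClass": main_class,               # --class
        "appArgs": app_args or [],             # [application-arguments]
        "clientSparkVersion": spark_version,
        "environmentVariables": {"SPARK_ENV_LOADED": "1"},
        "sparkProperties": props,
    }

payload = build_submission(
    app_resource="hdfs://localhost:9000/user/spark-examples_2.11-2.0.0.jar",
    main_class="org.apache.spark.examples.SparkPi",
    master="spark://master-host:6066",
    app_args=["10"],
    conf={"spark.executor.memory": "512m"},
    driver_java_options="-Dconfig.file=application.conf",
)
print(json.dumps(payload, indent=2))
```

The resulting JSON is what you would POST to http://master-host:6066/v1/submissions/create, exactly like the curl examples above.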
@Hammad-Raza
Hi, I also need to pass an argument to my spark-job. And this is how I've solved it:
curl -X POST http://master-host:6066/v1/submissions/create \
  --header "Content-Type:application/json" \
  --data '{
    "action": "CreateSubmissionRequest",
    "appArgs": [ "Path/to/my/python/file", "arg1" ],   <= Here you can send your arguments
    "appResource": "Path/to/my/python/file",
    "clientSparkVersion": "2.3.0",
    "environmentVariables": {
      "SPARK_ENV_LOADED": "1"
    },
    "mainClass": "org.apache.spark.examples.SparkPi",
    "sparkProperties": {
      "spark.driver.supervise": "true",
      "spark.app.name": "My app",
      "spark.eventLog.enabled": "true",
      "spark.eventLog.dir": "file:/tmp/spark-events",
      "spark.submit.deployMode": "cluster",
      "spark.master": "<spark-master-url>",
      "spark.ui.enabled": "true"
    }
  }'
Then inside my job, which is a Python file:
import sys
from pyspark.conf import SparkConf
print("sys.argv=>", sys.argv)
The output is:
sys.argv=> ['Path/to/my/python/file', 'arg1']
You can also see all of the Spark config with the following code:
conf = SparkConf()
print(sorted(conf.getAll(), key=lambda p: p[0]))
The output is:
[('spark.app.name', 'myfile.py'), ('spark.driver.cores', '4'), ('spark.driver.extraClassPath', '/jars/mysql-connector-java-8.0.11.jar'), ('spark.driver.supervise', 'true'), ('spark.eventLog.dir', 'file:/tmp/spark-events'), ('spark.eventLog.enabled', 'true'), ('spark.executor.extraClassPath', '/jars/mysql-connector-java-8.0.11.jar'), ('spark.executorEnv.JAVA_HOME', '/java-1.8.0-openjdk-1.8.0.171-8.b10.el7_5.x86_64'), ('spark.files', 'file:Path/to/my/python/file'), ('spark.master', 'spark://master-url:7077'), ('spark.submit.deployMode', 'client'), ('spark.ui.enabled', 'true')]
Hope this helps.
Replace hdfs:// with file:/ if the jar is on the local file system.
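For example, a local-filesystem appResource would look like this (a config fragment; the path is only an illustrative assumption, and the jar must exist at that path on every node that may run the driver):

```json
"appResource": "file:/opt/spark/examples/jars/spark-examples_2.11-2.0.0.jar"
```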