There are multiple ways to generate a KFP component:
- from a Python function (see the sketch after this list)
- from a YAML file or text
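The Python-function route is the quickest when your step is pure Python. A minimal sketch, assuming the KFP v1 SDK (the function, its name, and the base image are illustrative placeholders):

from kfp import components

def add(a: float, b: float) -> float:
    # A trivial placeholder function turned into a reusable pipeline step.
    return a + b

# Builds a container component that runs the function inside base_image;
# python:3.9 is an assumption -- use whatever image fits your environment.
add_op = components.create_component_from_func(add, base_image='python:3.9')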
You'll use kfp.components.load_component_from_file (or load_component_from_text)
when you need to interface with a command-line tool.
An example would be a Spark job submission tool such as Dataproc on GCP:
gcloud dataproc jobs submit spark --cluster example-cluster \
    --region=region \
    --class org.apache.spark.examples.SparkPi \
    --jars file:///usr/lib/spark/examples/jars/spark-examples.jar -- 1000
The easiest way to code this up is to write a component spec in YAML:
name: Generic dataproc submission component
description: Submits spark job for generating train and target data using dataproc
inputs:
  - { name: region, type: String, description: "region" }
  - { name: class, type: String, description: "the java class" }
  - { name: jars, type: String, description: "the jar files" }
  - { name: args, type: String, description: "the arguments" }
outputs:
  - name: logs
    type: String
metadata:
  annotations:
    "iam.amazonaws.com/role": "<your iam role>"
implementation:
  container:
    image: <image>
    command:
      - sh
      - -exc
      - |
        gcloud dataproc jobs submit spark \
          --region=$0 \
          --class $1 \
          --jars $2 -- $3
      - inputValue: region
      - inputValue: class
      - inputValue: jars
      - inputValue: args
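With the spec saved, loading it and wiring it into a pipeline is straightforward. A minimal sketch, assuming the KFP v1 SDK; the file name, pipeline name, and argument values are placeholders:

import kfp
from kfp import components, dsl

# File name is an assumption -- point this at wherever you saved the spec.
dataproc_submit_op = components.load_component_from_file('dataproc_component.yaml')

@dsl.pipeline(name='dataproc-example')
def pipeline():
    # Arguments are passed positionally, in the order the inputs are
    # declared in the YAML (region, class, jars, args). Positional calls
    # also sidestep the fact that "class" is a Python keyword.
    dataproc_submit_op(
        'us-central1',
        'org.apache.spark.examples.SparkPi',
        'file:///usr/lib/spark/examples/jars/spark-examples.jar',
        '1000',
    )

if __name__ == '__main__':
    kfp.compiler.Compiler().compile(pipeline, 'pipeline.yaml')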
If you have a scenario where you need to submit your job using another tool that requires the PySpark application's job args to be passed as a single argument, e.g.

cli-tool -executors 30 -args '--app-config s3://wow --version 0.01'

you can follow the example below. Note that double quotes (") are allowed inside the multiline string:
name: Generic chimera submission component
description: Submits spark job for generating train and target data using chimeracli
inputs:
  - { name: region, type: String, description: "region" }
  - { name: class, type: String, description: "the java class" }
  - { name: jars, type: String, description: "the jar files" }
  - { name: config, type: String, description: "the app config" }
  - { name: version, type: String, description: "the app version" }
outputs:
  - name: logs
    type: String
metadata:
  annotations:
    "iam.amazonaws.com/role": "<your iam role>"
implementation:
  container:
    image: <image>
    command:
      - sh
      - -exc
      - |
        cli-tool jobs submit spark \
          --region=$0 \
          --class $1 \
          --jars $2 \
          --args "--app-config $3 --version $4"
      - inputValue: region
      - inputValue: class
      - inputValue: jars
      - inputValue: config
      - inputValue: version
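If you'd rather keep the spec next to your pipeline code instead of in a separate file, load_component_from_text works the same way. A minimal sketch, assuming the KFP v1 SDK; the inline spec is a trimmed-down version of the one above, not the full component:

from kfp import components

# Trimmed to one input for brevity -- paste the full spec from above here.
component_text = '''
name: Generic chimera submission component
inputs:
  - { name: region, type: String }
implementation:
  container:
    image: <image>
    command:
      - sh
      - -exc
      - |
        cli-tool jobs submit spark --region=$0
      - inputValue: region
'''

chimera_submit_op = components.load_component_from_text(component_text)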
You can test the command locally first by running the following (with sh -c, the first value after the quoted script is bound to $0, the next to $1, and so on):

sh -exc 'cli-tool jobs submit spark --region $0 --class $1 --jars $2 --args "--app-config $3 --version $4"' 'us' 'MyClass' 'some jars' 'some config' '0.12.3'