This document summarizes the usage and design of the pulsar-io-bigquery sink.
The sink currently accepts the following parameters.
param name | description |
---|---|
credentials_file_path | Path to the BigQuery JSON key file |
project_id | BigQuery project ID |
topic_data_set | BigQuery target topic/dataset map, e.g. "topic1:dataset1,topic2:dataset2" |
topic_table_set | BigQuery target topic/table map, e.g. "topic1:table_tag1,topic2:table_tag2" |
add_insert_timestamp | Adds a timestamp column |
time_stamp_column_name | Name of the timestamp column; default is "sink_timestamp" |
useMessageTimeDatePartitioning | Use time/date partitioning |
The sink currently requires a GCP JSON credentials file to initialize. It can also route messages to different tables based on the topic map.
sink localrun \
--archive ./pulsar-google-nar-0.0.1.nar \
--tenant public \
--namespace default \
--name bigquery-sink \
--inputs bigquery-data \
--sinkConfigFile ~/bigquery-sink.yaml
configs:
  credentials_file_path: "/tmp/kubernetes-34c5c20a8e3e.json"
  project_id: "sample-project-170720"
  topic_data_set: "bigquery-data:test1"
  topic_table_set: "bigquery-data:test_table1"
  add_insert_timestamp: "true"
  time_stamp_column_name: "inserted_timestamp"
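The `topic_data_set` and `topic_table_set` values use a comma-separated `topic:target` format. The following is a minimal sketch of how such a string could be turned into a lookup map for routing. It is not the sink's actual code: the class `TopicMapParser` and the method `parseTopicMap` are hypothetical names, and splitting on the last colon (so that topic names containing colons would still parse) is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the sink: parses "topic1:target1,topic2:target2".
class TopicMapParser {

    static Map<String, String> parseTopicMap(String spec) {
        Map<String, String> result = new LinkedHashMap<>();
        if (spec == null || spec.trim().isEmpty()) {
            return result; // no routing entries configured
        }
        for (String entry : spec.split(",")) {
            String trimmed = entry.trim();
            // Assumption: split on the LAST colon, since dataset and table
            // names contain no colons but a topic name might.
            int sep = trimmed.lastIndexOf(':');
            if (sep <= 0 || sep == trimmed.length() - 1) {
                throw new IllegalArgumentException(
                        "Expected topic:target but got: " + trimmed);
            }
            result.put(trimmed.substring(0, sep).trim(),
                       trimmed.substring(sep + 1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        // Values taken from the sample configuration.
        Map<String, String> datasets = parseTopicMap("bigquery-data:test1");
        Map<String, String> tables = parseTopicMap("bigquery-data:test_table1");
        String topic = "bigquery-data";
        // Prints the dataset.table target for the topic: test1.test_table1
        System.out.println(datasets.get(topic) + "." + tables.get(topic));
    }
}
```

With both maps parsed, a message's topic name is enough to look up its target dataset and table.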
No schema validation is currently performed, and there is no integration with the Pulsar or BigQuery schema registry at this time.
When add_insert_timestamp is enabled, the sink adds an extra column to each row. The column holds a UTC timestamp generated in Java before the insertion request is made.
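The timestamp behavior can be sketched roughly as follows. This is an illustration under stated assumptions, not the sink's implementation: the class `InsertTimestamp` and the method `withTimestamp` are hypothetical, the row is modeled as a plain `Map`, and the ISO-8601 string form of the timestamp is an assumption (the sink may send a different representation).

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the sink.
class InsertTimestamp {

    // Default column name, as listed in the parameter table.
    static final String DEFAULT_COLUMN = "sink_timestamp";

    // Returns a copy of the row; when enabled, the copy carries one extra
    // column holding the given instant as an ISO-8601 UTC string.
    static Map<String, Object> withTimestamp(Map<String, Object> row,
                                             boolean enabled,
                                             String columnName,
                                             Instant now) {
        Map<String, Object> out = new LinkedHashMap<>(row);
        if (enabled) {
            String name = (columnName == null || columnName.isEmpty())
                    ? DEFAULT_COLUMN
                    : columnName;
            // Instant.toString() always renders in UTC, e.g. 2024-01-01T00:00:00Z
            out.put(name, now.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 1);
        // Instant.now() is read just before the (omitted) insertion request.
        System.out.println(withTimestamp(row, true, "inserted_timestamp", Instant.now()));
    }
}
```

Passing the instant in as a parameter keeps the helper deterministic, so it can be tested without depending on the wall clock.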
TODO