This page contains resources and references for the BigQuery course.
Contents are grouped by lecture title.
Contents appear in the same order as in the course, although not every lecture has resources.
- Technology In This Course
- Start with BigQuery
- Functions
- Common Data Types
- Converting Data Types
- Exploring Common Data Types
- Clean & Transform With Dataprep
- Essential BigQuery
- Load Data Into BigQuery (Part 1) - The Basic
- Load Data Into BigQuery (Part 2) - Handling Errors
- Load Data Into BigQuery (Part 3) - Efficient Load
- Load Data Into BigQuery (Part 4) - From Your Data To BigQuery
- Load Data Into BigQuery (Part 5) - In Microservice Architecture
- Load Data Into BigQuery (Part 6) - Recurring Load
- BigQuery View
- Using Join
- Union and Intersect
- Basic Statistical Functions
- Download sample data
- Generate random data using mockaroo
- Limitations for uploading CSV
- Limitations for uploading JSON
- Database to CSV
- Logstash using the Jdbc input plugin and the csv output plugin. See this sample.
- Embulk using the Jdbc input plugin and the file output plugin with formatter set to `csv`. See this sample.
- Database to newline-delimited JSON
- Logstash using the Jdbc input plugin and the file output plugin with the JSON lines codec
- Embulk using the Jdbc input plugin and the file output plugin with formatter set to `jsonl`. See this sample.
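As a minimal sketch of the newline-delimited JSON format these pipelines produce — one JSON object per line, no enclosing array — here is a hedged Python example using an in-memory SQLite table as a stand-in for the JDBC source (the table and rows are hypothetical, not course data):

```python
import json
import sqlite3

# Hypothetical stand-in database; the lecture uses a JDBC source instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employee_id TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [("E-001", 5000), ("E-002", 7500)])

cursor = conn.execute("SELECT employee_id, salary FROM employee")
columns = [desc[0] for desc in cursor.description]

# Newline-delimited JSON: one JSON object per line, no wrapping [ ... ].
lines = [json.dumps(dict(zip(columns, row))) for row in cursor]
ndjson = "\n".join(lines)
print(ndjson)
```

BigQuery rejects a regular JSON array on load; each record must sit on its own line, which is exactly what the `jsonl` formatter and JSON lines codec emit.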
- Download sample data from Google Cloud Storage
- Sample data from Google Cloud Storage. Use these Cloud Storage paths to load into BigQuery:
- Sample Purchase 03 :
course_bigquery_sample_data/load_data_into_bigquery_1/sample_purchase_03.csv
- Sample Purchase 04 :
course_bigquery_sample_data/load_data_into_bigquery_1/sample_purchase_04.json
- Sample Purchase 05 (multiple files) :
course_bigquery_sample_data/load_data_into_bigquery_1/sample_purchase_05*.csv
- Download sample data from Google Drive
DDL script for creating `sample_employee`
CREATE TABLE IF NOT EXISTS
  `course_dataset.sample_employee` (
    employee_id STRING NOT NULL,
    first_name STRING NOT NULL,
    last_name STRING,
    email STRING NOT NULL OPTIONS(description = 'Email can be any domain, free or corporate domain'),
    gender STRING OPTIONS(description = 'F for Female, or M for Male'),
    birth_date DATE,
    salary INT64 NOT NULL
  )
  OPTIONS (
    description = 'Table for saving sample employee',
    labels = [
      ("course_series", "google"),
      ("created_by", "timotius_pamungkas")
    ]
  )
- BigQuery DDL (Data Definition Language) documentation for create table
- BigQuery terminal commands used in the lecture
- BigQuery `bq` command-line reference
Script to generate the `sample_employee` schema
bq show --schema --format=prettyjson course_dataset.sample_employee > sample_employee_schema.json
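The command above writes the table schema as a JSON array of field objects. A hedged Python sketch of reading it back — the trimmed schema below is illustrative, not the full `sample_employee` schema:

```python
import json

# A trimmed example of what `bq show --schema --format=prettyjson` emits
# (the real output for course_dataset.sample_employee has more fields).
schema_json = """
[
  {"name": "employee_id", "type": "STRING", "mode": "REQUIRED"},
  {"name": "last_name",   "type": "STRING", "mode": "NULLABLE"},
  {"name": "salary",      "type": "INTEGER", "mode": "REQUIRED"}
]
"""

schema = json.loads(schema_json)

# NOT NULL columns from the DDL appear here as mode REQUIRED.
required = [field["name"] for field in schema if field["mode"] == "REQUIRED"]
print(required)
```

The same JSON file can be passed back to `bq load`/`bq mk` as a schema definition, which is why exporting it like this is useful for recreating tables.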
- Download sample data
- Sample `avro` data from Google Cloud Storage. Use one of these paths to load into BigQuery:
  - without compression : course_bigquery_sample_data/load_data_into_bigquery_3/sample_employee_06.avro
  - with `deflate` compression : course_bigquery_sample_data/load_data_into_bigquery_3/sample_employee_06_deflate.avro
  - with `snappy` compression : course_bigquery_sample_data/load_data_into_bigquery_3/sample_purchase_06_snappy.avro
- Avro specification
Loading Avro using a Cloud Shell script
bq load --source_format=AVRO --use_avro_logical_types course_dataset.sample_employee "gs://course_bigquery_sample_data/load_data_into_bigquery_3/sample_employee_06.avro"
- BigQuery Streaming Pricing
- Google Cloud SDK (including `bq`)
- Google Cloud Client Libraries
- Dataflow, Logstash, or Embulk : see below
- For Kafka users : Kafka Connect using Jdbc source connector, and BigQuery sink connector
- SQL script for creating dummy data (using a PostgreSQL database)
- Create Google Service Account credential (JSON)
- Dataflow
- Logstash
- Logstash download & installation
- Jdbc input plugin
- BigQuery output plugin
- Sample logstash pipeline configuration
- Command to execute Logstash (on Windows, use `bin\logstash.bat`):
bin/logstash -f [path-to-pipeline-configuration-file]
- List of Logstash input plugins
- Embulk
- Cloud Functions home
- BigQuery Client Libraries Home
- JSON schema for `sample_weather`
- Cloud function source code
- Java Spring Boot source code
- More about : Microservice Architecture & Pattern
- More about message broker : RabbitMQ
- More about message broker : Kafka
- JSON schema for `sample_loan`
- Loan sample data (JSON)
- Loan sample data (PostgreSQL)
- Sample logstash pipeline (with scheduler)
- Sample embulk script
- Cloud function source code
Query for Stack Overflow questions per user
SELECT
user.display_name,
user.location,
COUNT(question.id) AS total_question
FROM
`bigquery-public-data.stackoverflow.posts_questions` question
INNER JOIN
`bigquery-public-data.stackoverflow.users` user
ON
question.owner_user_id = user.id
GROUP BY
user.display_name,
user.location
SQL for Cross Join
SELECT
profile.*,
merchandise.*
FROM
`project-id.course_dataset.contractor_profile` profile,
`project-id.course_dataset.contractor_merchandise` merchandise
SQL for Inner Join
SELECT
profile.*,
project.* EXCEPT(contractor_id)
FROM
`project-id.course_dataset.contractor_profile` profile
JOIN
`project-id.course_dataset.contractor_project` project
USING
(contractor_id)
SQL for Left Outer Join
SELECT
profile.*,
award.* EXCEPT(contractor_id)
FROM
`project-id.course_dataset.contractor_profile` profile
LEFT OUTER JOIN
`project-id.course_dataset.contractor_award` award
USING
(contractor_id)
ORDER BY
award_name NULLS FIRST,
contractor_id
SQL for Right Outer Join
SELECT
profile.*,
award.* EXCEPT(contractor_id)
FROM
`project-id.course_dataset.contractor_profile` profile
RIGHT OUTER JOIN
`project-id.course_dataset.contractor_award` award
USING
(contractor_id)
ORDER BY
award_name NULLS FIRST,
contractor_id
SQL for Full Outer Join
SELECT
profile.*,
award.* EXCEPT(contractor_id)
FROM
`project-id.course_dataset.contractor_profile` profile
FULL OUTER JOIN
`project-id.course_dataset.contractor_award` award
USING
(contractor_id)
ORDER BY
award_name NULLS FIRST,
contractor_id
SQL for Union & Intersect
Use the keyword UNION ALL, UNION DISTINCT, or INTERSECT DISTINCT
SELECT
company_name,
contact_email,
tax_identification_number
FROM
`project-id.course_dataset.contractor_profile`
UNION ALL
SELECT
company_name,
contact_email,
tax_identification_number
FROM
`project-id.course_dataset.consultant_profile`
ORDER BY
company_name
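The difference between the three keywords can be sketched in Python with hypothetical company names (not the actual course tables): UNION ALL keeps duplicates, UNION DISTINCT deduplicates, and INTERSECT DISTINCT keeps only rows present in both inputs.

```python
# Hypothetical company names from two profile tables.
contractors = ["Acme", "Globex", "Initech"]
consultants = ["Globex", "Umbrella"]

union_all = contractors + consultants                         # duplicates kept
union_distinct = sorted(set(contractors) | set(consultants))  # duplicates removed
intersect_distinct = sorted(set(contractors) & set(consultants))  # common rows only

print(union_all)
print(union_distinct)
print(intersect_distinct)
```

Note that "Globex" appears twice in the UNION ALL result but once in UNION DISTINCT, and is the only row surviving INTERSECT DISTINCT.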
Solution for basic statistical functions in `loan_default`
SELECT
AVG(default_amount) AS mean
FROM
`project-id.course_dataset.loan_default`
LIMIT
1;
SELECT
PERCENTILE_CONT(default_amount, 0.5) OVER() AS median_cont,
PERCENTILE_DISC(default_amount, 0.5) OVER() AS median_disc
FROM
`project-id.course_dataset.loan_default`
LIMIT
1;
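A quick way to see the difference between PERCENTILE_CONT and PERCENTILE_DISC at the 0.5 percentile: a minimal Python sketch with hypothetical amounts (not the course's `loan_default` data). With an even number of rows, the continuous variant interpolates between the two middle values, while the discrete variant must return an actual value from the data.

```python
import statistics

amounts = [100, 200, 300, 400]  # hypothetical default_amount values

# PERCENTILE_CONT(x, 0.5) interpolates between the two middle values.
median_cont = statistics.median(amounts)      # 250.0

# PERCENTILE_DISC(x, 0.5) returns an existing value; median_low picks
# the lower of the two middle values, matching this behavior here.
median_disc = statistics.median_low(amounts)  # 200

print(median_cont, median_disc)
```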
SELECT
STDDEV_SAMP(default_amount) AS std_dev_sample,
STDDEV(default_amount) AS std_dev_sample_alias,
STDDEV_POP(default_amount) AS std_dev_pop
FROM
`project-id.course_dataset.loan_default`
LIMIT
1;
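The three standard-deviation functions differ only in the divisor: STDDEV_SAMP (and its alias STDDEV) divide the squared deviations by n - 1, while STDDEV_POP divides by n. A sketch using Python's standard `statistics` module with hypothetical amounts:

```python
import statistics

amounts = [100, 200, 300, 400]  # hypothetical default_amount values

std_dev_sample = statistics.stdev(amounts)  # divides by n - 1 (STDDEV_SAMP / STDDEV)
std_dev_pop = statistics.pstdev(amounts)    # divides by n     (STDDEV_POP)

print(std_dev_sample, std_dev_pop)
```

The sample variant is always slightly larger; use it when the rows are a sample of a bigger population, and the population variant when the table holds the whole population.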
SELECT
CORR(default_amount, loan_amount)
FROM
`course-project-timpamungkas.course_dataset.loan_default`
LIMIT
1;
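CORR computes the Pearson correlation coefficient. A self-contained Python sketch of the same formula — covariance divided by the product of the population standard deviations — with made-up values (not the actual `loan_default` data):

```python
import math

xs = [100, 200, 300, 400]      # hypothetical default_amount values
ys = [1000, 2100, 2900, 4000]  # hypothetical loan_amount values

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Pearson correlation: covariance / (std_x * std_y), population form.
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
corr = cov / (std_x * std_y)

print(corr)
```

A result near +1 means the two columns rise together, near -1 means they move in opposite directions, and near 0 means no linear relationship.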