This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import json | |
print('Loading function') | |
def lambda_handler(event, context): | |
print('------------------------') | |
print(event) | |
#1. Iterate over each record | |
try: | |
for record in event['Records']: |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
-- Table information like sortkeys, unsorted percentage | |
-- see http://docs.aws.amazon.com/redshift/latest/dg/r_SVV_TABLE_INFO.html | |
SELECT * FROM svv_table_info; | |
-- Table sizes in GB | |
SELECT t.name, COUNT(tbl) / 1000.0 AS gb | |
FROM ( | |
SELECT DISTINCT datname, id, name | |
FROM stv_tbl_perm | |
JOIN pg_database ON pg_database.oid = db_id |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import java.nio.charset.StandardCharsets; | |
import java.util.ArrayList; | |
import java.util.Base64; | |
import java.util.HashMap; | |
import java.util.Map; | |
import javax.annotation.Nonnull; | |
import javax.crypto.Mac; | |
import javax.crypto.spec.SecretKeySpec; |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import os | |
import subprocess | |
import sys | |
os.environ["SPARK_HOME"] = r"/usr/lib/spark" | |
# Set PYTHONPATH for Spark | |
for path in [r'/usr/lib/spark/python/', r'/usr/lib/spark/python/lib/py4j-src.zip']: | |
sys.path.append(path) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# Configuration needed of jars | |
%%configure | |
{ | |
"conf": { | |
"spark.jars": "https://s3.amazonaws.com/redshift-downloads/drivers/jdbc/1.2.36.1060/RedshiftJDBC42-no-awssdk-1.2.36.1060.jar", | |
"spark.jars.packages": "org.apache.spark:spark-avro_2.11:2.4.2,io.github.spark-redshift-community:spark-redshift_2.11:4.0.1" | |
} | |
} | |
# define redshift connection info |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
CREATE EXTERNAL TABLE IF NOT EXISTS gdelt.events ( | |
`globaleventid` INT,`day` INT,`monthyear` INT,`year` INT,`fractiondate` FLOAT, | |
`actor1code` string,`actor1name` string,`actor1countrycode` string,`actor1knowngroupcode` string, | |
`actor1ethniccode` string,`actor1religion1code` string,`actor1religion2code` string, | |
`actor1type1code` string,`actor1type2code` string,`actor1type3code` string, | |
`actor2code` string,`actor2name` string,`actor2countrycode` string,`actor2knowngroupcode` string, | |
`actor2ethniccode` string,`actor2religion1code` string,`actor2religion2code` string, | |
`actor2type1code` string,`actor2type2code` string,`actor2type3code` string, | |
`isrootevent` BOOLEAN,`eventcode` string,`eventbasecode` string,`eventrootcode` string, | |
`quadclass` INT,`goldsteinscale` FLOAT,`nummentions` INT,`numsources` INT,`numarticles` INT,`avgtone` FLOAT, |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
''' | |
Example Schema Validation | |
Assumes the DataFrame `df` is already populated with schema: | |
{id : int, day_cd : 8-digit code representing date, category : varchar(24), type : varchar(10), ind : varchar(1), purchase_amt : decimal(18,6) } | |
Runs various checks to ensure data is valid (e.g. no NULL id and day_cd fields) and schema is valid (e.g. [category] cannot be larger than varchar(24)) | |
''' |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#/usr/bin/python3 | |
# -*- coding: utf-8 -*- | |
import logging | |
import airflow | |
from airflow import DAG | |
from datetime import timedelta, datetime | |
from airflow.operators.dummy_operator import DummyOperator | |
from airflow.operators.python_operator import PythonOperator | |
from airflow.operators.http_operator import SimpleHttpOperator |
NewerOlder