@jeremyyeo
Last active April 5, 2024 04:45

dbt Python models cliff notes

Sample dummy models for testing. The golden rule: a Python model must always return a DataFrame.

Snowflake

# sf_table.py
import pandas as pd
def model(dbt, session):
    return pd.DataFrame({"id": [1]})
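
One way to sanity-check the golden rule without a warehouse is to call `model()` directly with stand-ins for `dbt` and `session`. The sketch below is a local convenience, not a dbt feature. `StubDbt` is a hypothetical class that only records `config()` calls, and `session` is passed as `None` on the assumption that the model never touches it.

```python
import pandas as pd


class StubDbt:
    """Hypothetical stand-in for the `dbt` object; not part of dbt itself."""

    is_incremental = False  # mimic a first (non-incremental) run

    def __init__(self):
        self.configs = {}

    def config(self, **kwargs):
        # Record config values locally instead of handing them to dbt.
        self.configs.update(kwargs)


def model(dbt, session):
    # Same body as the sf_table.py model.
    return pd.DataFrame({"id": [1]})


result = model(StubDbt(), session=None)
assert isinstance(result, pd.DataFrame)
```

This only checks the return type. It says nothing about whether the model would run on Snowpark.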

Importing another Python package for use in the model:

# sf_table.py
import pandas as pd
def model(dbt, session):
    dbt.config(packages=["agate"])
    import agate
    return pd.DataFrame({"id": [1]})

Returning an empty DataFrame:

# sf_incremental.py
import pandas as pd
def model(dbt, session):
    dbt.config(materialized="incremental")

    if dbt.is_incremental:
        """
        To return an empty DataFrame on a subsequent (i.e. incremental) run of the model,
        a bare DataFrame like this will not work:

            df = pd.DataFrame()

        The empty DataFrame below works without Snowpark errors because its columns
        (column names) are defined.
        """
        df = pd.DataFrame({"id": [], "name": []})
    else:
        df = pd.DataFrame({"id": [1], "name": ["alice"]})

    return df
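
The working empty DataFrame has named columns, but pandas has to guess a dtype for columns built from empty lists. If column types matter downstream, one option is to declare the dtypes explicitly. This is an assumption and was not tested against Snowpark in the original note. `empty_frame` is a hypothetical helper name.

```python
import pandas as pd


def empty_frame(schema):
    """Build a zero-row DataFrame whose columns carry the given dtypes.

    `schema` maps column name -> pandas dtype string (hypothetical helper).
    """
    return pd.DataFrame(
        {name: pd.Series(dtype=dtype) for name, dtype in schema.items()}
    )


# Zero rows, but both column names and dtypes are defined.
df = empty_frame({"id": "int64", "name": "object"})
```

Inside a model, `df` could then be returned from the `dbt.is_incremental` branch in place of `pd.DataFrame({"id": [], "name": []})`.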

BigQuery

# bq_table.py
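def model(dbt, session):
    # Run on an existing Dataproc cluster (the cluster itself is configured elsewhere).
    dbt.config(submission_method="cluster")
    data = [{"id": 1}]
    # `session` here is a Spark session, so this returns a Spark DataFrame.
    return session.createDataFrame(data)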

Databricks

# db_python.py
import pandas as pd
def model(dbt, session):
    dbt.config(
        submission_method="all_purpose_cluster",
        cluster_id="1121-175813-2agrmn6x"
    )
    return pd.DataFrame({"id": [1]})