Recce is the data validation toolkit that lets data engineers and non-technical stakeholders review and interpret data changes together.

Validate data modeling work, review data change, and work as a team to speed up the QA process on data project pull requests, and significantly reduce time-to-merge.

@DaveFlynn
DaveFlynn / impact-validation-checklist.md
Last active June 26, 2024 03:25
Data Impact Validation Checklist

Impact validations

  • The scope of impact has been checked (no unintentional modifications)
  • I have validated my modeling change (show examples of how the expected data change was validated)
  • I have checked for critical downstream models
  • I have checked the data integrity of critical models (and attached holistic check results)
  • I have spot-checked rows from critical models against prod to ensure data consistency (examples attached)
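
The last item, spot-checking rows against prod, could be sketched roughly as follows. This is a minimal illustration and not part of the checklist itself: it assumes the dev and prod versions of a model have already been loaded into pandas DataFrames with identical columns, and the function name `spot_check`, the key column `id`, and the sample values are all hypothetical.

```python
import pandas as pd


def spot_check(dev: pd.DataFrame, prod: pd.DataFrame, key: str) -> pd.DataFrame:
    """Return rows that differ between dev and prod, or exist on one side only.

    Assumes both frames share the same columns and that `key` is unique.
    """
    merged = dev.merge(
        prod, on=key, how="outer", suffixes=("_dev", "_prod"), indicator=True
    )
    # Rows present in only one environment are always mismatches.
    mismatch = merged["_merge"] != "both"
    for col in (c for c in dev.columns if c != key):
        left, right = merged[f"{col}_dev"], merged[f"{col}_prod"]
        # Two nulls count as equal; anything else must match exactly.
        same = (left == right) | (left.isna() & right.isna())
        mismatch |= ~same
    return merged[mismatch]


# Hypothetical sample data: id 2 changed, id 3 is dev-only, id 4 is prod-only.
dev = pd.DataFrame({"id": [1, 2, 3], "amount": [10.0, 20.0, 30.0]})
prod = pd.DataFrame({"id": [1, 2, 4], "amount": [10.0, 25.0, 40.0]})
diffs = spot_check(dev, prod, key="id")
print(diffs)
```

The `_merge` column in the result shows whether a flagged row came from dev only, prod only, or both, which makes it straightforward to attach as an example in a PR comment.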
DaveFlynn / Makefile
Created June 13, 2024 07:28
Makefile boilerplate for dbt projects
include .env
export
# Phony targets
.PHONY: build seed test snapshot run debug clean help
# Environment variables
TARGET ?= $(TARGET)
PROFILE ?= $(PROFILE)
DaveFlynn / mattermost-pr-analysis.md
Last active February 6, 2024 08:13
Mattermost PR Analysis by Even Wei
Metric                      old         new
# pr                        1043        128
# closed pr                 34 (3.6%)   16 (12.5%)
# merged pr                 1009        112
# model changed             3455        366
avg model changed           3.42        3.27
max model changed           30          29
avg merge time (seconds)    44016       292329
avg commits                 15.3        9.1
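
The derived rows in the table can be recomputed from the raw counts. The sketch below is only a sanity check under one assumption: that "avg model changed" is the number of models changed divided by the number of merged PRs. The dictionary keys and the `summarize` helper are hypothetical names. It also converts the average merge time from seconds to hours, which is easier to read.

```python
# Raw figures copied from the table above; key names are illustrative.
stats = {
    "old": {"merged": 1009, "models_changed": 3455, "avg_merge_seconds": 44016},
    "new": {"merged": 112, "models_changed": 366, "avg_merge_seconds": 292329},
}


def summarize(s: dict) -> dict:
    """Derive per-PR averages and a human-readable merge time."""
    return {
        # Assumption: the average is taken over merged PRs only.
        "avg_models_per_merged_pr": round(s["models_changed"] / s["merged"], 2),
        "avg_merge_hours": round(s["avg_merge_seconds"] / 3600, 1),
    }


summary = {name: summarize(s) for name, s in stats.items()}
for name, row in summary.items():
    print(name, row)
```

Under that assumption the recomputed averages match the table (3.42 and 3.27), and the average merge times work out to roughly 12.2 hours for the old column and 81.2 hours for the new one.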

Read In the Pipeline for more articles from Dave

Dave is Senior Technical Advocate for Recce, the cross-env data validation tool for dbt data projects. To understand more about Recce, or for help getting set up, get in touch at one of the links below:

Recce is your data validation toolkit

Recce is a cross-environment data-modeling validation tool for dbt projects.

Validate your work and create an all-signal, no-noise PR comment by curating a list of validation checks. This speeds up the QA process on data project pull requests and significantly reduces time-to-merge.

DaveFlynn / data-project-pr-comment-template.md
Last active September 23, 2024 07:05
PR Comment Template for dbt Data Projects
DaveFlynn / AssertAllowedNull.py
Created September 2, 2022 06:36
PipeRider custom assertion to allow a specific number of null values in a column
class AssertAllowedNulls(BaseAssertionType):
    def name(self):
        return "assert_allowed_nulls"

    def execute(self, context: AssertionContext, table: str, column: str, metrics: dict) -> AssertionResult:
        column_metrics = metrics.get('tables', {}).get(table, {}).get('columns', {}).get(column)
        if column_metrics is None:
            # column could not be found
            return context.result.fail('column does not exist')
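
The preview above stops before the null-count comparison. As a rough, framework-free sketch of what such a check might look like, the function below applies the same nested lookup and then compares a null count against an allowance. It does not use PipeRider's classes, and the metric key `"nulls"`, the `allowed_nulls` parameter, and the function name are assumptions for illustration rather than PipeRider's actual API.

```python
from typing import Tuple


def check_allowed_nulls(
    metrics: dict, table: str, column: str, allowed_nulls: int
) -> Tuple[bool, str]:
    """Return (passed, message) for a null-allowance check on one column."""
    column_metrics = (
        metrics.get("tables", {}).get(table, {}).get("columns", {}).get(column)
    )
    if column_metrics is None:
        # Column could not be found in the profiling metrics.
        return False, "column does not exist"

    # Assumption: the profiler stores the null count under the key "nulls".
    null_count = column_metrics.get("nulls", 0)
    if null_count <= allowed_nulls:
        return True, f"{null_count} nulls, within allowance of {allowed_nulls}"
    return False, f"{null_count} nulls, exceeds allowance of {allowed_nulls}"


# Hypothetical metrics payload shaped like the nested lookup above.
sample_metrics = {"tables": {"orders": {"columns": {"coupon_code": {"nulls": 3}}}}}
print(check_allowed_nulls(sample_metrics, "orders", "coupon_code", allowed_nulls=5))
print(check_allowed_nulls(sample_metrics, "orders", "coupon_code", allowed_nulls=2))
print(check_allowed_nulls(sample_metrics, "orders", "missing_col", allowed_nulls=5))
```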
DaveFlynn / custom-assertion.py
Created September 2, 2022 06:32
PipeRider Custom Assertion Template
class YourAssertionName(BaseAssertionType):
    def name(self):
        return "your_assertion_name"

    def execute(self, context: AssertionContext, table: str, column: str, metrics: dict) -> AssertionResult:
        column_metrics = metrics.get('tables', {}).get(table, {}).get('columns', {}).get(column)
        if column_metrics is None:
            # column could not be found
            return context.result.fail('column does not exist')
import os
import pandas as pd
import numpy as np


class Model():
    def __init__(self, model_uri):
        # Log the model location and its contents for debugging
        print(model_uri)
        print(os.listdir(model_uri))
        self.model_uri = model_uri