@ahgraber
Last active September 18, 2024 07:34
Linting and Formatting with pre-commit

Lint and Format

TLDR:

  • python: ruff, black
  • markdown: markdownlint
  • json: prettier
  • yaml: prettier

Automate with pre-commit

Python

Use black and ruff to help improve code quality.

Generally speaking, developer tools like these look for their configuration in a variety of files. There is a growing movement in the Python community to move away from setup.cfg to pyproject.toml; however, many tools are caught in limbo and support one but not the other.

Because of this, the current (Q1 2022) SOP is to put everything that can be configured into a pyproject.toml file at the repo root. Furthermore, setup.py should be treated as deprecated (ref).

Installation

  1. On the command line, install node tools:

    npm install markdownlint markdownlint-cli
    npm install prettier
    # npm install markdown-notes-tree # optional
  2. In a conda environment, install the following packages:

    # packages for linting
    black
    black-jupyter
    ruff
  3. On the command line, install pre-commit:

    brew install pre-commit

Lint with ruff

ruff should be configured per project using a pyproject.toml file located at the repo root. ruff has a VS Code extension.

Autoformat with black

black should be configured per project using a pyproject.toml file located at the repo root.

After configuring, black can be run against all files in a repo by running `black .` from the repository root directory.
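
Before letting either tool rewrite files, it can help to run both in check-only mode. The following is a rough sketch, assuming ruff and black are on the PATH and that the script is started from the repository root; CHECKS and run_checks are hypothetical names, and tools that are not installed are skipped rather than treated as failures.

```python
import shutil
import subprocess
from typing import Dict, List, Optional

# Check-only invocations: neither command rewrites files.
CHECKS: List[List[str]] = [
    ["ruff", "check", "."],
    ["black", "--check", "--diff", "."],
]


def run_checks(checks: List[List[str]] = CHECKS) -> Dict[str, Optional[int]]:
    """Run each tool in the current directory; None means 'not installed'."""
    results: Dict[str, Optional[int]] = {}
    for cmd in checks:
        if shutil.which(cmd[0]) is None:
            results[cmd[0]] = None  # tool missing, skip it
            continue
        results[cmd[0]] = subprocess.run(cmd, check=False).returncode
    return results


if __name__ == "__main__":
    for tool, code in run_checks().items():
        status = "skipped (not installed)" if code is None else f"exit code {code}"
        print(f"{tool}: {status}")
```

A non-zero exit code means the tool found something to report or reformat; the tool's own output shows the details.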

Other file formats

Markdown with markdownlint

markdownlint is the engine used by markdownlint-cli to lint and autoformat markdown (.md) files. Markdownlint has a VS Code extension.

YAML (and JSON) with prettier

prettier is a formatter for web-centric languages (html, css, json, yaml, etc.). We use prettier to format .yaml and .json files. While it can also format markdown, we rely on markdownlint for that. Prettier has a VS Code extension.

Making markdownlint and prettier play nice

An .editorconfig file specifies indentation and whitespace rules per file type, so that editors and formatters agree on them. We set yaml indents to 2 spaces, and 4 elsewhere. EditorConfig has a VS Code extension.
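
The indentation rule above (2 spaces for yaml, 4 elsewhere) can be spot-checked with a few lines of Python. This is only a rough sketch that mimics the rule; it does not read .editorconfig or reproduce how EditorConfig resolves settings, and expected_indent and misindented_lines are hypothetical helper names.

```python
from pathlib import PurePath
from typing import List


def expected_indent(filename: str) -> int:
    """Indent width from the rule above: 2 for YAML, 4 everywhere else."""
    return 2 if PurePath(filename).suffix in {".yml", ".yaml"} else 4


def misindented_lines(text: str, indent: int) -> List[int]:
    """Return 1-based numbers of lines whose leading spaces are not a multiple of `indent`."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.lstrip(" ")
        if not stripped:  # skip blank lines
            continue
        if (len(line) - len(stripped)) % indent:
            bad.append(number)
    return bad


# Line 4 is indented by 3 spaces, which is not a multiple of 2.
sample = "rules:\n  braces:\n    min-spaces-inside: 0\n   max-spaces-inside: 1\n"
print(misindented_lines(sample, expected_indent("config.yaml")))  # [4]
```

Continuation lines that are aligned to an opening bracket would be reported too, so treat the result as a hint rather than a verdict.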

Automate formatting on every git commit with pre-commit

  1. Install pre-commit

  2. Add linters & formatters as pre-commit hooks (see .pre-commit-config.yaml below)

  3. Run initial pass

    pre-commit install # associate with git repo
    pre-commit autoupdate # update pre-commit hooks
    pre-commit run --all-files
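
Each hook in .pre-commit-config.yaml selects the files it runs on with a `files` pattern and an optional `exclude` pattern, both Python regular expressions. The sketch below replays the patterns of the `forbid-yml` hook from the config further down; it assumes pre-commit applies them with re.search against the repo-relative path (my understanding of its behavior, not a copy of its code), and hook_applies is a hypothetical helper.

```python
import re

# Patterns copied from the `forbid-yml` hook in .pre-commit-config.yaml.
FILES = r"\.yml$"
EXCLUDE = r"""(?x)^(
    ^.*(copier-answers\.ya?ml)$
)$"""


def hook_applies(filename: str, files: str = FILES, exclude: str = EXCLUDE) -> bool:
    """Sketch of the selection rule: `files` must match and `exclude` must not."""
    return bool(re.search(files, filename)) and not re.search(exclude, filename)


for name in ["config.yml", "config.yaml", ".copier-answers.yml"]:
    print(name, hook_applies(name))
# config.yml True
# config.yaml False
# .copier-answers.yml False
```

The `(?x)` prefix turns on verbose mode, which is why the pattern can be spread over several lines in the YAML block scalar.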
# see https://editorconfig.org/
root = true
[*]
# Use Unix-style newlines for most files (except Windows files, see below).
end_of_line = lf
trim_trailing_whitespace = true
indent_style = space
insert_final_newline = true
indent_size = 4
charset = utf-8
[*.{bat,cmd,ps1}]
end_of_line = crlf
[*.md]
trim_trailing_whitespace = false
[*.{py,ipynb}]
indent_size = 4
[*.{yml,yaml}]
indent_size = 2
[*.tsv]
indent_style = tab
# Standard .gitignore for Data Science projects at PMI
# ignore dot directories
.**/
# Ignore everything on these paths
artifacts/**
# data/raw/**
# data/int/**
# data/processed/**
logs/**
notebooks/**
src/**/*.yml
src/**/*.yaml
writeup/**
**/old
# Ignore these data file types
*.log
*.pkl
*.csv
*.xls
*.xlsx
*.doc
*.docx
*.html
*.pdf
*.ppt
*.pptx
*.txt
*.gif
*.jpg
*.jpeg
*.png
*.7z
*.bz
*.bz2
*.gz
*.gzip
*.rar
*.tar
*.tz
*.xz
*.zip
*.db
*.rdb
# Keep these files, even if ignored above
!**/.gitkeep
!.ci/
!data/processed/**/*.zip
!notebooks/*.py
!notebooks/*.ipynb
!*requirements.txt
# dotenv
.env
.envrc
# OS Junk
._*
.DS_Store
Thumbs.db
settings.json
### Python Ignores
__pycache__
# pyenv
.python-version
# virtualenv
.venv
env/
ENV/
venv/
# IPython Notebook
.ipynb_checkpoints
*/.ipynb_checkpoints/*
# Unit test / coverage reports
*.cover
*.pytest_cache/
.coverage
.coverage.*
.cache
.cache/
.hypothesis/
.nox/
.tox/
coverage.xml
htmlcov/
nosetests.xml
test-output.xml
# Byte-compiled / optimized / DLL files
**/__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
.installed.cfg
MANIFEST
*.egg-info/
*.egg
*.whl
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# PyBuilder
target/
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Sphinx documentation
docs/_build/
.python_packages
# IDE
.idea/
.spyproject/
.vscode/
*.swp
*.swo
---
failure-threshold: info # name of threshold level (error | warning | info | style | ignore | none)
label-schema:
  maintainer: text
  build-date: rfc3339
  git-commit: hash
  version: semver
  description: text
strict-labels: false
ignored:
  - DL3008 # pin versions in apt
  - DL3013 # pin versions in pip
  - DL3015 # use "no-install-recommends" with apt
---
default: true
# default indent = 4 spaces
MD007:
  indent: 2
# MD030:
#   ul_single: 3
#   ul_multi: 3
# MD013/line-length - Line length
MD013:
  # Number of characters
  line_length: 119
  # Number of characters for headings
  heading_line_length: 119
  # Number of characters for code blocks
  code_block_line_length: 119
  # Include code blocks
  code_blocks: true
  # Ignore table length violations
  tables: false
  # Include headings
  headings: true
  # Include headings
  headers: true
  # Strict length checking
  strict: false
  # Stern length checking
  stern: false
MD024:
  # Allow heading duplication if under different parent headings
  siblings_only: true
---
# See https://pre-commit.com for more information
# See https://pre-commit.com/hooks.html for more hooks
exclude: |
  (?x)^(
    ^.*(copier-answers\.ya?ml)$
  )$
repos:
  - repo: local
    hooks:
      - id: forbid-yml
        name: Forbid .yml file extensions (use .yaml)
        entry: YAML file extensions must be .yaml
        language: fail
        files: \.yml$
        exclude: |
          (?x)^(
            ^.*(copier-answers\.ya?ml)$
          )$
      - id: forbid-rej
        name: Forbid .rej file extensions from `copier update`
        entry: Forbid .rej file extensions from `copier update`
        language: fail
        files: \.rej$
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: "v4.4.0"
    hooks:
      - id: check-added-large-files
        args: [--maxkb=500000]
        stages: [commit]
      - id: check-case-conflict
        stages: [commit]
      - id: check-merge-conflict
        stages: [commit]
      - id: check-yaml
        stages: [commit]
      - id: end-of-file-fixer
        stages: [commit]
      - id: mixed-line-ending
        stages: [commit]
      - id: trailing-whitespace
        args: [--markdown-linebreak-ext=md]
        stages: [commit]
  - repo: https://github.com/Lucas-C/pre-commit-hooks
    rev: "v1.5.1"
    hooks:
      - id: remove-crlf
        stages: [commit]
      - id: remove-tabs
        stages: [commit]
  - repo: https://github.com/sirosen/texthooks
    rev: "0.5.0"
    hooks:
      - id: fix-smartquotes
        stages: [commit]
      - id: fix-ligatures
        stages: [commit]
  - repo: https://github.com/pre-commit/mirrors-prettier
    rev: "v3.0.0-alpha.9-for-vscode"
    hooks:
      - id: prettier
        stages: [commit]
  - repo: https://github.com/igorshubovych/markdownlint-cli
    rev: "v0.34.0"
    hooks:
      - id: markdownlint
        args: ["-f"]
        stages: [commit]
  - repo: https://github.com/adrienverge/yamllint.git
    rev: v1.32.0
    hooks:
      - id: yamllint
        args: [-c=.yamllint.yaml]
        stages: [commit]
  # - repo: https://github.com/jumanjihouse/pre-commit-hook-yamlfmt
  #   rev: 0.1.0
  #   hooks:
  #     - id: yamlfmt
  #       args: [--mapping, '2', --sequence, '4', --offset, '2', --preserve-quotes]
  - repo: https://github.com/charliermarsh/ruff-pre-commit
    # Ruff version.
    rev: "v0.0.271"
    hooks:
      - id: ruff
        args: [--fix]
        stages: [commit, manual]
  - repo: https://github.com/psf/black
    rev: "23.3.0"
    hooks:
      - id: black
        stages: [commit, manual]
  - repo: https://github.com/petalmd/dockerfile-pre-commit
    rev: v1.0
    hooks:
      - id: dockerlint
        stages: [commit]
        # args: [--ignore, DL3025, --ignore, DL3018]
  ### NOTE: ".typos.toml" may be required to have _already been committed_
  ### in order for typos pre-commit hook to read it.
  ### In this case, comment out this block until ".typos.toml" has been committed successfully
  - repo: https://github.com/crate-ci/typos
    rev: v1.14.12
    hooks:
      - id: typos
        # args: []
        args: ["--config", ".typos.toml"]
        exclude: |
          (?x)^(
            ^.*(typos\.toml)$
          )$
        stages: [commit]
# prettier should ignore
# .azureml
# .eggs
# .ipynb_checkpoints
# .nox
# .obsidian
# .pytest_cache
# .tox
.vscode/
.**/
dist/
*.md
# ignore local .env from versioning
.env
.envrc
---
trailingComma: "es5"
tabWidth: 2
semi: false
singleQuote: false
quoteProps: "consistent"
printWidth: 100
proseWrap: "always"
overrides:
  - files: "*.md"
    options:
      parser: "markdown"
      # proseWrap: "preserve"
  - files: "*.yaml"
    options:
      parser: "yaml"
      proseWrap: "preserve"
### Allow shellcheck to follow arbitrary file paths in `source` statements
enable=external-sources
### optional config
# enable=add-default-case
enable=avoid-nullary-conditions
enable=check-unassigned-uppercase
enable=deprecate-which
enable=quote-safe-variables
# enable=require-double-brackets
# enable=require-variable-braces
### Specific rule customizations
# disable=SC1071 # disable error on zsh shebang
### Configuration for typos pre-commit
# https://github.com/crate-ci/typos/blob/master/docs/reference.md
[files]
# glob/gitignore-style file exclusions
extend-exclude = [
"_typos.toml",
".typos.toml",
"typos.toml",
]
ignore-hidden = false # ignore hidden files/dirs
ignore-files = true # respect ignore files
ignore-dot = true # respect .ignore files
ignore-vcs = true # ignore version-control directories
ignore-global = true # respect global ignore files
[default]
binary = false
check-filename = true
check-file = true
unicode = true
ignore-hex = true
identifier-leading-digits = false
locale = "en"
## Custom uncorrectable sections (e.g. markdown code fences, PGP signatures, etc)
## list of regexes
extend-ignore-re = []
## Pattern-match always-valid identifiers
## list of regexes
extend-ignore-identifiers-re = []
## Corrections for identifiers (https://www.unicode.org/reports/tr31/#Table_Lexical_Classes_for_Identifiers).
## When the correction is blank, the identifier is *never* valid.
## When the correction is the key, the identifier is *always* valid.
[default.extend-identifiers]
# capwords are identifiers
AKE = "AKE"
AKS = "AKS"
## Corrections for words.
## When the correction is blank, the word is *never* valid.
## When the correction is the key, the word is *always* valid.
[default.extend-words]
keypair = "keypair"
mape = "mape"
## Specific rules for lockfiles
[type.lock]
extend-glob = []
binary = false
check-filename = true
check-file = false
unicode = true
ignore-hex = true
identifier-leading-digits = false
extend-ignore-identifiers-re = []
extend-ignore-re = []
[type.lock.extend-identifiers]
[type.lock.extend-words]
## Specific rules for python
[type.py]
extend-glob = []
extend-ignore-identifiers-re = []
extend-ignore-re = []
[type.py.extend-identifiers]
NDArray = "NDArray"
[type.py.extend-words]
---
ignore: |
  .vscode/
extends: default
rules:
  braces:
    min-spaces-inside: 0
    max-spaces-inside: 1
  brackets:
    min-spaces-inside: 0
    max-spaces-inside: 0
  comments:
    min-spaces-from-content: 1 # align prettier: https://github.com/prettier/prettier/pull/10926
  comments-indentation: disable
  indentation:
    spaces: consistent
    indent-sequences: consistent
    check-multi-line-strings: false
  line-length: disable
  # quoted-strings:
  #   quote-type: any
  #   required: only-when-needed
  truthy:
    allowed-values: ["true", "false"]
# --- project ----------------------------------------------------------------
[project]
name = "<PACKAGE_NAME>"
authors = [
{ name="", email="" },
]
description = "..."
dynamic = ["version"]
readme = "README.md"
classifiers = [
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Operating System :: OS Independent",
]
requires-python = ">=3.8"
dependencies = [
"numpy>=1.21",
"pandas>=1.5",
"scikit-learn",
"scipy>=1.7",
"azure-identity",
"azure-keyvault-secrets",
"python-dateutil",
"pyodbc",
"pyyaml",
"redis>=4.5.5",
"snowflake-connector-python",
"snowflake-sqlalchemy",
"sqlalchemy",
]
[project.optional-dependencies]
docs = [
"numpydoc",
"sphinx",
"sphinx_rtd_theme",
]
format = [
"black",
"black-jupyter",
"ruff",
]
test = [
"coverage>=4.2.0",
"pytest",
"pytest-asyncio",
"pytest-cov",
]
dev = [
"<PACKAGE_NAME>[docs]",
"<PACKAGE_NAME>[format]",
"<PACKAGE_NAME>[test]",
]
pipelines = [
"<PACKAGE_NAME>[test]",
"pytest-azurepipelines"
]
[project.urls]
"repository" = "https://github.com/<OWNER>/<REPO>"
# ref: https://hatch.pypa.io/1.2/version/#configuration
[tool.hatch.version]
path = "./src/VERSION"
pattern = "^(?P<version>.+?)(\n)"
[tool.hatch.build]
only-include = [
"./src/VERSION",
"src/<IMPORT_NAME>",
"tests",
]
skip-excluded-dirs = true
# sources = ["src"]
[tool.hatch.build.targets.sdist]
[tool.hatch.build.targets.wheel]
packages = ["src/<IMPORT_NAME>"]
macos-max-compat = true
# --- build-system -----------------------------------------------------------
# ref: https://packaging.python.org/en/latest/tutorials/packaging-projects/
# these should match the "setup-requires" packages in `setup.cfg`
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
# --- black ------------------------------------------------------------------
# ref: https://black.readthedocs.io/en/stable/usage_and_configuration/the_basics.html#configuration-via-a-file
[tool.black]
target-version = ['py39']
line-length = 100
# target-version = ['py39']
include = '\.pyi?$|\.ipynb$'
extend-exclude = '''
# # A regex preceded with ^/ will apply only to files and directories
# # in the root of the project.
# ^/foo.py # exclude a file named foo.py in the root of the project (in addition to the defaults)
\.ipynb_checkpoints$|^/\.env|^/\.git|^/\.nox|^/\.pytest_cache|^/\.tox
'''
# --- ruff -------------------------------------------------------------------
[tool.ruff]
select = [
'A', # flake8 builtins
'B', # flake8 bugbear
'C4', # flake8 comprehensions
'C90', # mccabe
'D', # pydocstyle
'E', # pycodestyle
'F', # pyflakes
'I', # isort
'N', # pep8-naming
# 'PTH', # flake8-use-pathlib
'Q', # flake8-quotes
'S', # bandit
'SIM', # flake8-simplify
'TRY', # tryceratops
'W', # pycodestyle
# 'T20', # flake8 print
]
# Avoid trying to autofix flake8-bugbear (`B`) violations:
unfixable = ["B"]
ignore = [
"B905", # zip strict=True; remove once python <3.10 support is dropped.
"D100", # do not require module-level docstrings
"D104", # do not require package docstrings
"D107", # do not require docstrings in __init__ methods
"D205", # do not require a blank line between docstring summary and description (ruff vs black conflict)
# "E203", # not in ruff
# "E265", # not in ruff
# "E266", # not in ruff
"E501", # line too long
"F401", # unused import
"F403", # import *
"F405", # defined from import *
# "F541", # f-string missing placeholders
"N999", # allow "invalid" module names due to jinja templates
# "S101", # assert
"SIM105", # allow except: pass
"TRY003", # Avoid specifying messages outside exception class; overly strict, especially for ValueError
"TRY201", # Allow `raise e` when re-raising (align with Sonarlint)
# "W503", # not in ruff
]
exclude = [
"*.egg-info",
".direnv",
".eggs",
".env",
".envrc",
".git",
".ipynb_checkpoints",
".nox",
".pytest_cache",
".ruff_cache",
".tox",
".venv",
"__pypackages__",
"_build",
"ci/templates",
"build",
"dist",
"docs/conf.py",
"venv",
]
# Default autofix behavior
fix = true
# Max line length
line-length = 119
# Directories with source code
src = ["notebooks", "src", "tests"]
# Assumed Python version
target-version = "py39"
[tool.ruff.per-file-ignores]
# # Ignore `E402` (import violations) in all `__init__.py` files,
# # and in `path/to/file.py`.
# "__init__.py" = ["E402"]
# "path/to/file.py" = ["E402"]
".ci/*" = ["D"]
"docs/*" = ["D"]
"notebooks/*" = ["B018", "D", "S101"]
"tests/*" = ["D", "S101"]
# --- ruff plugins --------------------
[tool.ruff.flake8-bugbear]
extend-immutable-calls = [
"chr",
"typer.Argument",
"typer.Option",
]
[tool.ruff.isort]
combine-as-imports = true
# extra-standard-library = ["path"]
forced-separate = ["scipy", "sklearn", "statsmodels", "ds_utils", "src"]
force-sort-within-sections = true
force-wrap-aliases = true
known-first-party = ["ds_utils", "src"]
# known-local-folder = ["src"] # for relative imports
[tool.ruff.mccabe]
max-complexity = 18
[tool.ruff.pep8-naming]
ignore-names = []
[tool.ruff.pydocstyle]
convention = "numpy"
# --- pytest -----------------------------------------------------------------
# ref: https://docs.pytest.org/en/7.3.x/reference/customize.html
[tool.pytest.ini_options]
addopts = '''
-ra
--strict-markers
--ignore=docs/conf.py
--ignore=setup.py
--ignore=ci
--ignore=.eggs
--tb=short
'''
# --doctest-modules
# --doctest-glob=\*.rst
norecursedirs = [
".env",
".git",
".nox",
".pytest_cache",
".tox",
"__pycache__",
"dist",
"docs",
"build",
"migrations",
"notebooks",
"writeup",
]
python_files = [
"test_*.py",
"*_test.py",
"tests.py",
]
pythonpath = "src"
testpaths = [
"tests",
]
# log_cli = true
# --- coverage ---------------------------------------------------------------
[tool.coverage.paths]
source = ["src", "*/site-packages"]
[tool.coverage.run]
branch = true
source = ["src"]
[tool.coverage.report]
show_missing = true