@feluelle
Last active March 16, 2021 22:05
My Apache Airflow docker-compose file for running the LocalExecutor with Postgres, using the official production Dockerfile
x-airflow-environment: &airflow-environment
  environment:
    - HOST_HOME=${HOME}
    - HOST_PROJECT_PATH=${PWD}
  env_file: airflow.env
  image: feluelle/airflow:latest

x-airflow-volumes: &airflow-volumes
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock
    - ~/.aws:/home/airflow/.aws
    - ../../1-orchestration:/opt/airflow/1-orchestration
    - ../../2-expectations:/opt/airflow/2-expectations

version: '3.8'
services:
  airflow-db:
    image: library/postgres:latest
    container_name: airflow_db
    env_file: postgres.env
    ports:
      - 35432:5432
    restart: always
  airflow-db-init:
    <<: *airflow-environment
    build:
      context: https://github.com/apache/airflow.git
      args:
        PYTHON_BASE_IMAGE: python:3.8-slim-buster
        PYTHON_MAJOR_MINOR_VERSION: 3.8
        AIRFLOW_EXTRAS: amazon,postgres,google,docker,virtualenv
    container_name: airflow_db_init
    command: db upgrade
    depends_on:
      - airflow-db
  airflow-users-init:
    <<: *airflow-environment
    container_name: airflow_users_init
    command: users create --role Admin --username airflow --email airflow@apache.org --firstname Airflow --lastname Apache --password airflow
    depends_on:
      - airflow-db-init
  airflow-webserver:
    # merge both anchors in one key; repeating "<<:" would be a duplicate mapping key
    <<: [*airflow-environment, *airflow-volumes]
    container_name: airflow_webserver
    command: webserver
    ports:
      - 38080:8080 # airflow webserver
      # TODO: Remove great_expectations port after moving from python dep to great_expectations docker image
      - 38888:8888 # great_expectations jupyter notebooks
    depends_on:
      - airflow-db-init
    restart: always
  airflow-scheduler:
    <<: [*airflow-environment, *airflow-volumes]
    container_name: airflow_scheduler
    command: scheduler
    depends_on:
      - airflow-db-init
    restart: always
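The airflow-db service reads its credentials from postgres.env, which is not included in the gist. A minimal sketch of what it could contain (all values hypothetical; the SQL Alchemy connection string in airflow.env has to match them and point at the airflow-db service name on the compose network):

```env
# postgres.env (hypothetical values)
POSTGRES_USER=airflow
POSTGRES_PASSWORD=airflow
POSTGRES_DB=airflow
```

With these values, the matching airflow.env entry would be AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@airflow-db:5432/airflow.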

feluelle commented Jun 16, 2020

Additional notes:

  • using Airflow (1-orchestration) for ELT with great_expectations (2-expectations) and dbt (3-transformations)
  • using AWS Systems Manager Parameter Store for Airflow variables and connections (the mounted ~/.aws volume provides authentication)
  • using my edited requirements because the latest Airflow and Great Expectations have an incompatible marshmallow dependency (great_expectations uses marshmallow>3.0 while Airflow pins marshmallow==2.21.0)
  • using dind (Docker-in-Docker). NOTE: you need to grant permission on /var/run/docker.sock to be able to use it from inside the airflow containers.
  • accessing dbt via Docker, for example:
from os import getenv

from airflow.operators.docker_operator import DockerOperator

task_dbt_test = DockerOperator(
    image='fishtownanalytics/dbt:0.17.0',
    environment=dict(
        DBT_PROFILES_DIR='/usr/.dbt'
    ),
    volumes=[
        # host paths come from the env variables set in the docker-compose file
        f'{getenv("HOST_HOME")}/.dbt:/usr/.dbt',
        f'{getenv("HOST_PROJECT_PATH")}/3-transformations/examples/dbt_starter:/usr/app'
    ],
    network_mode='0-environment_default',
    auto_remove=True,
    command='test',
    task_id='dbt_test',
    dag=dag
)
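As a rough sketch of how the Parameter Store backend mentioned above resolves names (assumption: the backend's default prefixes /airflow/connections and /airflow/variables, which are configurable via backend_kwargs):

```python
# Sketch of the naming convention the SSM secrets backend uses
# (assumption: default prefixes, overridable via backend_kwargs).
def ssm_parameter_name(kind: str, name: str) -> str:
    """Return the Parameter Store path Airflow would look up."""
    prefixes = {
        "connection": "/airflow/connections",
        "variable": "/airflow/variables",
    }
    return f"{prefixes[kind]}/{name}"

# e.g. Variable.get("my_var") reads the parameter /airflow/variables/my_var
print(ssm_parameter_name("variable", "my_var"))
```

So a connection named aws_default would be stored under /airflow/connections/aws_default as a connection URI.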


In airflow.env I also have the following env variables:

AIRFLOW__SECRETS__BACKEND=airflow.providers.amazon.aws.secrets.systems_manager.SystemsManagerParameterStoreBackend
# TODO: Remove "GE_JUPYTER_CMD" env variable after moving from python dep to great_expectations docker image
GE_JUPYTER_CMD='jupyter notebook --ip 0.0.0.0'

The GE_JUPYTER_CMD is needed to get the great_expectations docs working.
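A minimal sketch of what that override does (assumption: the tooling splits GE_JUPYTER_CMD into an argv and falls back to plain "jupyter notebook" when unset; binding to 0.0.0.0 is what makes the notebook reachable through the published 38888 port):

```python
import os
import shlex

def jupyter_command(notebook: str) -> list:
    """Build the argv used to launch a notebook, honoring GE_JUPYTER_CMD."""
    base = os.environ.get("GE_JUPYTER_CMD", "jupyter notebook")
    return shlex.split(base) + [notebook]

# Inside the container, airflow.env sets the override:
os.environ["GE_JUPYTER_CMD"] = "jupyter notebook --ip 0.0.0.0"
print(jupyter_command("suite_edit.ipynb"))
# -> ['jupyter', 'notebook', '--ip', '0.0.0.0', 'suite_edit.ipynb']
```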
