- install pyarrow
import pyarrow.parquet as pq
fhv_2019_01 = r"/home/nervuzz/tmp/fhv_tripdata_2019-01.parquet"
fhv_2019_12 = r"/home/nervuzz/tmp/fhv_tripdata_2019-12.parquet"
-- Read about DataTalks.Club Data Engineering Zoomcamp --
The first week of the Data Engineering Zoomcamp by DataTalks.Club was a gentle introduction to writing and executing SQL queries against a PostgreSQL database.
Although pgcli is not strictly necessary to complete Week 1, it makes writing SQL queries more pleasant, so I decided to try reproducing the errors that may appear while installing or using pgcli.
I've set up my environment using WSL2 (Ubuntu-20.04) on Windows 10 (21H1) with Python 3.9.10.
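When reproducing installation errors, it helps to record the environment first. The snippet below is a rough sketch using only the standard library. The function name `describe_environment` is made up for illustration, and the WSL check is a heuristic that assumes the kernel release string contains "microsoft", which is typical for WSL kernels.

```python
import platform
import shutil
import sys


def describe_environment():
    """Collect a few facts that matter when debugging a pgcli install."""
    release = platform.release().lower()
    return {
        # Interpreter version as "major.minor.micro", e.g. "3.9.10"
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        # Heuristic (assumption): WSL kernels usually report "microsoft"
        "is_wsl": "microsoft" in release,
        # Full path to the pgcli executable, or None if it is not on PATH
        "pgcli_path": shutil.which("pgcli"),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `pgcli_path` comes back as `None`, pgcli is either not installed or not on `PATH` in the current shell.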
The second week of the Data Engineering Zoomcamp by DataTalks.Club brought a new tool that is one of the most popular data pipeline platforms: Apache Airflow. So we are going to create some workflows!
First we have to run the Docker Compose Airflow installation in the environment of our choice, which can be, among others, macOS, Linux, a GCP VM, or the very popular WSL.
What's more, we also need the Google Cloud SDK installed in our Airflow environment in order to connect to the Cloud Storage bucket and create tables in BigQuery.
That means we cannot just use the official docker-compose.yaml referenced in the Airflow docs. Instead, we have to write a custom Dockerfile that extends the apache/airflow image with our additional dependencies. Then we can incorporate it into docker-compose.yaml 🙌