This exercise consists of developing a distributed Extract-Transform-Load (ETL) application.
Your application should ingest data from a source relational database, use a distributed data processing tool such as Apache Hadoop or Apache Spark to compute statistics over that data, and write the results in a format that can be loaded into a destination storage system for downstream consumption.
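The extract, transform, and load stages described above can be sketched as three small functions. This is a minimal single-process Python sketch under stated assumptions, not a reference solution. An in-memory SQLite database stands in for the source system, plain Python aggregation stands in for the distributed Spark or Hadoop job, and newline-delimited JSON stands in for the destination format. The `orders` table, its columns, and the per-customer statistics are hypothetical.

```python
"""Single-process sketch of the extract-transform-load shape.

Hypothetical assumptions: the source has an `orders(customer_id, amount)`
table, the statistics are total/count/mean amount per customer, and the
destination format is newline-delimited JSON.
"""
import io
import json
import sqlite3
from collections import defaultdict
from typing import Dict, Iterable, List, TextIO, Tuple


def extract(conn: sqlite3.Connection) -> List[Tuple[str, float]]:
    # Extract: pull raw rows from the (hypothetical) source table.
    return conn.execute("SELECT customer_id, amount FROM orders").fetchall()


def transform(rows: Iterable[Tuple[str, float]]) -> Dict[str, Dict[str, float]]:
    # Transform: a map/reduce-style aggregation done locally. This is a
    # stand-in for a distributed job such as Spark's groupBy(...).agg(...).
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for customer_id, amount in rows:
        sums[customer_id] += amount
        counts[customer_id] += 1
    return {
        cid: {"total": sums[cid], "count": counts[cid], "mean": sums[cid] / counts[cid]}
        for cid in sums
    }


def load(stats: Dict[str, Dict[str, float]], out: TextIO) -> None:
    # Load: write one JSON object per line, sorted by key for stable output.
    for cid in sorted(stats):
        out.write(json.dumps({"customer_id": cid, **stats[cid]}) + "\n")


def run(conn: sqlite3.Connection) -> str:
    # Wire the three stages together and return the serialized output.
    buf = io.StringIO()
    load(transform(extract(conn)), buf)
    return buf.getvalue()
```

In an actual submission the transform step would run on the distributed tool instead, for example reading the source through Spark's JDBC data source (`spark.read.format("jdbc")`) and aggregating with `groupBy` and `agg`. Keeping the three stages as separate functions makes each one easy to test in isolation.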
Please write your application in Python, Java, Scala or Kotlin. The application must be buildable from the command line; it should not require an IDE to build or run.
The exercise should generally take no more than three to four hours, although you are free to spend as much time on it as you like. If you don't finish within a few hours, that's okay; submit what you have.