Created
September 23, 2022 13:53
-
-
Save jwkidd3/8e511c7745776eafe11582035eec9ef7 to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
* All data can be found at s3://databrickssql/ | |
* Create the following databases: | |
* {UNIQUE_IDENTIFIER}_retail | |
* {UNIQUE_IDENTIFIER}_nyse | |
* Number of orders placed every month. We have to consider only COMPLETE as well as CLOSED orders. Make sure output is sorted by month. | |
* Database: retail | |
* Table: orders | |
* Number of orders placed in the months of 2013 August, 2013 September and 2013 October. We have to consider all PENDING orders (PENDING, PENDING_PAYMENT) | |
* Database: retail | |
* Table: orders | |
* Get revenue for each order (using order_items). Output should contain order id and revenue, revenue should be rounded off to 2 digits. Also output should be saved to a new table called order_revenue (file format parquet). | |
* Source Database: retail | |
* Source Table: order_items | |
* Target Database: {UNIQUE_IDENTIFIER}_retail | |
* Target Table: order_revenue | |
* Reference for [Creating Tables using Spark SQL](https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-table.html) | |
* Create tables for products, categories and departments, then load the data into all the tables. Use the below data model to come up with column names. | |
* Base Directory for all data sets **/data/retail_db** | |
* File Format: TEXTFILE | |
* Delimiter: Comma "," | |
* Create a new table for NYSE Data. | |
* Database Name: **{UNIQUE_IDENTIFIER}_nyse** | |
* Table Name: **nyse_eod** | |
* Type: **EXTERNAL** | |
* File Type: **TEXTFILE** (default) | |
* Location: **s3://databrickssql/nyse_data/** | |
* Fields: **stockticker, tradedate, openprice, highprice, lowprice, closeprice, volume** | |
* Data Types: **Prices are of type float, stockticker is of type string, tradedate is of type int and volume is of type bigint.** | |
* Field Delimiter: **Comma (,)** | |
* Run `SELECT * FROM ` the table name to ensure data is loaded properly. Also run the count to make sure that all the data is loaded into the table. | |
* Compute Daily Product Revenue | |
* Source Database: **retail** | |
* Consider only COMPLETE and CLOSED orders | |
* Output should contain - order_date, product_name, revenue based on order_item_subtotal | |
* Round off revenue to 2 digits | |
* Save output to daily_product_revenue (sort the data by order_date in ascending and revenue in descending). | |
* Target Database: **{UNIQUE_IDENTIFIER}_retail** | |
* Target File Format: TEXTFILE | |
* Get the details of orders where there are no records in order_items. | |
* Source Database: retail | |
* Tables: orders and order_items | |
* Data Validation: Validate if order_item_subtotal is equal to product of order_item_product_price and order_item_quantity. | |
* Source Database: retail | |
* Source Table: order_items | |
* Get the order count for each day and status sorted by order_date in ascending order and then by order_status where data is sorted COMPLETE, CLOSED and rest in ascending order. | |
* Source Database: retail | |
* Source Table: orders | |
* Create FACT Table(s) for the following reports: | |
* Daily Product Revenue | |
* Daily Category Revenue | |
* Daily Department Revenue | |
* Roll up to Weekly, Monthly, Yearly | |
* Support To Date Reports |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment