@cesar1091
Created October 2, 2022 03:07
customer = spark.read.format("parquet").load("/user/vagrant/lab1/pregunta2/customer")
customer.createOrReplaceTempView("customer")
order_items = spark.read.format("orc").load("/user/vagrant/lab1/pregunta9/resultado")
order_items.createOrReplaceTempView("order_items")
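# NOTE: the query joins an `orders` view that is not registered in this snippet;
# it must be registered (e.g. with createOrReplaceTempView) before this runs.
# Per customer whose city starts with "M": count order items and sum their subtotals.
top_customer = spark.sql("""
    select customer_id, customer_fname, count(*) as cant,
           sum(order_item_subtotal) as total
    from customer a
    inner join orders b on a.customer_id = b.order_customer_id
    inner join order_items c on c.order_item_order_id = b.order_id
    where customer_city like 'M%'
    group by customer_id, customer_fname
""")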
top_customer.write.format("parquet").option("compression","gzip").save("/user/vagrant/lab1/pregunta10/resultado")