The following files contain utility functions that make it easier to develop conversions between Polars and PySpark.
Last active
April 6, 2023 05:15
Pyspark vs Polars Utils
# Read a CSV object from S3 into a Polars DataFrame
import boto3
import polars as pl

client = boto3.client('s3')

def from_csv(bucket_name, input_path):
    # Fetch the object, then parse its raw bytes as CSV
    data = client.get_object(Bucket=bucket_name, Key=input_path)
    csv_bytes = data['Body'].read()
    return pl.read_csv(csv_bytes)
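# Convert a PySpark DataFrame to a Polars DataFrame
import polars as pl

# Assumes pandas and pyspark are installed
def to_polars(spark_df):
    # Collect the Spark data to the driver as pandas, then rebuild it column by column
    pandas_df = spark_df.select("*").toPandas()
    data = pandas_df.to_dict('list')
    return pl.DataFrame(data)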
# Write a Polars DataFrame to S3 as a CSV object
import boto3
import polars as pl

client = boto3.client('s3')

def to_csv(df, bucket_name, output_path):
    # write_csv() with no target returns the CSV as a string
    client.put_object(Body=df.write_csv(), Bucket=bucket_name, Key=output_path)