# Create a DataFrame from the file paths in a directory
directory = '/path/on/dbfs'
file_paths = dbutils.fs.ls(directory)

# Each entry is a FileInfo object with .path and .name attributes, e.g.:
print(file_paths[0].path)
print(file_paths[0].name)

# Build a two-column DataFrame of (path, name) pairs
files_df = spark.createDataFrame(
    [(f.path, f.name) for f in file_paths],
    ["path", "name"]
)
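The mapping step above simply turns each `FileInfo` object into a `(path, name)` tuple before handing the list to Spark. A minimal local sketch of that transformation, with `FileInfo` simulated by a namedtuple (on Databricks, `dbutils.fs.ls` returns objects with these attributes; the example paths here are made up):

```python
from collections import namedtuple

# Stand-in for the FileInfo objects returned by dbutils.fs.ls
FileInfo = namedtuple("FileInfo", ["path", "name"])

file_paths = [
    FileInfo("dbfs:/path/on/dbfs/a.csv", "a.csv"),
    FileInfo("dbfs:/path/on/dbfs/b.csv", "b.csv"),
]

# Same transformation as in the snippet: one (path, name) tuple per file
rows = [(f.path, f.name) for f in file_paths]
print(rows)
```

These tuples, paired with the `["path", "name"]` schema, are exactly what `spark.createDataFrame` consumes.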
Created: July 7, 2021 18:41
Create a PySpark DataFrame of file paths by reading a directory in Databricks