Created December 2, 2022 10:32
GitHub Gist joanteixi/17fdc65b705a93a14a80f5f1483616ce
Add a Databricks struct to a field that is null in some rows and not null in others.
```python
from pyspark.sql import functions as f


def addStructure(df, field, structFields):
    '''
    Receive a DataFrame, a field name and a list of struct member names.
    Replace nulls in `field` with a struct whose members are empty strings.
    '''
    # Check whether all values are null: in that case Spark types the column as
    # string (JSON schema inference, for example, turns an all-null field into
    # a string), and a null cannot be replaced by a struct while the otherwise()
    # branch still returns the original string column. A string-typed column is
    # therefore treated as the all-null case.
    if df.schema[field].dataType.typeName() == 'string':
        otherwiseField = None
    else:
        otherwiseField = df[field]
    # Build the struct that the null fields should have:
    # one empty-string member per name in structFields.
    listStruct = [f.lit('').alias(item) for item in structFields]
    jsonFields = f.struct(*listStruct)
    df = df.withColumn(
        field,
        f.when(df[field].isNull(), jsonFields).otherwise(otherwiseField),
    )
    return df
```
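To see what the `when`/`otherwise` expression does row by row without starting Spark, here is a minimal pure-Python model. It assumes each row is a plain dict and a missing struct is `None`; the `column_is_string` flag stands in for the schema check, mirroring the function's rule that a string-typed column is the all-null case. The helper `fill_null_struct` is hypothetical and is not part of PySpark.

```python
from typing import Any, Dict, List


def fill_null_struct(
    rows: List[Dict[str, Any]],
    field: str,
    struct_fields: List[str],
    column_is_string: bool,
) -> List[Dict[str, Any]]:
    """Hypothetical model of addStructure on plain dicts (not Spark)."""
    # Default struct: one empty-string member per requested name,
    # like f.struct(f.lit('').alias(name), ...).
    default = {name: '' for name in struct_fields}
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not mutated
        value = row.get(field)
        if value is None:
            # when(df[field].isNull(), jsonFields)
            new_row[field] = dict(default)
        else:
            # otherwise(...): keep the struct, or None for a string-typed column
            new_row[field] = None if column_is_string else value
        result.append(new_row)
    return result


rows = [
    {'id': 1, 'employee': {'name': 'Ana', 'surname': 'Puig'}},
    {'id': 2, 'employee': None},
]
filled = fill_null_struct(rows, 'employee', ['name', 'surname'], column_is_string=False)
print(filled)
```

Only the null row changes; the populated struct passes through untouched, which is what `otherwise(df[field])` does when the column already has a struct type.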
```python
'''
Use case:
  * We have a DataFrame with an employee field stored as a struct:
      - struct(Name: string, surname: string)
  * We want to flatten these fields without getting nulls, or at least have
    the columns created even when the employee field is null in every row.

    df = addStructure(df, 'employee', ['name', 'surname'])
    return df
'''
```
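The use case ends with flattening. In PySpark one way to do that is selecting the members, for example `f.col('employee.name').alias('employee_name')`; selecting a member of a null struct yields null. The plain-Python sketch below only models why filling first matters: after the fill, the flattened columns hold empty strings instead of nulls. The `flatten_struct` helper and the `employee_` column prefix are hypothetical.

```python
from typing import Any, Dict, List


def flatten_struct(
    rows: List[Dict[str, Any]], field: str, struct_fields: List[str]
) -> List[Dict[str, Any]]:
    """Hypothetical helper: move struct members to top-level '<field>_<name>' keys."""
    flat_rows = []
    for row in rows:
        flat = {k: v for k, v in row.items() if k != field}
        struct = row.get(field)
        for name in struct_fields:
            # A null struct gives null members, as selecting 'employee.name' does in Spark
            flat[f'{field}_{name}'] = None if struct is None else struct.get(name)
        flat_rows.append(flat)
    return flat_rows


# Rows after the null struct was filled with empty strings
filled_rows = [
    {'id': 1, 'employee': {'name': 'Ana', 'surname': 'Puig'}},
    {'id': 2, 'employee': {'name': '', 'surname': ''}},
]
# The same rows without the fill: row 2 still has a null struct
unfilled_rows = [
    {'id': 1, 'employee': {'name': 'Ana', 'surname': 'Puig'}},
    {'id': 2, 'employee': None},
]

flat_filled = flatten_struct(filled_rows, 'employee', ['name', 'surname'])
flat_unfilled = flatten_struct(unfilled_rows, 'employee', ['name', 'surname'])
print(flat_filled)
print(flat_unfilled)
```

Both versions produce the same columns here because the unfilled column still has a struct type; the fill matters most in the all-null case, where a string-typed column has no members to select at all.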