@franchb
Forked from jlln/SparkRowApply.scala
Created February 14, 2018 09:12
How to apply a function to every row in a Spark DataFrame.
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{callUDF, struct}

// Returns the comma-separated indices of the null columns in a row,
// or "-1" if the row contains no nulls.
def findNull(row: Row): String = {
  if (row.anyNull) {
    val indices = (0 until row.length).filter(i => row.isNullAt(i))
    indices.mkString(",")
  } else "-1"
}

sqlContext.udf.register("findNull", findNull _)

// Pack every column of the DataFrame into a struct so the UDF sees the whole row.
val withMissing = df.withColumn("MissingGroups", callUDF("findNull", struct(df.columns.map(df(_)): _*)))
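As a sketch of how this might be used end to end: the snippet below builds a small DataFrame with some nulls and applies the same UDF. The `SparkSession` setup, sample data, and column names (`x`, `y`) are illustrative assumptions, not part of the original gist; it also uses the modern `spark.udf` entry point rather than the older `sqlContext` shown above.

```scala
// Illustrative usage sketch; session setup, data, and names are hypothetical.
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{callUDF, struct}

object FindNullDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("FindNullDemo")
      .getOrCreate()
    import spark.implicits._

    // Same logic as the gist's findNull, written as an expression.
    def findNull(row: Row): String =
      if (row.anyNull) (0 until row.length).filter(row.isNullAt).mkString(",")
      else "-1"

    spark.udf.register("findNull", findNull _)

    val df = Seq(
      (Some(1), Some("a")), // no nulls
      (None,    Some("b")), // column 0 is null
      (Some(3), None)       // column 1 is null
    ).toDF("x", "y")

    // Apply the UDF to every row via a struct of all columns.
    df.withColumn("MissingGroups", callUDF("findNull", struct(df.columns.map(df(_)): _*)))
      .show()

    spark.stop()
  }
}
```

Packing the columns into a `struct` is what lets a single-argument UDF receive the entire row; the alternative of listing each column as a separate UDF parameter would tie the function to one specific schema.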