@timothyrenner
Created January 30, 2019 15:57
PySpark UDF Definition
import numpy as np


def predict(*features):
    """ Performs a prediction on the features.

    Parameters
    ----------
    *features : float
        The feature values the model needs to make a prediction,
        passed as separate positional arguments.

    Returns
    -------
    float
        The predicted score.
    """
    # Turn the features into a 1xN numpy array.
    np_features = np.array([features])
    # Assume the model is in scope. Spark will serialize and distribute.
    # Note I have to convert from numpy's float type to a native
    # Python float.
    return model.predict(np_features)[0].item()
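
The gist stops at the function definition, so here is a minimal sketch of how it might be exercised and then wrapped as a UDF. `SumModel`, `as_spark_udf`, `df` and `feature_columns` are hypothetical names introduced for illustration; the original only assumes that some `model` with a `predict` method is in scope. The Spark import is deferred so the plain-Python part runs without a Spark installation.

```python
import numpy as np


class SumModel:
    """Hypothetical stand-in for the real model: scores a row by summing it."""

    def predict(self, X):
        # X is a 2-D array of shape (n_rows, n_features).
        return X.sum(axis=1)


# Assumption: the gist expects a global named `model` to be in scope.
model = SumModel()


def predict(*features):
    # Same logic as the gist: build a 1xN array, predict, and convert
    # numpy's float type to a native Python float.
    np_features = np.array([features])
    return model.predict(np_features)[0].item()


def as_spark_udf(fn):
    """Wrap `fn` as a PySpark UDF that returns a double (requires pyspark)."""
    from pyspark.sql.functions import udf
    from pyspark.sql.types import DoubleType

    return udf(fn, DoubleType())


# Plain-Python check, no Spark needed.
score = predict(1.0, 2.0, 3.0)

# With a Spark DataFrame `df` and a list of column names `feature_columns`
# (both hypothetical), the UDF could be applied like this:
#
#   from pyspark.sql.functions import col
#   predict_udf = as_spark_udf(predict)
#   df = df.withColumn("score", predict_udf(*[col(c) for c in feature_columns]))
```

Declaring `DoubleType()` as the return type is the reason for the `.item()` call: the UDF needs to return a native Python float rather than a numpy scalar.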