
@biggers
Created August 18, 2024 13:14
Bootstrapping a working Jupyter notebook environment for Spark/ML book (python)
## The git repo cloned below contains the example code and exercise solutions for the O'Reilly book:
##
## "Scaling Machine Learning with Spark: Distributed ML with MLlib, TensorFlow, and PyTorch" (2023) by Adi Polak
# You will need a conda-style ("anaconda") Python installation, set up with Mamba:
# https://mamba.readthedocs.io/en/latest/installation/mamba-installation.html
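# fetch the book's example notebooks
git clone https://github.com/adipolak/ml-with-apache-spark.git
# create an environment with Python 3.11, Jupyter, OpenJDK 8 (the Java runtime Spark needs), and findspark
mamba create -n spark-jupyter-openjdk8-py3 -c conda-forge python=3.11 jupyter notebook openjdk=8 findspark
mamba activate spark-jupyter-openjdk8-py3
# install PySpark into the activated environment
mamba install pyspark
cd ml-with-apache-spark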
# start the notebook server; it prints a URL that includes an access token
# open that URL in your browser, then browse to one of the book's example notebooks
jupyter-notebook
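
If the notebook server or a Spark session fails to start, a quick sanity check of the activated environment may help narrow it down. The following is only a sketch and not part of the original steps: `check_tool` is a hypothetical helper name, and the list of tools is an assumption based on the packages installed above.

```shell
# Hypothetical helper (not from the gist): report whether a command is on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Tools the steps above rely on; run this inside the activated environment.
for tool in python java jupyter; do
    check_tool "$tool" || true
done

# Assumes pyspark was installed into the active environment;
# prints its version if it can be imported, otherwise a hint.
python -c 'import pyspark; print("pyspark", pyspark.__version__)' 2>/dev/null \
    || echo "missing: pyspark (is the environment activated?)"
```

Any line starting with `missing:` suggests the environment was not activated in the current shell, or that the corresponding install step did not complete.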