Run all of these commands at the command line (not in a Jupyter Notebook). The command line gives more informative error messages, and if we need to complete additional steps, we'll see those messages.
Spark is a framework written in the Scala programming language. Scala runs on the JVM (Java Virtual Machine), so you'll need to install Java.
If you use homebrew:
brew install java scala apache-spark
Let's follow the directions from the documentation: https://spark.apache.org/docs/latest/api/python/getting_started/install.html
pip install pyspark
It is your choice to install pyspark in the base/root or in the metis conda environment. Either way, the most common incompatibility issues result from pyspark not finding java or finding an incompatible version of java.
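Before launching pyspark, it can help to confirm which java (if any) is visible and what version it reports. The following is a minimal sketch, not part of the official install steps: the helper names `parse_java_major` and `detect_java` are made up for illustration, and the parsing assumes the usual `version "..."` line that `java -version` prints.

```python
import re
import shutil
import subprocess


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    # Java 8 and earlier report "1.8.0_292"; later releases report "17.0.2".
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def detect_java():
    """Return (path, major_version); both are None when java is not on PATH."""
    java_path = shutil.which("java")
    if java_path is None:
        return None, None
    # `java -version` usually writes to stderr, so look at both streams.
    result = subprocess.run([java_path, "-version"], capture_output=True, text=True)
    return java_path, parse_java_major(result.stderr + result.stdout)


if __name__ == "__main__":
    path, major = detect_java()
    print(f"java executable: {path}")
    print(f"major version:   {major}")
```

If the path is `None`, pyspark will not find java either. If the major version is `None` even though a path is reported, the executable on the path is probably a placeholder rather than a real Java runtime.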
Open a new terminal and try:
pyspark
If that is working, open ipython in a terminal and try:
import pyspark
spark = pyspark.sql.SparkSession.builder.getOrCreate()
If that is working, open a Jupyter Notebook and try:
import pyspark
spark = pyspark.sql.SparkSession.builder.getOrCreate()
Issues with installation on local machines are often path problems. You have to explicitly tell your computer where software is located. The location on a local machine varies widely based on the hardware, the operating system (OS), and the installation method. Thus, specific advice is difficult to give. Some general guidelines:
- Back up the current path (in case you break it).
- Have a hypothesis and understand the goal of each command (do not randomly copy and paste commands from the Interwebs).
- Frequently open a new terminal window to make sure your state is current.
- Take frequent walks to clear your mind.
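One way to form a hypothesis about a path problem is to list every copy of a tool that the search path can see, in the order the shell would find them. This is a small sketch under that assumption; `find_on_path` is a hypothetical helper name, and the tool names in the loop are only examples.

```python
import os


def find_on_path(name, path=None):
    """List every executable called `name` on the search path, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    for tool in ("java", "pyspark", "python"):
        hits = find_on_path(tool)
        print(f"{tool}: {hits if hits else 'not found on PATH'}")
```

The first entry in each list is the one that runs when you type the command. More than one entry means an earlier install is shadowing a later one, which is a common source of version mismatches.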
By default, homebrew installs a recent version of Java (something like Java 17). That might cause errors. Try an older version of Java:
brew install --cask homebrew/cask-versions/adoptopenjdk8
export JAVA_HOME='/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/'
If that works, make sure to add the JAVA_HOME export line to your bash profile.
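The same setting can also be applied from inside Python for a single session, which is handy for testing a JDK before editing your profile. This is a sketch, assuming that Spark's launcher scripts pick up JAVA_HOME from the environment of the Python process. `use_java_home` is a hypothetical helper, and the path is the one from the export command above.

```python
import os


def use_java_home(java_home):
    """Point this process at a specific JDK; return True if the path looks valid."""
    java_binary = os.path.join(java_home, "bin", "java")
    if not os.path.isfile(java_binary):
        return False
    # Do this before the first SparkSession is created: the JVM is started
    # once, and later changes to JAVA_HOME do not affect a running session.
    os.environ["JAVA_HOME"] = java_home
    return True


if __name__ == "__main__":
    # Path copied from the export command above; adjust for your machine.
    jdk8 = "/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home/"
    print("JAVA_HOME set" if use_java_home(jdk8) else f"No JDK found at {jdk8}")
```

This only changes the current Python process. A new terminal or notebook kernel starts from whatever your profile exports.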
You may want to use one of these cloud options:
- Google's Colab
- Deepnote
Error message: Could not find valid SPARK_HOME
This error appears if the SPARK_HOME environment variable is not set in Jupyter Notebook.
From terminal:
In Jupyter notebook:
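A minimal sketch of one way to set SPARK_HOME from a notebook cell, run before `import pyspark`. It assumes that a Spark install can be recognised by a `bin/spark-submit` file. `ensure_spark_home` is a hypothetical helper, and the Homebrew path below is an assumption about where Spark landed on an Apple Silicon machine, so replace it with your own location.

```python
import os


def ensure_spark_home(candidate):
    """Return SPARK_HOME, setting it to `candidate` if unset and the path looks valid."""
    current = os.environ.get("SPARK_HOME")
    if current:
        # Already set (for example by the shell profile); leave it alone.
        return current
    if os.path.isfile(os.path.join(candidate, "bin", "spark-submit")):
        os.environ["SPARK_HOME"] = candidate
        return candidate
    return None


if __name__ == "__main__":
    # Assumed location for a Homebrew install; adjust for your machine.
    guess = "/opt/homebrew/opt/apache-spark/libexec"
    print(f"SPARK_HOME: {ensure_spark_home(guess)}")
```

A result of `None` means the variable was unset and the guessed directory does not contain `bin/spark-submit`, so a different path is needed.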
Tested on macOS Ventura 13.0.1 (22A400) with an M2 chip.