- Install DSE
- In the cassandra.yaml file, ensure the datacenter and cluster name match your analytics datacenter
- In the cassandra-env.sh file, add this configuration line toward the bottom:
JVM_OPTS="$JVM_OPTS -Dcassandra.join_ring=false"
This makes your DSE node a coordinator only: it will not own any data. You can use this node to submit jobs to DSE locally without needing to know which node is the master.
- Start DSE
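One way to confirm the node came up as a coordinator only is that `nodetool status` reports it owning 0% of the data. A minimal sketch of checking that programmatically (the sample output below is illustrative, not captured from a real cluster):

```python
# Illustrative: inspect `nodetool status`-style output and check that a
# given node owns 0% of the data. The sample text is a made-up example
# of the command's output format.
sample = """Datacenter: Analytics
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load     Owns   Host ID  Rack
UN  10.0.0.5   120 KB   0.0%   a1b1c1   rack1
"""

def owns_no_data(status_text, address):
    """Return True if the node at `address` reports 0.0% ownership."""
    for line in status_text.splitlines():
        parts = line.split()
        if address in parts:
            # The ownership column is the token ending in '%'.
            pct = next((p for p in parts if p.endswith("%")), None)
            return pct == "0.0%"
    return False

print(owns_no_data(sample, "10.0.0.5"))  # → True for a coordinator-only node
```

In practice you would feed this the real output, e.g. `subprocess.check_output(["nodetool", "status"], text=True)`.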
- Install python
- Install virtualenv
> virtualenv .jupyter
> source .jupyter/bin/activate
> pip install ipython
> pip install jupyter
> PYSPARK_SUBMIT_ARGS="$PYSPARK_SUBMIT_ARGS pyspark-shell" IPYTHON_OPTS="notebook --ip='*' --no-browser" dse pyspark
You can use something like supervisord
to keep jupyter running in the background.
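A minimal supervisord program entry for this could look like the following sketch; the paths, user, and virtualenv location are assumptions for illustration:

```ini
[program:jupyter]
; Activate the virtualenv, then launch the DSE pyspark shell under Jupyter.
; All paths and the user below are illustrative -- adjust to your setup.
command=/bin/bash -c 'source /home/analytics/.jupyter/bin/activate && PYSPARK_SUBMIT_ARGS="pyspark-shell" IPYTHON_OPTS="notebook --ip=* --no-browser" dse pyspark'
directory=/home/analytics
user=analytics
autostart=true
autorestart=true
stopasgroup=true
```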
If you are getting a permission denied error when starting pyspark that looks like this:
OSError: [Errno 13] Permission denied: '/run/user/505/jupyter'
This is because XDG_RUNTIME_DIR points to a runtime directory owned by your logged-in user; in that case, set the following environment variable before starting pyspark:
JUPYTER_RUNTIME_DIR="$HOME/.jupyter/runtime"