Create a parallel profile:
ipython profile create --parallel --profile=slurm
Then cd into ~/.ipython/profile_slurm/ and add (or edit) the following files:
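ipcontroller_config.py:
# Listen on all interfaces so engines on other nodes can reach the controller.
c.HubFactory.ip = u'*'
# Engine registration timeout, in seconds.
c.HubFactory.registration_timeout = 600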
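ipengine_config.py:
# Seconds each engine waits for the controller's connection file to appear.
c.IPEngineApp.wait_for_url_file = 300
# Seconds each engine waits for the controller to answer its registration request.
c.EngineFactory.timeout = 300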
ipcluster_config.py:
c.IPClusterStart.controller_launcher_class = 'SlurmControllerLauncher'
c.IPClusterEngines.engine_launcher_class = 'SlurmEngineSetLauncher'
c.SlurmEngineSetLauncher.batch_template = """#!/bin/sh
#SBATCH --ntasks={n}
#SBATCH --mem-per-cpu=4G
#SBATCH --job-name=ipy-engine-
srun ipengine --profile-dir="{profile_dir}" --cluster-id=""
"""
Open tmux or screen and run:
ipcluster start --n=500 --profile=slurm
You will see something like:
2018-05-24 01:35:07.560 [IPClusterStart] Removing pid file: /gscratch/home/t-banijh/.ipython/profile_slurm/pid/ipcluster.pid
2018-05-24 01:35:07.562 [IPClusterStart] Starting ipcluster with [daemon=False]
2018-05-24 01:35:07.565 [IPClusterStart] Creating pid file: /gscratch/home/t-banijh/.ipython/profile_slurm/pid/ipcluster.pid
2018-05-24 01:35:07.569 [IPClusterStart] Starting Controller with SlurmControllerLauncher
2018-05-24 01:35:07.599 [IPClusterStart] Job submitted with job id: '8839'
2018-05-24 01:35:08.601 [IPClusterStart] Starting 500 Engines with SlurmEngineSetLauncher
2018-05-24 01:35:08.627 [IPClusterStart] Job submitted with job id: '8840'
2018-05-24 01:35:38.659 [IPClusterStart] Engines appear to have started successfully
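If you capture this output to a file (an assumption; the command above just prints to the terminal), a small script can pull out the Slurm job ids and tell you whether the "started successfully" line has appeared. The job ids are handy for Slurm commands such as squeue or scancel. This is only a sketch; the function name summarize_ipcluster_log is hypothetical, and the patterns are taken from the log lines shown above.

```python
import re

# Patterns copied from the ipcluster log lines shown above.
JOB_ID_RE = re.compile(r"Job submitted with job id: '(\d+)'")
READY_TEXT = "Engines appear to have started successfully"


def summarize_ipcluster_log(lines):
    """Return (job_ids, ready) for an iterable of ipcluster log lines."""
    job_ids = []
    ready = False
    for line in lines:
        match = JOB_ID_RE.search(line)
        if match:
            job_ids.append(match.group(1))
        if READY_TEXT in line:
            ready = True
    return job_ids, ready


sample = [
    "2018-05-24 01:35:07.599 [IPClusterStart] Job submitted with job id: '8839'",
    "2018-05-24 01:35:08.627 [IPClusterStart] Job submitted with job id: '8840'",
    "2018-05-24 01:35:38.659 [IPClusterStart] Engines appear to have started successfully",
]
print(summarize_ipcluster_log(sample))
```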
Wait until you see "Engines appear to have started successfully", then go to the notebook that runs on the cluster and connect:
import ipyparallel
client = ipyparallel.Client(profile='slurm')
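The "started successfully" message does not necessarily mean that all 500 engines have registered yet, since the Slurm job may still be queued or starting up. A minimal sketch of a wait loop follows; it only relies on the client exposing the list of registered engine ids as client.ids. The helper name wait_for_engines and its default timeout and poll interval are my own choices, not something from the setup above.

```python
import time


def wait_for_engines(client, n, timeout=600.0, poll=5.0):
    """Block until at least n engines are registered, or raise TimeoutError.

    `client` is any object with an `ids` attribute listing engine ids,
    such as an ipyparallel.Client.
    """
    deadline = time.monotonic() + timeout
    while True:
        count = len(client.ids)
        if count >= n:
            return count
        if time.monotonic() >= deadline:
            raise TimeoutError(f"only {count} of {n} engines registered")
        time.sleep(poll)


# Usage with the client created above (needs the running cluster):
# wait_for_engines(client, 500)
# view = client.load_balanced_view()
# squares = view.map_sync(lambda x: x * x, range(1000))
```

Polling len(client.ids) keeps the helper independent of the ipyparallel version; the load-balanced view then spreads the mapped calls over whichever engines are free.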