esi_cluster_setup(partition='8GBS', n_jobs=2, mem_per_job=None, timeout=180, interactive=True, start_client=True, **kwargs)
Start a distributed Dask cluster of parallel processing workers using SLURM (or local multi-processing)
partition (str) – Name of SLURM partition/queue to use
n_jobs (int) – Number of jobs to spawn
mem_per_job (None or str) – Memory booking for each job. Can be specified either in megabytes (e.g., mem_per_job = "1500MB") or gigabytes (e.g., mem_per_job = "2GB"). If mem_per_job is None, a sane default value is inferred from the chosen queue, e.g., for partition = "8GBS", mem_per_job is automatically set to the allowed maximum of "8GB". However, even in queues with guaranteed memory bookings, it is possible to allocate less memory than the allowed maximum per job in order to spawn numerous low-memory jobs. See Examples for details.
timeout (int) – Number of seconds to wait for requested jobs to start up.
interactive (bool) – If True, user input is required in case not all jobs could be started in the provided waiting period (determined by timeout). If interactive is False and the jobs could not be started within timeout seconds, a TimeoutError is raised.
start_client (bool) – If True, a distributed computing client is launched and attached to the workers. If start_client is False, only a distributed computing cluster is started to which compute-clients can connect.
**kwargs (dict) – Additional keyword arguments can be used to control job-submission details.
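To illustrate the memory formats accepted by mem_per_job, the following is a hypothetical parser (parse_mem is an illustrative name, not part of the Syncopy API) that converts strings such as "1500MB" or "2GB" into a megabyte count:

```python
import re

def parse_mem(mem):
    """Parse a memory string such as '1500MB' or '2GB' into megabytes.

    Hypothetical helper illustrating the format expected by
    ``mem_per_job``; not part of the Syncopy API.
    """
    match = re.fullmatch(r"(\d+)\s*(MB|GB)", mem.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"invalid memory specification: {mem!r}")
    amount, unit = int(match.group(1)), match.group(2).upper()
    # Normalize everything to megabytes (1 GB = 1024 MB)
    return amount * 1024 if unit == "GB" else amount

print(parse_mem("2GB"))     # 2048
print(parse_mem("1500MB"))  # 1500
```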
- Returns
proc – A distributed computing client (if start_client = True) or a distributed computing cluster (otherwise).
- Return type
dask.distributed.Client or dask_jobqueue.SLURMCluster
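The interplay of the timeout and interactive parameters can be pictured as a polling loop. The following is a hypothetical sketch (wait_for_jobs and count_running are illustrative names, not Syncopy's actual implementation):

```python
import time

def wait_for_jobs(count_running, n_jobs, timeout, interactive):
    """Hypothetical sketch of the startup wait implied by ``timeout``
    and ``interactive``; not Syncopy's actual implementation.

    ``count_running`` is a callable returning the number of jobs
    that have started so far.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if count_running() >= n_jobs:
            return True          # all requested jobs are up
        time.sleep(1)            # poll again shortly
    if interactive:
        # Ask the user whether to proceed with fewer workers
        answer = input(f"Only {count_running()}/{n_jobs} jobs started. Continue? [y/n] ")
        return answer.strip().lower().startswith("y")
    raise TimeoutError(f"jobs did not start within {timeout} seconds")
```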
The following command launches 10 SLURM jobs with 2 gigabytes of memory each in the 8GBS partition
>>> spy.esi_cluster_setup(n_jobs=10, partition="8GBS", mem_per_job="2GB")
If you want to access properties of the created distributed computing client, assign the return value explicitly, i.e.,
>>> client = spy.esi_cluster_setup(n_jobs=10, partition="8GBS", mem_per_job="2GB")
The underlying distributed computing cluster can be accessed via the client's cluster attribute
>>> client.cluster
Syncopy’s parallel computing engine relies on the concurrent processing library Dask. Thus, the distributed computing clients used by Syncopy are in fact instances of dask.distributed.Client, and this function specifically acts as a wrapper around dask_jobqueue.SLURMCluster. Users familiar with Dask in general, and with its distributed scheduler and cluster objects in particular, may leverage Dask’s entire API to fine-tune parallel processing jobs to their liking.
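The returned client follows Dask's futures interface: work is submitted, futures are collected, and results are gathered. The same submit/gather pattern is sketched below using the standard library's ThreadPoolExecutor as a local stand-in; with a real Dask client the corresponding calls would be client.submit and client.gather:

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    return x * x

# Local stand-in for a dask.distributed.Client: submit work,
# collect futures, gather results.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(square, x) for x in range(5)]   # ~ client.submit
    results = [f.result() for f in futures]                # ~ client.gather

print(results)  # [0, 1, 4, 9, 16]
```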
See also

cluster_cleanup – remove dangling parallel processing job-clusters