syncopy.shared.computational_routine.ComputationalRoutine
-
class syncopy.shared.computational_routine.ComputationalRoutine(*argv, **kwargs)
Abstract class for encapsulating sequential/parallel algorithms.
A Syncopy compute class consists of a ComputationalRoutine-subclass that binds a static computeFunction() and provides the class method process_metadata().
Requirements for computeFunction():
- The first positional argument is a numpy.ndarray; the keywords chunkShape and noCompute are supported.
- Returns a numpy.ndarray if noCompute is False, and the expected shape and numerical type of the output array otherwise.
Requirements for the ComputationalRoutine-subclass:
- Is a child of ComputationalRoutine and binds computeFunction() as a static method.
- Provides the class method process_metadata().
For details on writing compute classes and metafunctions for Syncopy, please refer to Design Guide: Syncopy Compute Classes.
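The class layout required above can be sketched with Python's abc module. RoutineSketch below is a hypothetical, heavily simplified stand-in for ComputationalRoutine written for illustration only (it is not Syncopy's implementation); the names scale_cF and ScaleRoutine are likewise hypothetical. In real code one would subclass syncopy.shared.computational_routine.ComputationalRoutine instead.

```python
from abc import ABC, abstractmethod

import numpy as np


class RoutineSketch(ABC):
    """Hypothetical stand-in for ComputationalRoutine (illustration only)."""

    @staticmethod
    def computeFunction(arr, *argv, chunkShape=None, noCompute=None, **kwargs):
        # Concrete placeholder that subclasses are expected to overload
        return None

    @classmethod
    @abstractmethod
    def process_metadata(cls, data, out):
        """Attach meta-information to `out`; must be overloaded."""


def scale_cF(arr, factor, chunkShape=None, noCompute=False, **kwargs):
    """Hypothetical compute function: multiply one trial by `factor`."""
    if noCompute:
        # Dry run: report expected shape and dtype without computing anything
        return arr.shape, np.dtype(np.float64)
    return (arr * factor).astype(np.float64)


class ScaleRoutine(RoutineSketch):
    """Concrete routine: binds scale_cF and provides process_metadata."""

    # Bind the compute function as a static method
    computeFunction = staticmethod(scale_cF)

    @classmethod
    def process_metadata(cls, data, out):
        # Nothing to attach in this toy example
        pass
```

Because process_metadata is abstract, the base class itself cannot be instantiated; only subclasses that overload it can.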
-
__init__(*argv, **kwargs)
Instantiate a ComputationalRoutine subclass.
- Parameters
*argv (tuple) – Tuple of positional arguments passed on to computeFunction()
**kwargs (dict) – Keyword arguments passed on to computeFunction()
- Returns
obj – Usable class instance for processing Syncopy data objects.
- Return type
instance of ComputationalRoutine-subclass
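A minimal sketch of how arguments given at instantiation can be stored and later forwarded to computeFunction() for each trial. ForwardingSketch and its run_trial method are hypothetical names used only for illustration; Syncopy's own forwarding happens inside its compute kernels.

```python
import numpy as np


class ForwardingSketch:
    """Hypothetical stand-in illustrating how *argv/**kwargs reach computeFunction."""

    @staticmethod
    def computeFunction(arr, offset, chunkShape=None, noCompute=False, scale=1.0):
        # Toy compute function: shift by `offset`, then multiply by `scale`
        if noCompute:
            return arr.shape, arr.dtype
        return (arr + offset) * scale

    def __init__(self, *argv, **kwargs):
        # Keep the positional and keyword arguments given at instantiation
        self.argv = argv
        self.kwargs = kwargs

    def run_trial(self, trial):
        # Forward the stored arguments to computeFunction for a single trial
        return self.computeFunction(trial, *self.argv, **self.kwargs)


routine = ForwardingSketch(1.0, scale=2.0)
result = routine.run_trial(np.zeros((3, 2)))  # every entry is (0 + 1.0) * 2.0 = 2.0
```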
Methods
__init__(*argv, **kwargs) – Instantiate a ComputationalRoutine subclass
compute(data, out[, parallel, …]) – Central management and processing method
computeFunction(arr, *argv[, chunkShape, …]) – Computational core routine
compute_parallel(data, out) – Concurrent computing kernel
compute_sequential(data, out) – Sequential computing kernel
initialize(data[, chan_per_worker, keeptrials]) – Perform dry-run of calculation to determine output shape
preallocate_output(out[, parallel_store]) – Storage allocation and provisioning
process_metadata(data, out) – Meta-information manager
write_log(data, out[, log_dict]) – Processing of output log
-
static computeFunction(arr, *argv, chunkShape=None, noCompute=None, **kwargs)
Computational core routine.
- Parameters
arr (numpy.ndarray) – Numerical data from a single trial
*argv (tuple) – Arbitrary tuple of positional arguments
chunkShape (None or tuple) – Mandatory keyword. If not None, represents the global block-size of the processed trial.
noCompute (None or bool) – Preprocessing flag. If True, do not perform the actual calculation but instead return the expected shape and numpy.dtype of the output array.
**kwargs (dict) – Other keyword arguments.
- Returns
outShape (tuple, if noCompute == True) – expected shape of output array
outDtype (numpy.dtype, if noCompute == True) – expected numerical type of output array
res (numpy.ndarray, if noCompute == False) – Result of processing input arr
Notes
This concrete method is a placeholder that is intended to be overloaded.
See also
ComputationalRoutine()
Developer documentation: Design Guide: Syncopy Compute Classes.
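A sketch of an overloaded computeFunction that honors the noCompute dry-run contract. The function name trial_mean_cF is hypothetical, and it assumes samples run along the first axis and channels along the second; chunkShape is accepted but unused here.

```python
import numpy as np


def trial_mean_cF(arr, *argv, chunkShape=None, noCompute=False, **kwargs):
    """Hypothetical compute function: average one trial over its first axis."""
    # Expected result: a single row holding one value per channel
    outShape = (1, arr.shape[1])
    outDtype = np.dtype(np.float64)
    if noCompute:
        # Dry run: report shape and dtype only, no calculation
        return outShape, outDtype
    return arr.mean(axis=0, keepdims=True).astype(outDtype)


trial = np.arange(12, dtype=np.int32).reshape(6, 2)  # 6 samples x 2 channels
shape, dtype = trial_mean_cF(trial, noCompute=True)
result = trial_mean_cF(trial)
```

Returning the shape and dtype explicitly (instead of inferring them from a real run) is what lets a dry run size the output storage before any data is processed.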
-
initialize(data, chan_per_worker=None, keeptrials=True)
Perform dry-run of calculation to determine output shape.
- Parameters
data (syncopy data object) – Syncopy data object to be processed (has to be the same object that is passed to compute() for the actual calculation).
chan_per_worker (None or int) – Number of channels to be processed by each worker (only relevant in case of concurrent processing). If chan_per_worker is None (default), by-trial parallelism is used, i.e., each worker processes data corresponding to a full trial. If chan_per_worker > 0, trials are split into channel-groups of size chan_per_worker (plus one smaller group holding the remaining channels if the number of channels is not divisible by chan_per_worker) and workers are assigned by-trial channel-groups for processing.
keeptrials (bool) – Flag indicating whether to return individual trials or their average
- Returns
Nothing
- Return type
None
Notes
This class method has to be called prior to performing the actual computation realized in computeFunction().
See also
compute()
core routine performing the actual computation
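The channel-grouping rule described for chan_per_worker can be sketched as follows. channel_groups and worker_tasks are hypothetical helpers written for illustration, not Syncopy functions; rejecting non-positive values is a choice of this sketch.

```python
def channel_groups(n_channels, chan_per_worker=None):
    """Hypothetical helper: split channel indices into groups of slices."""
    if chan_per_worker is None:
        # By-trial parallelism: one group holding all channels
        return [slice(0, n_channels)]
    if chan_per_worker <= 0:
        raise ValueError("chan_per_worker must be a positive integer")
    groups = []
    for start in range(0, n_channels, chan_per_worker):
        # The last group is smaller if n_channels is not divisible by chan_per_worker
        groups.append(slice(start, min(start + chan_per_worker, n_channels)))
    return groups


def worker_tasks(n_trials, n_channels, chan_per_worker=None):
    """Pair every trial with every channel group (one task per worker job)."""
    return [(trial, grp) for trial in range(n_trials)
            for grp in channel_groups(n_channels, chan_per_worker)]
```

For example, 10 channels with chan_per_worker=4 give the groups 0-3, 4-7 and a remainder group 8-9, so three trials yield nine worker tasks.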
-
compute(data, out, parallel=False, parallel_store=None, method=None, mem_thresh=0.5, log_dict=None, parallel_debug=False)
Central management and processing method.
- Parameters
data (syncopy data object) – Syncopy data object to be processed (has to be the same object that was used by initialize() in the pre-calculation dry-run).
out (syncopy data object) – Empty object for holding results
parallel (bool) – If True, processing is performed in parallel (i.e., computeFunction() is executed concurrently across trials). If parallel is False, computeFunction() is executed consecutively trial after trial (i.e., the calculation realized in computeFunction() is performed sequentially).
parallel_store (None or bool) – Flag controlling the saving mechanism. If None, parallel_store = parallel, i.e., the compute paradigm dictates the employed writing method. Thus, in case of parallel processing, results are written in a fully concurrent manner (each worker saves its own local result segment on disk as soon as it is done with its part of the computation). If parallel_store is False and parallel is True, the processing result is saved sequentially using a mutex. If both parallel and parallel_store are False, standard single-process HDF5 writing is employed for saving the result of the (sequential) computation.
method (None or str) – If None, the predefined methods compute_parallel() or compute_sequential() are used to control the actual computation (specifically, calling computeFunction()) depending on whether parallel is True or False, respectively. If method is a string, it has to specify the name of an alternative (provided) class method that is invoked using getattr.
mem_thresh (float) – Maximum fraction of available memory that a single trial result may occupy. By default, the largest single trial result must not occupy more than 50% (mem_thresh = 0.5) of available single-machine or worker memory (if parallel is False or True, respectively).
log_dict (None or dict) – If None, the log properties of out are populated with the keyword arguments employed in computeFunction(). Otherwise, out’s log properties are filled with items taken from log_dict.
parallel_debug (bool) – If True, concurrent processing is performed using a single-threaded scheduler, i.e., all parallel computing tasks are run in the current Python thread, permitting usage of tools like pdb/ipdb, cProfile and the like in computeFunction(). Note that enabling parallel debugging effectively runs the given computation on the calling local machine, thereby requiring sufficient memory and CPU capacity.
- Returns
Nothing – The result of the computation is available in out once compute() has terminated successfully.
- Return type
None
Notes
This routine calls several other class methods to perform all necessary pre- and post-processing steps in a fully automatic manner without requiring any user input. Specifically, the following class methods are invoked consecutively (in the given order):
1. preallocate_output() allocates a (virtual) HDF5 dataset of appropriate dimension for storing the result.
2. compute_parallel() (or compute_sequential()) performs the actual computation by calling computeFunction() concurrently (or sequentially).
3. process_metadata() attaches all relevant meta-information to the result out after successful termination of the calculation.
4. write_log() stores employed input arguments in out.cfg and out.log to reproduce all relevant computational steps that generated out.
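The fixed call order above is a template-method pattern. OrchestrationSketch is a hypothetical stand-in that only records which step runs; memory checks, logging details and storage are deliberately left out, and the kernel selection mirrors the documented behavior of the parallel, parallel_store and method keywords.

```python
class OrchestrationSketch:
    """Hypothetical stand-in mirroring the call order of compute()."""

    def __init__(self):
        self.calls = []  # records which step ran, in order

    def preallocate_output(self, out, parallel_store=False):
        self.calls.append("preallocate_output")

    def compute_parallel(self, data, out):
        self.calls.append("compute_parallel")

    def compute_sequential(self, data, out):
        self.calls.append("compute_sequential")

    def process_metadata(self, data, out):
        self.calls.append("process_metadata")

    def write_log(self, data, out, log_dict=None):
        self.calls.append("write_log")

    def compute(self, data, out, parallel=False, parallel_store=None,
                method=None, log_dict=None):
        if parallel_store is None:
            # By default the compute paradigm dictates the writing method
            parallel_store = parallel
        self.preallocate_output(out, parallel_store=parallel_store)
        if method is None:
            kernel = self.compute_parallel if parallel else self.compute_sequential
        else:
            # Alternative kernel looked up by name via getattr
            kernel = getattr(self, method)
        kernel(data, out)
        self.process_metadata(data, out)
        self.write_log(data, out, log_dict=log_dict)
```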
See also
initialize()
pre-calculation preparations
preallocate_output()
storage provisioning
compute_parallel()
concurrent computation using computeFunction()
compute_sequential()
sequential computation using computeFunction()
process_metadata()
management of meta-information
write_log()
log-entry organization
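The mem_thresh rule can be illustrated with a small check: the byte size of one trial result (shape and dtype as reported by a noCompute dry run) is compared against mem_thresh times the available memory. fits_in_memory is a hypothetical helper; how Syncopy determines the available machine or worker memory is not shown here.

```python
import numpy as np


def fits_in_memory(out_shape, out_dtype, available_bytes, mem_thresh=0.5):
    """Hypothetical helper: does one trial result stay within mem_thresh?"""
    # Bytes needed = number of elements * bytes per element
    trial_bytes = int(np.prod(out_shape)) * np.dtype(out_dtype).itemsize
    return trial_bytes <= mem_thresh * available_bytes
```

A float64 result of shape (1000, 64) needs 512,000 bytes, so it passes with 2,000,000 bytes available but fails with 1,000,000 bytes at the default threshold of 0.5.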
-
preallocate_output(out, parallel_store=False)
Storage allocation and provisioning.
- Parameters
out (syncopy data object) – Empty object for holding results
parallel_store (bool) – If True, a directory for virtual source files is created in Syncopy’s temporary on-disk storage (defined by syncopy.__storage__). Otherwise, a dataset of appropriate type and shape is allocated in a new regular HDF5 file created inside Syncopy’s temporary storage folder.
- Returns
Nothing
- Return type
None
See also
compute()
management routine controlling memory pre-allocation
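A sketch of non-virtual pre-allocation under stated assumptions: per-trial result shapes come from a dry run, trial results are stacked along the first axis, and a NumPy .npy memory map (numpy.lib.format.open_memmap) stands in for the HDF5 dataset Syncopy actually creates. The helper name preallocate is hypothetical.

```python
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap


def preallocate(trial_shapes, dtype, keeptrials=True, directory=None):
    """Hypothetical helper: allocate an on-disk array for all trial results.

    Assumes results are stacked along the first axis and that all trials
    share the remaining dimensions.
    """
    if keeptrials:
        n_rows = sum(shape[0] for shape in trial_shapes)
    else:
        # A trial average only needs room for a single trial
        n_rows = trial_shapes[0][0]
    out_shape = (n_rows,) + tuple(trial_shapes[0][1:])
    directory = directory or tempfile.mkdtemp()
    filename = os.path.join(directory, "result.npy")
    # mode "w+" creates a new file with the requested shape and dtype
    return open_memmap(filename, mode="w+", dtype=dtype, shape=out_shape)
```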
-
compute_parallel(data, out)
Concurrent computing kernel.
- Parameters
data (syncopy data object) – Syncopy data object to be processed
out (syncopy data object) – Empty object for holding results
- Returns
Nothing
- Return type
None
Notes
This method merely acts as a concurrent wrapper for computeFunction() by passing along all necessary information for parallel execution and storage of results using a dask bag of dictionaries. The actual reading of source data and writing of results is managed by the decorator syncopy.shared.parsers.unwrap_io(). Note that this routine first builds an entire parallel instruction tree and only kicks off execution on the cluster at the very end of the calculation command assembly.
See also
compute()
management routine invoking parallel/sequential compute kernels
compute_sequential()
serial processing counterpart of this method
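The "build all instructions first, execute at the end" approach can be sketched with the standard library. Here concurrent.futures.ThreadPoolExecutor is swapped in for the dask bag Syncopy uses, reading and writing via unwrap_io() is not modeled, and square_cF, run_task and compute_parallel_sketch are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def square_cF(arr, chunkShape=None, noCompute=False):
    """Toy compute function used for illustration."""
    if noCompute:
        return arr.shape, arr.dtype
    return arr ** 2


def run_task(task):
    # Each task dict carries everything a worker needs (akin to a bag of dicts)
    return task["index"], square_cF(task["trial"])


def compute_parallel_sketch(trials, max_workers=4):
    # 1. Assemble the complete list of instructions first ...
    tasks = [{"index": k, "trial": trl} for k, trl in enumerate(trials)]
    # 2. ... and only then start executing all of them
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = dict(pool.map(run_task, tasks))
    # Return results in trial order
    return [results[k] for k in range(len(trials))]
```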
-
compute_sequential(data, out)
Sequential computing kernel.
- Parameters
data (syncopy data object) – Syncopy data object to be processed
out (syncopy data object) – Empty object for holding results
- Returns
Nothing
- Return type
None
Notes
This method most closely reflects classic iterative process execution: trials in data are passed sequentially to computeFunction(), and results are stored consecutively in a regular HDF5 dataset (that was pre-allocated by preallocate_output()). Since the calculation result is immediately stored on disk, propagation of arrays across routines is avoided and memory usage is kept to a minimum.
See also
compute()
management routine invoking parallel/sequential compute kernels
compute_parallel()
concurrent processing counterpart of this method
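A minimal sketch of the sequential pattern: each trial is processed and its result written straight into a pre-allocated buffer before the next trial is touched. A plain NumPy array stands in for the HDF5 dataset, results are assumed to be stacked along the first axis, and double_cF and compute_sequential_sketch are hypothetical names.

```python
import numpy as np


def double_cF(arr, chunkShape=None, noCompute=False):
    """Toy compute function: double every sample."""
    if noCompute:
        return arr.shape, arr.dtype
    return 2 * arr


def compute_sequential_sketch(trials, out):
    """Process trials one after another, writing each result into `out`."""
    offset = 0
    for trial in trials:
        res = double_cF(trial)
        # Store this trial's block immediately, then move on to the next one
        out[offset:offset + res.shape[0]] = res
        offset += res.shape[0]
    return out
```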
-
write_log(data, out, log_dict=None)
Processing of output log.
- Parameters
data (syncopy data object) – Syncopy data object that has been processed
out (syncopy data object) – Syncopy data object holding calculation results
log_dict (None or dict) – If None, the log properties of out are populated with the keyword arguments employed in computeFunction(). Otherwise, out’s log properties are filled with items taken from log_dict.
- Returns
Nothing
- Return type
None
See also
process_metadata()
Management of meta-information
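The log_dict rule (fall back to computeFunction's keyword arguments unless a log_dict is supplied) can be sketched as below. build_log_entries, its (cfg, text) return value and the example keywords window and order are hypothetical; Syncopy's own out.cfg and out.log formats are not reproduced here.

```python
def build_log_entries(cf_kwargs, log_dict=None):
    """Hypothetical helper: choose the items recorded in the output log."""
    # Without log_dict, log the keyword arguments employed in computeFunction
    source = cf_kwargs if log_dict is None else log_dict
    lines = [f"{key} = {value}" for key, value in source.items()]
    return dict(source), "\n".join(lines)


cfg, text = build_log_entries({"window": "hann", "order": 4})
```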
-
abstract process_metadata(data, out)
Meta-information manager.
- Parameters
data (syncopy data object) – Syncopy data object that has been processed
out (syncopy data object) – Syncopy data object holding calculation results
- Returns
Nothing
- Return type
None
Notes
This routine is an abstract method and is thus intended to be overloaded. Consult the developer documentation (Design Guide: Syncopy Compute Classes) for further details.
See also
write_log()
Logging of calculation parameters
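A sketch of what an overload of process_metadata might do: copy meta-information that the computation leaves unchanged from data to out. MetadataSketch is hypothetical, types.SimpleNamespace objects stand in for Syncopy data objects, and the attribute names channel and samplerate are assumptions made for illustration.

```python
from types import SimpleNamespace


class MetadataSketch:
    """Hypothetical subclass fragment overloading process_metadata."""

    @classmethod
    def process_metadata(cls, data, out):
        # Channel labels and sampling rate carry over unchanged to the result
        out.channel = list(data.channel)
        out.samplerate = data.samplerate


data = SimpleNamespace(channel=["ch1", "ch2"], samplerate=1000.0)
out = SimpleNamespace()
MetadataSketch.process_metadata(data, out)
```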