syncopy.datatype.base_data.BaseData

class syncopy.datatype.base_data.BaseData(filename=None, dimord=None, mode='r+', **kwargs)[source]

Abstract base class for all data classes

Data classes in Syncopy store array data and metadata in HDF5 and JSON files, respectively. This base class provides the fundamental functionality shared across all data classes, that is,

  • properties for arrays that have a corresponding HDF5 dataset (‘dataset properties’) and the associated I/O

  • properties for data history (BaseData.log and BaseData.cfg)

  • methods and properties for defining trials on the data

Further properties and methods are defined in subclasses, e.g. syncopy.AnalogData.

__init__(filename=None, dimord=None, mode='r+', **kwargs)[source]

Initialization supports two argument combinations:

  1. filename + data: create an HDF5 file at filename containing data

  2. data only: store data in an automatically generated backing file
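These two modes can be sketched as follows. This is a simplified illustration of the assumed behavior, using a NumPy .npy file and an auto-generated temporary file in place of Syncopy's HDF5 backing store; init_backing_file is a hypothetical helper, not part of the Syncopy API:

```python
import os
import tempfile

import numpy as np

def init_backing_file(data, filename=None):
    # mode 2: data only -> auto-generate a file in a temporary location
    if filename is None:
        fd, filename = tempfile.mkstemp(suffix=".npy")
        os.close(fd)
    # mode 1 (and 2): persist the array at the chosen filename
    np.save(filename, data)
    return filename

path = init_backing_file(np.arange(6).reshape(2, 3))
assert np.array_equal(np.load(path), np.arange(6).reshape(2, 3))
os.remove(path)
```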

Methods

__init__([filename, dimord, mode])

Initialize the data object

clear()

Clear loaded data from memory

copy([deep])

Create a copy of the data object in memory.

definetrial([trialdefinition, pre, post, …])

(Re-)define trials of a Syncopy data object

save([container, tag, filename, overwrite, …])

Save data object as new spy container to disk (syncopy.save_data())

selectdata([trials, deepcopy])

Select a subset of the data (see syncopy.selectdata)

_abc_impl

_checksum_algorithm

_classname_to_extension()

_defaultDimord

Default ordered list of data dimension labels (abstract class property)

_gen_filename()

_get_trial(trialno)

_hdfFileAttributeProperties

_hdfFileDatasetProperties

properties that are mapped onto HDF5 datasets

_infoFileProperties

properties that are written into the JSON file and HDF5 attributes upon save

_is_empty()

_mode

_selection

Data selection specified by Selector

_set_cfg(cfg, dct)

_set_dataset_property(dataIn, propertyName)

Set property that is streamed from HDF dataset (‘dataset property’)

_set_dataset_property_with_dataset(inData, …)

Set a dataset property with an already loaded HDF5 dataset

_set_dataset_property_with_memmap(inData, …)

Set a dataset property with a memory map

_set_dataset_property_with_ndarray(inData, …)

Set a dataset property with a NumPy array

_set_dataset_property_with_none(dataIn, …)

Set a dataset property to None

_set_dataset_property_with_str(filename, …)

Set a dataset property with a filename str

_spwCaller

_t0

Attributes

cfg

Dictionary of previous operations on data

container

dimord

ordered list of data dimension labels

filename

log

log of previous operations on data

mode

write mode for data, ‘r’ for read-only, ‘w’ for writable

sampleinfo

nTrials x 2 numpy.ndarray of [start, end] sample indices

tag

trialdefinition

nTrials x >=3 numpy.ndarray of [start, end, offset, trialinfo]

trialinfo

nTrials x M numpy.ndarray with numeric information about each trial

trials

list-like array of trials

_infoFileProperties = ('dimord', '_version', '_log', 'cfg')

properties that are written into the JSON file and HDF5 attributes upon save

_hdfFileAttributeProperties = ('dimord', '_version', '_log')
_hdfFileDatasetProperties = ()

properties that are mapped onto HDF5 datasets

_checksum_algorithm = 'openssl_sha1'
_spwCaller = 'BaseData.{}'
abstract property _defaultDimord

classmethod(function) -> method

Convert a function to be a class method.

A class method receives the class as implicit first argument, just like an instance method receives the instance. To declare a class method, use this idiom:

class C:
    @classmethod
    def f(cls, arg1, arg2, ...):
        ...
It can be called either on the class (e.g. C.f()) or on an instance (e.g. C().f()). The instance is ignored except for its class. If a class method is called for a derived class, the derived class object is passed as the implied first argument.

Class methods are different from C++ or Java static methods. If you want those, see the staticmethod builtin.

property cfg

Dictionary of previous operations on data

property container
_set_dataset_property(dataIn, propertyName, ndim=None)[source]

Set property that is streamed from HDF dataset (‘dataset property’)

This method automatically selects the appropriate set method according to the type of the input data (dataIn).

Parameters
  • dataIn (str, np.ndarray, np.core.memmap or h5py.Dataset) – Filename, array or HDF5 dataset to be stored in property

  • propertyName (str) – Name of the property. The actual data must reside in the attribute “_” + propertyName

  • ndim (int) – Number of expected array dimensions.
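The dispatch logic described above can be sketched as follows. This is an illustrative stand-in for the pattern, not Syncopy's actual implementation, and it covers only the None, str, and numpy.ndarray cases:

```python
import numpy as np

class DataHolder:
    def _set_dataset_property(self, dataIn, propertyName, ndim=None):
        # pick the handler matching the input type
        if dataIn is None:
            handler = self._set_with_none
        elif isinstance(dataIn, str):
            handler = self._set_with_str
        elif isinstance(dataIn, np.ndarray):
            handler = self._set_with_ndarray
        else:
            raise TypeError(f"unsupported type {type(dataIn)}")
        handler(dataIn, propertyName, ndim)

    def _set_with_none(self, dataIn, propertyName, ndim):
        setattr(self, "_" + propertyName, None)

    def _set_with_str(self, dataIn, propertyName, ndim):
        # a real implementation would open the file at `dataIn` here
        setattr(self, "_" + propertyName, f"<file: {dataIn}>")

    def _set_with_ndarray(self, dataIn, propertyName, ndim):
        if ndim is not None and dataIn.ndim != ndim:
            raise ValueError(f"expected {ndim}-dim array, got {dataIn.ndim}")
        setattr(self, "_" + propertyName, dataIn)

holder = DataHolder()
holder._set_dataset_property(np.zeros((2, 3)), "data", ndim=2)
assert holder._data.shape == (2, 3)
holder._set_dataset_property(None, "data")
assert holder._data is None
```

Routing on the input type keeps each handler small and lets subclasses add further cases without touching the dispatcher.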

_set_dataset_property_with_none(dataIn, propertyName, ndim)[source]

Set a dataset property to None

_set_dataset_property_with_str(filename, propertyName, ndim)[source]

Set a dataset property with a filename str

Parameters
  • filename (str) – A filename pointing to an HDF5 file containing the dataset propertyName, or to a NPY file. NPY files are loaded as memory maps.

  • propertyName (str) – Name of the property to be filled with the dataset/memmap

  • ndim (int) – Number of expected array dimensions.

_set_dataset_property_with_ndarray(inData, propertyName, ndim)[source]

Set a dataset property with a NumPy array

If no data exists, a backing HDF5 dataset will be created.

Parameters
  • inData (numpy.ndarray) – NumPy array to be stored in property of name propertyName

  • propertyName (str) – Name of the property to be filled with inData

  • ndim (int) – Number of expected array dimensions.

_set_dataset_property_with_memmap(inData, propertyName, ndim)[source]

Set a dataset property with a memory map

The memory map is directly stored in the attribute. No backing HDF5 dataset is created. This feature may be removed in future versions.

Parameters
  • inData (numpy.memmap) – NumPy memory-map to be stored in property of name propertyName

  • propertyName (str) – Name of the property to be filled with the memory map.

  • ndim (int) – Number of expected array dimensions.

_set_dataset_property_with_dataset(inData, propertyName, ndim)[source]

Set a dataset property with an already loaded HDF5 dataset

Parameters
  • inData (h5py.Dataset) – HDF5 dataset to be stored in property of name propertyName

  • propertyName (str) – Name of the property to be filled with the dataset

  • ndim (int) – Number of expected array dimensions.

_is_empty()[source]
property tag
property _selection

Data selection specified by Selector

property trialdefinition

Type

nTrials x >=3 numpy.ndarray of [start, end, offset, trialinfo]

property sampleinfo

nTrials x 2 numpy.ndarray of [start, end] sample indices

property _t0
property trials

list-like array of trials

property trialinfo

nTrials x M numpy.ndarray with numeric information about each trial

Each trial can have M properties (condition, original trial no., …) coded by numbers. These correspond to the fourth and subsequent columns of BaseData._trialdefinition.
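The relationship between trialdefinition, sampleinfo, the trigger offsets, and trialinfo can be illustrated with a plain NumPy sketch (assumed column layout as described above):

```python
import numpy as np

# three trials: [start, end, offset, condition_code]
trialdefinition = np.array([
    [0,   100, -20, 1],
    [100, 200, -20, 2],
    [200, 300, -20, 1],
])

sampleinfo = trialdefinition[:, :2]   # nTrials x 2 [start, end] samples
t0 = trialdefinition[:, 2]            # per-trial trigger offsets
trialinfo = trialdefinition[:, 3:]    # nTrials x M numeric trial info

assert sampleinfo.shape == (3, 2)
assert trialinfo.shape == (3, 1)
assert list(trialinfo[:, 0]) == [1, 2, 1]
```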

abstract selectdata(trials=None, deepcopy=False, **kwargs)[source]

Select a subset of the data (see syncopy.selectdata)

abstract _get_trial(trialno)[source]
clear()[source]

Clear loaded data from memory

Calls flush method of HDF5 dataset or memory map. Memory maps are deleted and re-instantiated.

copy(deep=False)[source]

Create a copy of the data object in memory.

Parameters

deep (bool) – If True, a copy of the underlying data file is created in the temporary Syncopy folder.

Returns

in-memory copy of data object

Return type

Syncopy data object

See also

syncopy.save()
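The shallow-versus-deep semantics can be sketched with a minimal file-backed object. This is a conceptual illustration under the assumption that deep=True duplicates the backing file while a shallow copy shares it; FileBacked is a hypothetical class, not part of Syncopy:

```python
import copy as _copy
import os
import shutil
import tempfile

class FileBacked:
    def __init__(self, filename):
        self.filename = filename

    def copy(self, deep=False):
        new = _copy.copy(self)  # duplicate the object in memory
        if deep:
            # duplicate the backing file into a temporary location
            fd, newfile = tempfile.mkstemp(suffix=".dat")
            os.close(fd)
            shutil.copyfile(self.filename, newfile)
            new.filename = newfile
        return new

fd, fname = tempfile.mkstemp(suffix=".dat")
os.write(fd, b"payload")
os.close(fd)
obj = FileBacked(fname)
shallow = obj.copy()
deep = obj.copy(deep=True)
assert shallow.filename == obj.filename  # shared backing file
assert deep.filename != obj.filename     # duplicated backing file
os.remove(fname)
os.remove(deep.filename)
```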

definetrial(trialdefinition=None, pre=None, post=None, start=None, trigger=None, stop=None, clip_edges=False)

(Re-)define trials of a Syncopy data object

Data can be structured into trials based on the timestamps of start, trigger, and stop events:

            start    trigger    stop
|---- pre ----|--------|---------|--- post ----|

Parameters
  • obj (Syncopy data object (BaseData-like)) –

  • trialdefinition (EventData object or Mx3 array) – [start, stop, trigger_offset] sample indices for M trials

  • pre (float) – offset time (s) before start event

  • post (float) – offset time (s) after end event

  • start (int) – event code (id) to be used for start of trial

  • stop (int) – event code (id) to be used for end of trial

  • trigger – event code (id) to be used as the center (t=0) of the trial

  • clip_edges (bool) – trim trials to actual data-boundaries.

Returns

Return type

Syncopy data object (BaseData-like)

Notes

definetrial() supports the following argument combinations:

>>> # define M trials based on [start, end, offset] indices
>>> definetrial(obj, trialdefinition=[M x 3] array)
>>> # define trials based on event codes stored in <EventData object>
>>> definetrial(obj, trialdefinition=<EventData object>,
                pre=0, post=0, start=startCode, stop=stopCode,
                trigger=triggerCode)
>>> # apply same trial definition as defined in <EventData object>
>>> definetrial(<AnalogData object>,
                trialdefinition=<EventData object w/sampleinfo/t0/trialinfo>)
>>> # define whole recording as single trial
>>> definetrial(obj, trialdefinition=None)
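The start/trigger/stop scheme with pre and post offsets can be worked through numerically. This is an illustrative computation under assumed conventions (offset = start - trigger, so that the trigger sample lands at t = 0), not Syncopy's exact code:

```python
import numpy as np

samplerate = 1000.0   # Hz (assumed)
pre, post = 0.1, 0.2  # seconds before start / after stop
# per trial: sample indices of the start, trigger and stop events
events = np.array([
    [500,  600,  900],
    [1500, 1650, 1900],
])

starts = events[:, 0] - int(pre * samplerate)
stops = events[:, 2] + int(post * samplerate)
offsets = starts - events[:, 1]  # trigger sample lands at t = 0

trialdefinition = np.column_stack([starts, stops, offsets])
assert trialdefinition.shape == (2, 3)
assert list(trialdefinition[0]) == [400, 1100, -200]
```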
save(container=None, tag=None, filename=None, overwrite=False, memuse=100)[source]

Save data object as new spy container to disk (syncopy.save_data())

FIXME: update docu

Parameters
  • container (str) – Path to the Syncopy container folder (*.spy) to be used for saving. If the path does not end in .spy, the extension is added.

  • tag (str) – Tag to be appended to container basename

  • filename (str) – Explicit path to data file. This is only necessary if the data should not be part of a container folder. An extension (*.<dataclass>) will be added if omitted. The tag argument is ignored.

  • overwrite (bool) – If True, an existing HDF5 file and its accompanying JSON file are overwritten (without prompting).

  • memuse (scalar) – Approximate in-memory cache size (in MB) for writing data to disk (only relevant for VirtualData or memory map data sources)

Examples

>>> save_spy(obj, filename="session1")
>>> # --> os.getcwd()/session1.<dataclass>
>>> # --> os.getcwd()/session1.<dataclass>.info
>>> save_spy(obj, filename="/tmp/session1")
>>> # --> /tmp/session1.<dataclass>
>>> # --> /tmp/session1.<dataclass>.info
>>> save_spy(obj, container="container.spy")
>>> # --> os.getcwd()/container.spy/container.<dataclass>
>>> # --> os.getcwd()/container.spy/container.<dataclass>.info
>>> save_spy(obj, container="/tmp/container.spy")
>>> # --> /tmp/container.spy/container.<dataclass>
>>> # --> /tmp/container.spy/container.<dataclass>.info
>>> save_spy(obj, container="session1.spy", tag="someTag")
>>> # --> os.getcwd()/container.spy/session1_someTag.<dataclass>
>>> # --> os.getcwd()/container.spy/session1_someTag.<dataclass>.info
_gen_filename()[source]
_classname_to_extension()[source]
_set_cfg(cfg, dct)[source]
__init__(filename=None, dimord=None, mode='r+', **kwargs)[source]

Initialization supports two argument combinations:

  1. filename + data: create an HDF5 file at filename containing data

  2. data only: store data in an automatically generated backing file

_mode = None
property mode

write mode for data, ‘r’ for read-only, ‘w’ for writable

FIXME: append/replace with HDF5?

Type

str

property dimord

ordered list of data dimension labels

Type

list(str)

property filename
property log

log of previous operations on data

Type

str

_abc_impl = <_abc_data object>