.. _selections:

***********
Selections
***********

Practically every data analysis project involves working on subsets of the data. Syncopy offers the powerful :func:`~syncopy.selectdata` function to achieve exactly that. There are two distinct ways to apply a selection: either *in-place*, or the chosen subset gets copied and returned as a new Syncopy data object. Additionally, every Syncopy *meta-function* (see :ref:`meta_functions`) supports the ``select`` keyword, which applies an in-place selection on the fly when processing data.

Selections are an important concept of Syncopy and are re-used *under the hood* by plotting functions like :func:`~syncopy.singlepanelplot` and also by :func:`~syncopy.show`. Both plotting and numpy array extraction naturally operate on subsets of the data, and conveniently use the same selection criteria syntax as :func:`~syncopy.selectdata`.

.. contents:: Topics covered
   :local:

.. _workflow:

Creating Selections
===================

Syncopy data objects can be best understood as n-dimensional arrays or matrices, with each dimension holding a certain property of the data. For an :class:`~syncopy.AnalogData` object these dimensions would be ``time``, ``channel`` and ``trials``. Now we can define sub-slices of the data by combining index sets for each of those axes.

A new selection can be created by calling :func:`~syncopy.selectdata`. Here we want to select a single trial and two channels::

    trial10 = AData.selectdata(trials=10, channel=["channel02", "channel06"])

    # alternatively
    trial10 = spy.selectdata(AData, trials=10, channel=["channel02", "channel06"])

``AData`` is an :class:`~syncopy.AnalogData` object, and hence the resulting ``trial10`` data object will also be of that same data type.

Inspecting the original dataset by simply typing its name into the Python interpreter::

    AData

we see that we have 100 trials and 10 channels:

.. code-block:: bash

    Syncopy AnalogData object with fields

            cfg : dictionary with keys ''
        channel : [10] element
      container : None
           data : 100 trials of length 500.0 defined on [50000 x 10] float32 Dataset of size 1.91 MB
        ...

If we now inspect our selection results::

    trial10

we see that we are left with 1 trial and 2 channels:

.. code-block:: bash

    Syncopy AnalogData object with fields

            cfg : dictionary with keys ''
        channel : [2] element
      container : None
           data : 1 trials of length 500.0 defined on [500 x 2] float32 Dataset of size 0.00 MB
        ...

As we did not specify any selection criteria for the time axis (via ``latency``), every sample was selected. This is true in general: whenever a certain dimension has no selection specification, the complete axis is selected.

Finally, by inspecting the ``.log`` (see also :ref:`logging`) we can see the selection settings used to create this dataset::

    trial10.log

.. code-block:: bash

    write_log: computed _selectdata with settings
    inplace = False
    clear = False
    latency = None
    trials = 10
    channel = ['channel02', 'channel06']

This log is persistent, meaning that when saving and later loading this reduced dataset the settings used for this selection can still be recovered.

The table below summarizes all possible selection parameters and their availability for each datatype.

.. _selections_table:

Table of Selection Parameters
=============================

There are various selection parameters available, each of which can accept a variety of Python datatypes like ``int``, ``str`` or ``list``. Some selection parameters are only available for data types which have the corresponding dimension, like ``frequency`` for ``SpectralData`` and ``CrossSpectralData``.

+-------------------+-----------------------+-----------------------+-----------------------+-------------------------------------+
| **Parameter**     | **Description**       | **Accepted Types**    | **Examples**          | **Availability**                    |
+===================+=======================+=======================+=======================+=====================================+
| trials            | |trialsDesc|          | |trialsVals|          | |trialsEx1|           |                                     |
|                   |                       |                       |                       |                                     |
|                   |                       |                       | |trialsEx2|           | *all data types*                    |
|                   |                       |                       |                       |                                     |
|                   |                       |                       | |trialsEx3|           |                                     |
+-------------------+-----------------------+-----------------------+-----------------------+-------------------------------------+
| channel           | |channelDesc|         | |channelVals|         | |channelEx1|          | :class:`~syncopy.AnalogData`        |
|                   |                       |                       |                       |                                     |
|                   |                       |                       | |channelEx2|          | :class:`~syncopy.SpectralData`      |
|                   |                       |                       |                       |                                     |
|                   |                       |                       | |channelEx3|          | :class:`~syncopy.CrossSpectralData` |
|                   |                       |                       |                       |                                     |
|                   |                       |                       | |channelEx4|          | :class:`~syncopy.SpikeData`         |
+-------------------+-----------------------+-----------------------+-----------------------+-------------------------------------+
| latency           | |latDesc|             | |latVals|             | |latEx1|              | :class:`~syncopy.AnalogData`        |
|                   |                       |                       |                       |                                     |
|                   |                       |                       | |latEx2|              | :class:`~syncopy.SpectralData`      |
|                   |                       |                       |                       |                                     |
|                   |                       |                       | |latEx3|              | :class:`~syncopy.CrossSpectralData` |
|                   |                       |                       |                       |                                     |
|                   |                       |                       |                       | :class:`~syncopy.SpikeData`         |
+-------------------+-----------------------+-----------------------+-----------------------+-------------------------------------+
| frequency         | |freqDesc|            | |freqVals|            | |freqEx1|             |                                     |
|                   |                       |                       |                       |                                     |
|                   |                       |                       | |freqEx2|             | :class:`~syncopy.SpectralData`      |
|                   |                       |                       |                       |                                     |
|                   |                       |                       |                       | :class:`~syncopy.CrossSpectralData` |
+-------------------+-----------------------+-----------------------+-----------------------+-------------------------------------+
| unit              | |unitDesc|            | |unitVals|            | |unitEx1|             | :class:`~syncopy.SpikeData`         |
|                   |                       |                       |                       |                                     |
|                   |                       |                       | |unitEx2|             |                                     |
|                   |                       |                       |                       |                                     |
|                   |                       |                       | |unitEx3|             |                                     |
+-------------------+-----------------------+-----------------------+-----------------------+-------------------------------------+
| eventid           | |eventidDesc|         | |eventidVals|         | |eventidEx1|          |                                     |
|                   |                       |                       |                       | :class:`~syncopy.EventData`         |
|                   |                       |                       | |eventidEx2|          |                                     |
+-------------------+-----------------------+-----------------------+-----------------------+-------------------------------------+

.. |trialsVals| replace:: *int, array, list*
.. |trialsDesc| replace:: *trial selection*
.. |trialsEx1| replace:: ``selectdata(trials=7)``
.. |trialsEx2| replace:: ``selectdata(trials=[2, 9, 21])``
.. |trialsEx3| replace:: ``selectdata(trials=np.arange(2, 10))``

.. |channelDesc| replace:: *channel selection*
.. |channelVals| replace:: *int, str, list, array*
.. |channelEx1| replace:: ``selectdata(channel=7)``
.. |channelEx2| replace:: ``selectdata(channel=[11, 16])``
.. |channelEx3| replace:: ``selectdata(channel=np.arange(2, 10))``
.. |channelEx4| replace:: ``selectdata(channel=["V1-11", "V2-12"])``

.. |latDesc| replace:: *time interval of interest in seconds*
.. |latVals| replace:: *list, float, 'maxperiod', 'minperiod', 'prestim', 'poststim'*
.. |latEx1| replace:: ``selectdata(latency=[0.2, 1.])``
.. |latEx2| replace:: ``selectdata(latency=0.5)``
.. |latEx3| replace:: ``selectdata(latency='minperiod')``

.. |freqDesc| replace:: *frequencies of interest in Hz*
.. |freqVals| replace:: *float, list*
.. |freqEx1| replace:: ``selectdata(frequency=20.5)``
.. |freqEx2| replace:: ``selectdata(frequency=[5, 10, 15])``

.. |unitDesc| replace:: *unit selection*
.. |unitVals| replace:: *int, str, list, array*
.. |unitEx1| replace:: ``selectdata(unit=7)``
.. |unitEx2| replace:: ``selectdata(unit=[11, 16, 32])``
.. |unitEx3| replace:: ``selectdata(unit=["unit17", "unit3"])``

.. |eventidDesc| replace:: *eventid selection*
.. |eventidVals| replace:: *int, list, array*
.. |eventidEx1| replace:: ``selectdata(eventid=2)``
.. |eventidEx2| replace:: ``selectdata(eventid=[2, 0, 1])``

.. note::
    Have a look at :doc:`Data Basics ` for further details about Syncopy's data classes and interfaces

Inplace Selections
==================

An in-place selection can be understood as a mask being put onto the data: the selected subset of the data is actually **not copied on disk**, but the selection criteria are applied *in place* to be used in a processing step. In-place selections take two forms: either explicit via the ``inplace`` keyword, ``selectdata(..., inplace=True)``, or implicit by passing a ``select`` keyword to a Syncopy meta-function.

To illustrate this mechanic, let's create a simulated dataset with :func:`~syncopy.tests.synth_data.phase_diffusion` and compute the coherence for the full dataset:

.. literalinclude:: /scripts/select_example.py

.. image:: /_static/select_example1.png
   :height: 220px

Phase diffusing signals decorrelate over time, hence if we wait long enough we can't see any coherence.

.. note::
    As an exercise you could use :func:`~syncopy.freqanalysis` to confirm that there is indeed strong oscillatory activity in the 40Hz band

Explicit inplace Selection
--------------------------

To see if maybe for a shorter time period in the beginning of "the recording" the signals were actually more phase locked, we can use an **in-place latency selection**::

    # note there is no return value here
    spy.selectdata(adata, latency=[-1, 0], inplace=True)

Inspecting the dataset:

.. code-block:: bash

    Syncopy AnalogData object with fields

               cfg : dictionary with keys 'selectdata'
           channel : [2] element
         container : None
              data : 100 trials defined on [50000 x 3] float64 Dataset of size 1.14 MB
            dimord : time by channel
          filename : /home/whir/.spy/spy_3e83_a9c8b544.analog
              info : dictionary with keys ''
              mode : r+
        sampleinfo : [100 x 2] element
        samplerate : 200.0
         selection : Syncopy AnalogData selector with all channels, 201 times, 100 trials
               tag : None
              time : 100 element list
         trialinfo : [100 x 0] element
    trialintervals : [100 x 2] element
            trials : 100 element iterable

we can see that now the ``selection`` entry is filled with information, telling us we selected 201 time points.

With that selection being active, let's repeat the connectivity analysis::

    # coherence with active in-place selection
    coh2 = spy.connectivityanalysis(adata, method='coh')

    # plot coherence of channel1 vs channel2
    coh2.singlepanelplot(channel_i='channel1', channel_j='channel2')

.. image:: /_static/select_example2.png
   :height: 220px

Indeed, we now see some coherence around the 40Hz band.

Finally, let's **wipe the inplace selection** before continuing::

    # inspect active inplace selection
    adata.selection
    >>> Syncopy AnalogData selector with all channels, 201 times, 100 trials

    # wipe selection
    adata.selection = None

Inplace selection via ``select`` keyword
----------------------------------------

Alternatively, we can also give a dictionary of selection parameters directly to every Syncopy meta-function. These will then internally apply an in-place selection before performing the analysis::

    # coherence only for selected time points
    coh3 = spy.connectivityanalysis(adata, method='coh', select={'latency': [-1, 0]})

    # plot coherence of channel1 vs channel2
    coh3.singlepanelplot(channel_i='channel1', channel_j='channel2')

.. image:: /_static/select_example2.png
   :height: 220px

Unsurprisingly, we get exactly the same result as with the explicit in-place selection above.
The difference here, however, is that after the analysis is done, there is no active in-place selection present::

    adata.selection is None
    >>> True

Hence, it's important to note that implicit selections **get wiped automatically** after an analysis. In the end it is up to the user to decide which way of applying selections is most practical in their situation.

Relation to :func:`~syncopy.show` and :func:`~syncopy.singlepanelplot`
======================================================================

As hinted at in the beginning of this chapter, both plotting and numpy array extraction adhere to the same syntax as :func:`~syncopy.selectdata`. This means that the following two arrays hold the same data::

    # First explicitly select a subset
    trial10 = spy.selectdata(AData, trials=10, channel=["channel02", "channel06"])

    # show everything of the subset
    # WARNING: don't do this with large datasets!
    arr1 = trial10.show()

    # calling show() with the same selection
    # criteria directly on the original complete dataset
    arr2 = AData.show(trials=10, channel=["channel02", "channel06"])

    # this is True!
    (arr1 == arr2).all()

And in the same spirit, both plotting commands below will produce the same figure::

    # First explicitly select a subset
    trial10 = spy.selectdata(AData, trials=10, channel=["channel02", "channel06"])

    # plot everything: only 1 trial and 2 channels left
    trial10.singlepanelplot()

    # directly plot from full data set with same selection criteria
    AData.singlepanelplot(trials=10, channel=["channel02", "channel06"])

This works *under the hood* by applying temporary in-place selections onto the data before plotting and/or extracting the numpy arrays.
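
The idea of a temporary in-place selection can be illustrated with a small, self-contained sketch. This is not Syncopy's actual implementation: ``ToyData`` and ``temporary_selection`` are hypothetical names chosen purely for illustration, and a plain list of sample indices stands in for a real selection. The sketch only shows the general pattern of activating a selection for a single processing step and restoring the previous state afterwards:

.. code-block:: python

    from contextlib import contextmanager


    class ToyData:
        """Hypothetical stand-in for a data object with a ``selection`` attribute."""

        def __init__(self, samples):
            self.samples = list(samples)
            self.selection = None  # None means: no active in-place selection

        def selected(self):
            """Return the samples visible through the active selection."""
            if self.selection is None:
                return list(self.samples)
            return [self.samples[i] for i in self.selection]

        def show(self, index=None):
            """Return selected samples without leaving a new selection behind."""
            if index is None:
                # nothing passed: honor an explicit selection that may be active
                return self.selected()
            with temporary_selection(self, index):
                return self.selected()


    @contextmanager
    def temporary_selection(data, index):
        """Activate ``index`` as selection on ``data`` for the duration of a block."""
        previous = data.selection
        data.selection = index
        try:
            yield data
        finally:
            # restore the previous state, even if the processing step failed
            data.selection = previous


    data = ToyData([10, 11, 12, 13])
    subset = data.show(index=[1, 3])
    print(subset)          # [11, 13]
    print(data.selection)  # None

In this toy version the ``try``/``finally`` block guarantees that the temporary selection is removed even if the processing step raises an exception, which mirrors the behavior described above where implicit selections are wiped once the call has finished.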