Conversation
> num_pixels = imaging.get_num_pixels()
> dtype_size_bytes = np.dtype(dtype).itemsize

use `imaging.get_sample_size_in_bytes` instead
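A minimal sketch of the suggested change, using a mock in place of the real imaging extractor (the mock's method names and semantics are assumptions based on the review comment, not the actual API):

```python
import numpy as np

class MockImaging:
    """Hypothetical stand-in for the imaging extractor in this PR;
    the real method names/semantics may differ."""
    def __init__(self, dtype="uint16"):
        self._dtype = np.dtype(dtype)

    def get_num_pixels(self):
        return 512 * 512

    def get_sample_size_in_bytes(self):
        # Encapsulates the dtype, so callers need not rebuild it themselves.
        return self._dtype.itemsize

imaging = MockImaging()

# Before (the code under review): reconstruct the dtype externally.
dtype_size_bytes = np.dtype("uint16").itemsize

# After (reviewer's suggestion): ask the extractor directly.
sample_size_bytes = imaging.get_sample_size_in_bytes()

total_bytes = imaging.get_num_pixels() * sample_size_bytes
```

The point of the suggestion is encapsulation: the caller no longer needs a separate `dtype` variable that can drift out of sync with the extractor.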
alessandrofelder left a comment:

Hey @alejoe91 and @arielleleon,
Thanks again for this. @lauraporta and I had a fun pair-programming session today, playing with your PR (we will open a related PR trying to implement a naive suite2p motion-correction Extractor-Segment pair, just to test our understanding). We are posting a summary of our thoughts and questions below, in advance of our next discussion.
We were impressed by many aspects of the suggested architecture. In particular:
- very nice that we can re-use `roiextractors` to load so many different data types...
- ... and that they are all dynamically constructed, minimising the amount of code we have to maintain :)
- we played with the interactive widgets and our own implementation of a processing step (suite2p motion correction), and it worked very nicely
We (think we) gained some in-depth understanding, which we would like to double-check:
- There are two central Base classes (Extractor and Segment). The key relationship between them is: an Extractor has a Segment.
- Users, and libraries depending on `photon-mosaic-api`, instantiate Extractor objects (via a function wrapping their constructor, for syntactic niceness) but never the Segment object (directly).
- Both base classes have a `get_series` function (often the extractor's `get_series` calls the Segment's `get_series`) to access (parts of) the actual numerical array data.
- The Extractor classes are responsible for (de)serialisation.
- In some cases, `get_series` performs some computation lazily before returning the part of the array that was requested (like filtering).
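Our understanding above can be sketched in a few lines; all class and method names here are hypothetical stand-ins mirroring the pattern described, not the actual `photon-mosaic-api` implementation:

```python
import numpy as np

class Segment:
    """Holds a contiguous piece of array data (minimal hypothetical version)."""
    def __init__(self, data):
        self._data = data

    def get_series(self, start, end):
        return self._data[start:end]

class FilteredSegment(Segment):
    """A segment that lazily applies a computation (here: mean-subtraction)
    to only the requested slice of its parent segment."""
    def __init__(self, parent):
        self._parent = parent

    def get_series(self, start, end):
        chunk = self._parent.get_series(start, end)
        return chunk - chunk.mean()  # computed only for the requested slice

class Extractor:
    """An Extractor has a Segment; users only ever call the Extractor."""
    def __init__(self, segment):
        self._segment = segment

    def get_series(self, start, end):
        # The extractor's get_series delegates to its segment's get_series.
        return self._segment.get_series(start, end)

raw = Extractor(Segment(np.arange(10.0)))
filtered = Extractor(FilteredSegment(raw._segment))
out = filtered.get_series(0, 4)  # only elements 0..3 are ever touched
```

Nesting `FilteredSegment`s would give the recursive, on-demand computation chain described below under "key advantages".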
Photon-mosaic specific details of Extractors-Segment architecture
- The base Extractor class for multiphoton videos is `BaseImaging`.
- The base Segment class for multiphoton videos is `BaseImagingSegment`.
- We take advantage of the `roiextractors` package, which already implements data loaders for many multiphoton video formats.
- We do this by dynamically creating `BaseImaging` and `BaseImagingSegment` classes that wrap `roiextractors`'s data loading.
- There is also an Extractor base class for ROIs (but ROIs don't have Segments - they are static).
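The dynamic wrapping could look roughly like the sketch below; the class names echo those above, but the wiring (and the fake loader) is purely illustrative, and the real photon-mosaic / `roiextractors` code may differ:

```python
class BaseImagingSegment:
    """Base Segment for multiphoton videos (illustrative)."""
    def __init__(self, extractor):
        self._extractor = extractor

    def get_series(self, start, end):
        # Delegate to the wrapped roiextractors-style loader.
        return self._extractor.get_video(start_frame=start, end_frame=end)

class FakeTiffLoader:
    """Stand-in for a roiextractors ImagingExtractor."""
    def get_video(self, start_frame, end_frame):
        return [[0] * 4 for _ in range(end_frame - start_frame)]

# Dynamically create one Segment subclass per available loader class.
LOADERS = {"Tiff": FakeTiffLoader}
generated = {}
for name, loader_cls in LOADERS.items():
    def make_init(cls):
        def __init__(self, *args, **kwargs):
            BaseImagingSegment.__init__(self, cls(*args, **kwargs))
        return __init__
    generated[f"{name}ImagingSegment"] = type(
        f"{name}ImagingSegment",
        (BaseImagingSegment,),
        {"__init__": make_init(loader_cls)},
    )

seg = generated["TiffImagingSegment"]()
frames = seg.get_series(0, 2)  # two 4-pixel "frames" from the fake loader
```

This is also why IDEs struggle (a point raised below): classes built with `type()` at import time have no static definition for autocompletion to find.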
The architecture is designed to have some key advantages:
- it keeps track of the necessary nested computations and recursively performs them on-the-fly, only on the actually requested slice of the array data, when it is requested.
In the current implementation, this architecture is implemented in preprocessing, but not (yet?) for segmentation. Our current segmentation is implemented as a Python function wrapping suite2p.
Main questions
These are closely related to the discussion points below, but it may be helpful to have an overview:
- Did we miss/misinterpret anything important above?
- Can we assume that `get_series` always or never returns a copy of the requested array data (lazily), or is this variable?
- What are the conventions/assumptions around specifying things that go in the JSON?
- How do multiplane, multichannel videos relate to Segment objects?
Discussion points
RoiExtractor wrappers are created dynamically, and therefore IDEs don't find them automatically - this can be disconcerting for users. Maybe solvable via documentation.
We want to ensure that the latest computations on the data (happening in the `get_series()` at the top of the stack) do not affect the parent data. We need a better understanding of where a deepcopy of the data returned by `get_series` is needed, and whether this is consistent across all Segment objects, across all Segment objects that wrap a roiextractor, or something else.
We note that in suite2p registration, array data is often overwritten in place. Fortunately, `roiextractors` was reading the raw data as read-only, which prevented unwanted modifications.
It looks like the roiextractors return different types of array (e.g. we saw that `ScanImageImagingExtractor`s return `numpy.array`s while `TiffImagingExtractor` returns a (read-only) `numpy.memmap`). We are interested in the assumptions we can make as developers about the types returned by `get_series` of our roiextractor-wrapping Segment objects. It would be great for this to become fully consistent.
Furthermore, it would be great if the returned array data was never loaded into memory by default (a dask array?)
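The difference we observed can be demonstrated with plain NumPy, independently of any specific extractor: a read-only `memmap` refuses in-place writes, while an ordinary in-memory array is silently overwritten. (The file name here is arbitrary.)

```python
import os
import tempfile
import numpy as np

# A plain in-memory ndarray is silently overwritten in place:
arr = np.zeros(4)
arr[:] = 1.0  # no error - suite2p-style in-place registration would "succeed"

# A read-only memmap (what TiffImagingExtractor appeared to return) refuses writes:
path = os.path.join(tempfile.mkdtemp(), "frames.dat")
np.zeros(4).tofile(path)
mm = np.memmap(path, dtype=np.float64, mode="r")  # mode="r" -> read-only

try:
    mm[:] = 1.0
    writable = True
except ValueError:
    # NumPy raises "assignment destination is read-only"
    writable = False
```

So whether downstream code can accidentally corrupt the parent data currently depends on which array type a given extractor happens to return.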
Some multiphoton recordings will have multiple planes and channels, leading to datasets of different dimensionality (from 3 to 5) returned by `get_series`. Beyond the inconsistency, this has consequences for the choice of algorithm to use, e.g. should we store planes and channels separately in different Segments? How do we envisage dealing with 5-dimensional datasets (tczyx)?
As you've noted too, we may want to refactor our nomenclature a bit. "Segments" is a bit confusing in the imaging world, and it's confusing that BaseExtractor has a Segment which wraps a RoiExtractor :D
What belongs in the JSON files for provenance? How will we take care of the different parameters used in the different steps? For instance, what is contained in the `ops` file for suite2p?
In the case of using this architecture with a workflow, each workflow step will call some `Extractor.get_data()` and `Extractor.save()` to apply the processing to the whole volume, save the JSON, and cache intermediate raw data. Does this make sense?
Great!!! :)
In theory one extractor can have multiple segments. I think this is useful in cases where acquisition is performed in batches, for example locked to some trial presentation.
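A one-extractor/many-segments setup could be sketched as below; the method names loosely follow SpikeInterface conventions (`get_num_segments`, a `segment_index` argument) but are assumptions here, not the actual API:

```python
import numpy as np

class MultiSegmentExtractor:
    """Sketch: one extractor holding several contiguous segments,
    e.g. one per trial-locked acquisition batch (hypothetical names)."""
    def __init__(self, segments):
        self._segments = segments  # list of per-batch arrays

    def get_num_segments(self):
        return len(self._segments)

    def get_series(self, start, end, segment_index=0):
        # Each segment is an independent contiguous piece of data.
        return self._segments[segment_index][start:end]

ext = MultiSegmentExtractor([np.zeros(100), np.ones(80)])
n_segments = ext.get_num_segments()
batch1 = ext.get_series(0, 5, segment_index=1)
```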
Correct. Segments are mainly internal
Yes. The user will only interact with the Extractor objects.
Correct.
Yes. Basically one can manipulate the recording object (could be preprocessing, concatenating/slicing, selecting FOVs, etc.). Then the resulting
It doesn't. Segments are "contiguous pieces of data". Currently, to get started, we assumed mono-channel/mono-plane imaging. However, we can use the

Regarding multi-channel, I think we should restrict each
Yeah I think having everything in the docs (auto-compiled) should be ok.
It should not. See my point above about readers returning a copy (we have to make sure we do that!)
In this case my approach (for segmentation) was to save a copy of the input to binary, so that Suite2p can "mess with it".
I think any "array-like" would work, so either numpy arrays or memmaps are fine.
This should be the case for most readers, but not after preprocessing. I don't think we need dask at the moment, but we can discuss later.
See my comment above :)
True, but unfortunately that's what SpikeInterface uses! We could have a property that redefines them as "Epoch" and keep the segment nomenclature internal and private.
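The suggested compromise could look like the sketch below; the `epochs` property name and everything else here is hypothetical, just illustrating "keep the segment nomenclature internal and private, expose a friendlier name publicly":

```python
class BaseExtractor:
    """Sketch: internal SpikeInterface-style 'segment' storage,
    with a public imaging-friendly alias (names are hypothetical)."""
    def __init__(self):
        self._segments = []  # private, keeps the internal nomenclature

    @property
    def epochs(self):
        # Public, imaging-friendly view of the same objects.
        return tuple(self._segments)

ext = BaseExtractor()
ext._segments.append("segment-0")
```

A read-only property also prevents users from mutating the internal list directly, which fits the "Segments are mainly internal" point above.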
That is up to the preprocessors to log in the JSON.
To some extent yes. If a workflow step does some heavy preprocessing, then it's fine to save to binary. But this is not required by every step. In my mind, a workflow step will be "preprocessing", which could apply several steps and dump to binary.
chore: initial repo setup
Ported files and demo from https://github.com/AllenNeuralDynamics/photon-mosaic-demo
Added minimal README for installation