Populate files and demo#1

Open
alejoe91 wants to merge 2 commits into main from populate-repo
Conversation

@alejoe91
Collaborator

Ported files and demo from https://github.com/AllenNeuralDynamics/photon-mosaic-demo

Added minimal README for installation

Comment on lines +78 to +79
```python
num_pixels = imaging.get_num_pixels()
dtype_size_bytes = np.dtype(dtype).itemsize
```
Collaborator Author


use imaging.get_sample_size_in_bytes instead
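A minimal sketch of why the suggestion helps: letting the extractor report its own per-sample size keeps the dtype bookkeeping in one place. The `Imaging` class below is a hypothetical stand-in, not the actual photon-mosaic API.

```python
import numpy as np

# Hypothetical stand-in for the imaging extractor interface under review.
class Imaging:
    def __init__(self, num_rows, num_cols, dtype):
        self._num_pixels = (num_rows, num_cols)
        self._dtype = np.dtype(dtype)

    def get_num_pixels(self):
        return self._num_pixels

    def get_sample_size_in_bytes(self):
        # Equivalent to prod(num_pixels) * dtype.itemsize, but computed
        # inside the extractor instead of at every call site.
        return int(np.prod(self._num_pixels)) * self._dtype.itemsize

imaging = Imaging(512, 512, "uint16")
print(imaging.get_sample_size_in_bytes())  # 524288
```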

@lauraporta self-requested a review November 27, 2025 11:18
Collaborator

@alessandrofelder left a comment


Hey @alejoe91 and @arielleleon

Thanks again for this. @lauraporta and I had a fun pair-programming session today, playing with your PR (we will open a related PR trying to implement a naive suite2p motion correction Extractor-Segment pair, just to test our understanding). We are posting a summary of our thoughts and questions below, in advance of our next discussion.

We were impressed by many things about the suggested architecture. In particular:

  • very nice that we can re-use the roiextractors to load so many different data types...
  • ... and that they are all dynamically constructed, minimising the amount of code we have to maintain :)
  • we played with the interactive widgets and our own implementation of a processing step (suite2p motion correction) and it worked very nicely

We (think we) gained some in-depth understanding, which we would like to double-check:

  • There are two central Base classes (Extractor and Segment). The key relationship between them is: An extractor has a Segment.
  • Users, and libraries depending on photon-mosaic-api, instantiate Extractor objects (via a function wrapping their constructor, for syntactic niceness) but never the Segment objects (directly).
  • Both base classes have a get_series function (often the extractor's get_series calls the Segment's get_series) to access (parts of) the actual numerical array data
  • The Extractor classes are responsible for (de/)serialisation
  • In some cases, get_series performs some computation lazily before returning the part of the array that was requested (like filtering)
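The relationship described above can be sketched as follows. This is our illustrative reading of the design, not the actual photon-mosaic classes: a user-facing Extractor owns an internal Segment and delegates array access to it.

```python
import numpy as np

class Segment:
    """Holds a contiguous chunk of data; internal to the extractor."""
    def __init__(self, data):
        self._data = np.asarray(data)

    def get_series(self, start_frame, end_frame):
        return self._data[start_frame:end_frame]

class Extractor:
    """User-facing object; delegates array access to its Segment."""
    def __init__(self, segment):
        self._segment = segment

    def get_series(self, start_frame, end_frame):
        # A real extractor may lazily apply computation (e.g. filtering)
        # to the requested slice before returning it.
        return self._segment.get_series(start_frame, end_frame)

ext = Extractor(Segment(np.arange(10)))
print(ext.get_series(2, 5).tolist())  # [2, 3, 4]
```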

Photon-mosaic specific details of Extractors-Segment architecture

  • The base Extractor class for multiphoton videos is BaseImaging
  • The base Segment class for multiphoton videos is BaseImagingSegment
  • We take advantage of the roiextractors package, which already implements data loaders for many multiphoton video formats
  • We do this by dynamically creating BaseImaging and BaseImagingSegment classes that wrap roiextractors's data loading.
  • There is also an Extractor base class for ROIs (but ROIs don't have segments - they are static)
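One way the dynamic wrapping could work, sketched with illustrative names only (the fake loader stands in for a real roiextractors ImagingExtractor, and the generated class name is an assumption):

```python
# Hypothetical sketch of dynamically generating wrapper classes around
# roiextractors-style loaders, as described above.
def make_imaging_class(extractor_class):
    def __init__(self, *args, **kwargs):
        self._extractor = extractor_class(*args, **kwargs)

    def get_series(self, start_frame, end_frame):
        # Delegate frame access to the wrapped loader.
        return self._extractor.get_video(start_frame, end_frame)

    name = extractor_class.__name__.replace("Extractor", "Imaging")
    return type(name, (object,), {"__init__": __init__, "get_series": get_series})

# A fake loader standing in for a real roiextractors ImagingExtractor:
class FakeTiffExtractor:
    def get_video(self, start_frame, end_frame):
        return list(range(start_frame, end_frame))

FakeTiffImaging = make_imaging_class(FakeTiffExtractor)
print(FakeTiffImaging().get_series(0, 3))  # [0, 1, 2]
```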

The architecture is designed to have some key advantages

  • it keeps track of the necessary nested computations and recursively evaluates them on-the-fly, only on the slice of the array data that is actually requested
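Our mental model of that lazy, recursive evaluation, as a minimal sketch (names are illustrative, not the actual implementation):

```python
import numpy as np

class Reader:
    """Leaf of the chain: reads raw data slices."""
    def __init__(self, data):
        self._data = data

    def get_series(self, start, end):
        return self._data[start:end]

class LazyStep:
    """One deferred computation stacked on top of a parent."""
    def __init__(self, parent, func):
        self._parent = parent  # another LazyStep or a Reader
        self._func = func      # computation applied per requested slice

    def get_series(self, start, end):
        # Recursively pull only the requested slice from the parent,
        # then apply this step's computation to that slice alone.
        return self._func(self._parent.get_series(start, end))

chain = LazyStep(LazyStep(Reader(np.arange(5.0)), lambda x: x + 1), lambda x: x * 2)
print(chain.get_series(1, 3).tolist())  # [4.0, 6.0]
```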

In the current implementation, this architecture is implemented in preprocessing, but not (yet?) for segmentation. Our current segmentation is implemented as a Python function wrapping suite2p.

Main questions

These are closely related to the discussion points below, but it may be helpful to have an overview first:

  1. Did we miss/misinterpret anything important above?
  2. Can we assume that get_series always or never returns a copy of the requested array data (lazily), or is this variable?
  3. What are the conventions/assumptions around specifying things that go in the json?
  4. How do multiplane, multichannel videos relate to Segment objects?

Discussion points

The roiextractors wrappers are created dynamically, and therefore IDEs don't find them automatically; this can be disconcerting for users. Maybe solvable via documentation.

We want to ensure that the latest computations on the data (happening in the get_series() on top of the stack) do not affect the parent data. We need more understanding of where a deepcopy of the data returned by get_series is needed, and whether this is consistent across all Segment objects, all Segment objects that wrap a roiextractor, or neither.

We note that in suite2p registration, array data is often overwritten. Fortunately, roiextractors was reading the raw data as read-only and prevented unwanted actions.

It looks like the roiextractors return different types of array (e.g. we saw that ScanImageImagingExtractors return numpy.arrays while TiffImagingExtractor returns a (read-only) numpy.memmap). We are interested in the assumptions we can make as developers about the types returned by get_series of our roiextractor-wrapping Segment objects. It would be great for this to become fully consistent.

Furthermore, it would be great if the returned array data was never loaded into memory by default (a dask array?)

Some multiphoton recordings will have multiple planes and channels, leading to datasets of different dimensions (from 3 to 5) returned by get_series. Beyond the inconsistency, this has other sorts of consequences for the choice of algorithm to use. E.g. should we store planes and channels separately in different Segments? How do we envisage dealing with 5-dimensional datasets (tczyx)?

As you've noted too, we may want to refactor our nomenclature a bit. "Segments" is a bit confusing in the imaging world, and it's confusing that BaseExtractor has a Segment which wraps a RoiExtractor :D

What belongs in the json files for provenance? How will we take care of the different parameters used in the different steps? For instance, what is contained in the ops file for suite2p?

In the case of using this architecture with a workflow, each workflow step will call some Extractor.get_data() and Extractor.save() to apply the processing to the whole volume, save json, and cache intermediate raw data. Does this make sense?

@alejoe91
Collaborator Author

Hey @alejoe91 and @arielleleon

Thanks again for this. @lauraporta and I had a fun pair-programming session today, playing with your PR (we will open a related PR trying to implement a naive suite2p motion correction Extractor-Segment pair, just to test our understanding). We are posting a summary of our thoughts and questions below, in advance of our next discussion.

We were impressed by many things about the suggested architecture. In particular:

  • very nice that we can re-use the roiextractors to load so many different data types...
  • ... and that they are all dynamically constructed, minimising the amount of code we have to maintain :)
  • we played with the interactive widgets and our own implementation of a processing step (suite2p motion correction) and it worked very nicely

Great!!! :)

We (think we) gained some in-depth understanding, which we would like to double-check:

  • There are two central Base classes (Extractor and Segment). The key relationship between them is: An extractor has a Segment.

In theory one extractor can have multiple segments. I think this is useful in cases where acquisition is performed in batches, for example locked to some trial presentation.

  • Users, and libraries depending on photon-mosaic-api, instantiate Extractor objects (via a function wrapping their constructor, for syntactic niceness) but never the Segment objects (directly).

Correct. Segments are mainly internal

  • Both base classes have a get_series function (often the extractor's get_series calls the Segment's get_series) to access (parts of) the actual numerical array data

Yes. The user will only interact with the BaseImaging.get_series()

  • The Extractor classes are responsible for (de/)serialisation
  • In some cases, get_series performs some computation lazily before returning the part of the array that was requested (like filtering)

Correct.

Photon-mosaic specific details of Extractors-Segment architecture

  • The base Extractor class for multiphoton videos is BaseImaging
  • The base Segment class for multiphoton videos is BaseImagingSegment
  • We take advantage of the roiextractors package, which already implements data loaders for many multiphoton video formats
  • We do this by dynamically creating BaseImaging and BaseImagingSegment classes that wrap roiextractors's data loading.
  • There is also an Extractor base class for ROIs (but ROIs don't have segments - they are static)

The architecture is designed to have some key advantages

  • it keeps track of the necessary nested computations and recursively evaluates them on-the-fly, only on the slice of the array data that is actually requested

In the current implementation, this architecture is implemented in preprocessing, but not (yet?) for segmentation. Our current segmentation is implemented as a Python function wrapping suite2p.

Yes. Basically one can manipulate the recording object (could be preprocessing, concatenating/slicing, selecting FOVs, etc.). Then the resulting BaseImaging object is sent to run_segmentation (which could have several methods) and spits out a BaseRois object. We could potentially implement chains of processing on the BaseRois as well, if needed.
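That flow could be sketched like this; all class and function names below are illustrative placeholders for the eventual API, and the suite2p backend is a trivial stub:

```python
class BaseImaging:
    """Placeholder for a (possibly preprocessed) recording object."""
    pass

class BaseRois:
    """Placeholder for the segmentation output object."""
    def __init__(self, masks):
        self.masks = masks

# run_segmentation "could have several methods": a simple dispatch table.
SEGMENTATION_METHODS = {
    "suite2p": lambda imaging, **params: BaseRois(masks=[]),
}

def run_segmentation(imaging, method="suite2p", **params):
    # Dispatch to one of several registered segmentation backends.
    return SEGMENTATION_METHODS[method](imaging, **params)

rois = run_segmentation(BaseImaging(), method="suite2p")
print(type(rois).__name__)  # BaseRois
```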

Main questions

These are closely related to the discussion points below, but it may be helpful to have an overview first:

  1. Did we miss/misinterpret anything important above?
    I think you interpreted everything correctly.
  2. Can we assume that get_series always or never returns a copy of the requested array data (lazily), or is this variable?
    Yes, this is a key point and should be enforced: the readers should be read-only and return a copy of the video.
  3. What are the conventions/assumptions around specifying things that go in the json?
    These are based on the spikeinterface.BaseExtractor.to_dict() method: https://github.com/SpikeInterface/spikeinterface/blob/main/src/spikeinterface/core/base.py#L411
  4. How do multiplane, multichannel videos relate to Segment objects?

They don't. Segments are "contiguous pieces of data". Currently, to get started, we assumed mono-channel/mono-plane imaging. However, we can use the BaseExtractor._main_ids to have a multiplane object. In that case, the get_series function will allow one to retrieve one or multiple planes at once (still to be implemented):

```python
def get_series(start_frame, end_frame, plane_ids=[...])
```

Regarding multi-channel, I think we should restrict each BaseImaging to point to a single channel, even because this is what normally segmentation algorithms use. The user could then load multiple channels as different Imaging objects.
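A possible shape for that still-to-be-implemented signature, assuming frames are stored as (time, plane, y, x); this is our sketch, not the actual implementation:

```python
import numpy as np

def get_series(data, start_frame, end_frame, plane_ids=None):
    # data: array shaped (time, plane, y, x); plane_ids=None means all planes.
    if plane_ids is None:
        plane_ids = range(data.shape[1])
    return data[start_frame:end_frame, list(plane_ids)]

data = np.zeros((100, 3, 4, 4))  # 100 frames, 3 planes, 4x4 pixels
print(get_series(data, 0, 10, plane_ids=[0, 2]).shape)  # (10, 2, 4, 4)
```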

Discussion points

The roiextractors wrappers are created dynamically, and therefore IDEs don't find them automatically; this can be disconcerting for users. Maybe solvable via documentation.

Yeah I think having everything in the docs (auto-compiled) should be ok.

We want to ensure that the latest computations on the data (happening in the get_series() on top of the stack) do not affect the parent data. We need more understanding of where a deepcopy of the data returned by get_series is needed, and whether this is consistent across all Segment objects, all Segment objects that wrap a roiextractor, or neither.

It should not. See my point above about readers returning a copy (we have to make sure we do that!)
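One way to enforce "readers return a safe copy", as a minimal sketch using numpy's write flag (this is our suggestion of a pattern, not the current implementation):

```python
import numpy as np

def get_series_copy(raw, start_frame, end_frame):
    # Copy the requested slice, then mark it read-only so downstream
    # steps cannot mutate the parent data in place.
    out = np.array(raw[start_frame:end_frame], copy=True)
    out.setflags(write=False)
    return out

raw = np.arange(6)
chunk = get_series_copy(raw, 1, 4)
try:
    chunk[0] = 99  # raises ValueError: assignment destination is read-only
except ValueError:
    pass
print(raw.tolist())  # [0, 1, 2, 3, 4, 5] -- parent untouched
```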

We note that in suite2p registration, array data is often overwritten. Fortunately, roiextractors was reading the raw data as read-only and prevented unwanted actions.

In this case my approach (for segmentation) was to save a copy of the input to binary, so that Suite2p can "mess with it".

It looks like the roiextractors return different types of array (e.g. we saw that ScanImageImagingExtractors return numpy.arrays while TiffImagingExtractor returns a (read-only) numpy.memmap). We are interested in the assumptions we can make as developers about the types returned by get_series of our roiextractor-wrapping Segment objects. It would be great for this to become fully consistent.

I think any "array-like" would work, so either numpy arrays or memmaps are fine.

Furthermore, it would be great if the returned array data was never loaded into memory by default (a dask array?)

This should be the case for most readers, but not after preprocessing. I don't think we need dask at the moment, but we can discuss later.

Some multiphoton recordings will have multiple planes and channels, leading to datasets of different dimensions (from 3 to 5) returned by get_series. Beyond the inconsistency, this has other sorts of consequences for the choice of algorithm to use. E.g. should we store planes and channels separately in different Segments? How do we envisage dealing with 5-dimensional datasets (tczyx)?

See my comment above :)

As you've noted too, we may want to refactor our nomenclature a bit. "Segments" is a bit confusing in the imaging world, and it's confusing that BaseExtractor has a Segment which wraps a RoiExtractor :D

True, but unfortunately that's what SpikeInterface uses! We could have a property that redefines them as "Epoch" and keep the segment nomenclature internal and private.

What belongs in the json files for provenance? How will we take care of the different parameters used in the different steps? For instance, what is contained in the ops file for suite2p?

That is up to the preprocessors to log in self._kwargs. Whatever we add to self._kwargs is saved/loaded.
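Our understanding of that pattern, sketched with an illustrative step (the FilterStep class and its parameters are hypothetical; the idea follows the spikeinterface to_dict() approach linked above):

```python
import json

class FilterStep:
    """Hypothetical processing step: whatever it records in self._kwargs
    is exactly what gets serialised for provenance."""
    def __init__(self, parent_name, cutoff_hz):
        self._kwargs = {"parent": parent_name, "cutoff_hz": cutoff_hz}

    def to_dict(self):
        return {"class": type(self).__name__, "kwargs": self._kwargs}

step = FilterStep("raw_movie", cutoff_hz=0.5)
print(json.dumps(step.to_dict()))
```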

In the case of using this architecture with a workflow, each workflow step will call some Extractor.get_data() and Extractor.save() to apply the processing to the whole volume, save json, and cache intermediate raw data. Does this make sense?

To some extent, yes. If a workflow step does some heavy preprocessing, then it's fine to save to binary, but this is not required by every step. In my mind, a workflow step will be "preprocessing", which could apply several steps and dump to binary.

seanmcculloch added a commit that referenced this pull request Mar 13, 2026