Conversation
> num_pixels = imaging.get_num_pixels()
> dtype_size_bytes = np.dtype(dtype).itemsize

use `imaging.get_sample_size_in_bytes` instead
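A minimal sketch of the suggested change, using a mock in place of the real imaging extractor (the mock's method names and semantics are assumptions based on the review comment, not the actual API):

```python
import numpy as np

class MockImaging:
    """Hypothetical stand-in for the imaging extractor in this PR;
    the real method names/semantics may differ."""
    def __init__(self, dtype="uint16"):
        self._dtype = np.dtype(dtype)

    def get_num_pixels(self):
        return 512 * 512

    def get_sample_size_in_bytes(self):
        # Encapsulates the dtype, so callers need not rebuild it themselves.
        return self._dtype.itemsize

imaging = MockImaging()

# Before (the code under review): reconstruct the dtype externally.
dtype_size_bytes = np.dtype("uint16").itemsize

# After (reviewer's suggestion): ask the extractor directly.
sample_size_bytes = imaging.get_sample_size_in_bytes()

total_bytes = imaging.get_num_pixels() * sample_size_bytes
```

The point of the suggestion is encapsulation: the caller no longer needs a separate `dtype` variable that can drift out of sync with the extractor.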
alessandrofelder left a comment:

Hey @alejoe91 and @arielleleon,
Thanks again for this. @lauraporta and I had a fun pair-programming session today, playing with your PR (we will open a related PR trying to implement a naive suite2p motion-correction Extractor-Segment pair, just to test our understanding). We are posting a summary of our thoughts and questions below, in advance of our next discussion.
We were impressed by many aspects of the suggested architecture. In particular:
- very nice that we can re-use `roiextractors` to load so many different data types...
- ... and that they are all dynamically constructed, minimising the amount of code we have to maintain :)
- we played with the interactive widgets and our own implementation of a processing step (suite2p motion correction), and it worked very nicely
We (think we) gained some in-depth understanding, which we would like to double-check:
- There are two central Base classes (Extractor and Segment). The key relationship between them is: an Extractor has a Segment.
- Users, and libraries depending on `photon-mosaic-api`, instantiate Extractor objects (via a function wrapping their constructor, for syntactic niceness) but never the Segment object (directly).
- Both base classes have a `get_series` function (often the extractor's `get_series` calls the Segment's `get_series`) to access (parts of) the actual numerical array data.
- The Extractor classes are responsible for (de)serialisation.
- In some cases, `get_series` performs some computation lazily before returning the part of the array that was requested (like filtering).
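Our understanding above can be sketched in a few lines; all class and method names here are hypothetical stand-ins mirroring the pattern described, not the actual `photon-mosaic-api` implementation:

```python
import numpy as np

class Segment:
    """Holds a contiguous piece of array data (minimal hypothetical version)."""
    def __init__(self, data):
        self._data = data

    def get_series(self, start, end):
        return self._data[start:end]

class FilteredSegment(Segment):
    """A segment that lazily applies a computation (here: mean-subtraction)
    to only the requested slice of its parent segment."""
    def __init__(self, parent):
        self._parent = parent

    def get_series(self, start, end):
        chunk = self._parent.get_series(start, end)
        return chunk - chunk.mean()  # computed only for the requested slice

class Extractor:
    """An Extractor has a Segment; users only ever call the Extractor."""
    def __init__(self, segment):
        self._segment = segment

    def get_series(self, start, end):
        # The extractor's get_series delegates to its segment's get_series.
        return self._segment.get_series(start, end)

raw = Extractor(Segment(np.arange(10.0)))
filtered = Extractor(FilteredSegment(raw._segment))
out = filtered.get_series(0, 4)  # only elements 0..3 are ever touched
```

Nesting `FilteredSegment`s would give the recursive, on-demand computation chain described below under "key advantages".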
Photon-mosaic specific details of Extractors-Segment architecture
- The base Extractor class for multiphoton videos is `BaseImaging`.
- The base Segment class for multiphoton videos is `BaseImagingSegment`.
- We take advantage of the `roiextractors` package, which already implements data loaders for many multiphoton video formats.
- We do this by dynamically creating `BaseImaging` and `BaseImagingSegment` classes that wrap `roiextractors`'s data loading.
- There is also an Extractor base class for ROIs (but ROIs don't have Segments - they are static).
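The dynamic wrapping could look roughly like the sketch below; the class names echo those above, but the wiring (and the fake loader) is purely illustrative, and the real photon-mosaic / `roiextractors` code may differ:

```python
class BaseImagingSegment:
    """Base Segment for multiphoton videos (illustrative)."""
    def __init__(self, extractor):
        self._extractor = extractor

    def get_series(self, start, end):
        # Delegate to the wrapped roiextractors-style loader.
        return self._extractor.get_video(start_frame=start, end_frame=end)

class FakeTiffLoader:
    """Stand-in for a roiextractors ImagingExtractor."""
    def get_video(self, start_frame, end_frame):
        return [[0] * 4 for _ in range(end_frame - start_frame)]

# Dynamically create one Segment subclass per available loader class.
LOADERS = {"Tiff": FakeTiffLoader}
generated = {}
for name, loader_cls in LOADERS.items():
    def make_init(cls):
        def __init__(self, *args, **kwargs):
            BaseImagingSegment.__init__(self, cls(*args, **kwargs))
        return __init__
    generated[f"{name}ImagingSegment"] = type(
        f"{name}ImagingSegment",
        (BaseImagingSegment,),
        {"__init__": make_init(loader_cls)},
    )

seg = generated["TiffImagingSegment"]()
frames = seg.get_series(0, 2)  # two 4-pixel "frames" from the fake loader
```

This is also why IDEs struggle (a point raised below): classes built with `type()` at import time have no static definition for autocompletion to find.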
The architecture is designed to have some key advantages:
- it keeps track of the necessary nested computations and recursively performs them on-the-fly, only on the actually requested slice of the array data, when it is requested.
In the current implementation, this architecture is implemented in preprocessing, but not (yet?) for segmentation. Our current segmentation is implemented as a Python function wrapping suite2p.
Main questions
These are closely related to the discussion points below, but it may be helpful to have an overview:
- Did we miss/misinterpret anything important above?
- Can we assume that `get_series` always or never returns a copy of the requested array data (lazily), or is this variable?
- What are the conventions/assumptions around specifying things that go in the JSON?
- How do multiplane, multichannel videos relate to Segment objects?
Discussion points
RoiExtractor wrappers are created dynamically, and therefore IDEs don't find them automatically - this can be disconcerting for users. Maybe solvable via documentation.
We want to ensure that the latest computations on the data (happening in the `get_series()` at the top of the stack) do not affect the parent data. We need a better understanding of where a deepcopy of the data returned by `get_series` is needed, and whether this is consistent across all Segment objects, across all Segment objects that wrap a roiextractor, or something else.
We note that in suite2p registration, array data is often overwritten in place. Fortunately, `roiextractors` was reading the raw data as read-only, which prevented unwanted modifications.
It looks like the roiextractors return different types of array (e.g. we saw that `ScanImageImagingExtractor`s return `numpy.array`s while `TiffImagingExtractor` returns a (read-only) `numpy.memmap`). We are interested in the assumptions we can make as developers about the types returned by `get_series` of our roiextractor-wrapping Segment objects. It would be great for this to become fully consistent.
Furthermore, it would be great if the returned array data was never loaded into memory by default (a dask array?)
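The difference we observed can be demonstrated with plain NumPy, independently of any specific extractor: a read-only `memmap` refuses in-place writes, while an ordinary in-memory array is silently overwritten. (The file name here is arbitrary.)

```python
import os
import tempfile
import numpy as np

# A plain in-memory ndarray is silently overwritten in place:
arr = np.zeros(4)
arr[:] = 1.0  # no error - suite2p-style in-place registration would "succeed"

# A read-only memmap (what TiffImagingExtractor appeared to return) refuses writes:
path = os.path.join(tempfile.mkdtemp(), "frames.dat")
np.zeros(4).tofile(path)
mm = np.memmap(path, dtype=np.float64, mode="r")  # mode="r" -> read-only

try:
    mm[:] = 1.0
    writable = True
except ValueError:
    # NumPy raises "assignment destination is read-only"
    writable = False
```

So whether downstream code can accidentally corrupt the parent data currently depends on which array type a given extractor happens to return.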
Some multiphoton recordings will have multiple planes and channels, leading to datasets of different dimensionality (from 3 to 5) returned by `get_series`. Beyond the inconsistency, this has consequences for the choice of algorithm to use, e.g. should we store planes and channels separately in different Segments? How do we envisage dealing with 5-dimensional datasets (tczyx)?
As you've noted too, we may want to refactor our nomenclature a bit. "Segments" is a bit confusing in the imaging world, and it's confusing that BaseExtractor has a Segment which wraps a RoiExtractor :D
What belongs in the JSON files for provenance? How will we take care of the different parameters used in the different steps? For instance, what is contained in the `ops` file for suite2p?
In the case of using this architecture with a workflow, each workflow step will call some `Extractor.get_data()` and `Extractor.save()` to apply the processing to the whole volume, save the JSON, and cache intermediate raw data. Does this make sense?
Great!!! :)
In theory one extractor can have multiple segments. I think this is useful in cases where acquisition is performed in batches, for example locked to some trial presentation.
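A one-extractor/many-segments setup could be sketched as below; the method names loosely follow SpikeInterface conventions (`get_num_segments`, a `segment_index` argument) but are assumptions here, not the actual API:

```python
import numpy as np

class MultiSegmentExtractor:
    """Sketch: one extractor holding several contiguous segments,
    e.g. one per trial-locked acquisition batch (hypothetical names)."""
    def __init__(self, segments):
        self._segments = segments  # list of per-batch arrays

    def get_num_segments(self):
        return len(self._segments)

    def get_series(self, start, end, segment_index=0):
        # Each segment is an independent contiguous piece of data.
        return self._segments[segment_index][start:end]

ext = MultiSegmentExtractor([np.zeros(100), np.ones(80)])
n_segments = ext.get_num_segments()
batch1 = ext.get_series(0, 5, segment_index=1)
```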
Correct. Segments are mainly internal
Yes. The user will only interact with the Extractor objects.
Correct.
Yes. Basically one can manipulate the recording object (could be preprocessing, concatenating/slicing, selecting FOVs, etc.). Then the resulting
It doesn't. Segments are "contiguous pieces of data". Currently, to get started, we assumed mono-channel/mono-plane imaging. However, we can use the

Regarding multi-channel, I think we should restrict each
Yeah I think having everything in the docs (auto-compiled) should be ok.
It should not. See my point above about readers returning a copy (we have to make sure we do that!)
In this case my approach (for segmentation) was to save a copy of the input to binary, so that Suite2p can "mess with it".
I think any "array-like" would work, so either numpy arrays or memmaps are fine.
This should be the case for most readers, but not after preprocessing. I don't think we need dask at the moment, but we can discuss later.
See my comment above :)
True, but unfortunately that's what SpikeInterface uses! We could have a property that redefines them as "Epoch" and keep the segment nomenclature internal and private.
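The suggested compromise could look like the sketch below; the `epochs` property name and everything else here is hypothetical, just illustrating "keep the segment nomenclature internal and private, expose a friendlier name publicly":

```python
class BaseExtractor:
    """Sketch: internal SpikeInterface-style 'segment' storage,
    with a public imaging-friendly alias (names are hypothetical)."""
    def __init__(self):
        self._segments = []  # private, keeps the internal nomenclature

    @property
    def epochs(self):
        # Public, imaging-friendly view of the same objects.
        return tuple(self._segments)

ext = BaseExtractor()
ext._segments.append("segment-0")
```

A read-only property also prevents users from mutating the internal list directly, which fits the "Segments are mainly internal" point above.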
That is up to the preprocessors to log in the JSON.
To some extent yes. If a workflow step does some heavy preprocessing, then it's fine to save to binary. But this is not required by every step. In my mind, a workflow step will be "preprocessing", which could apply several steps and dump to binary.
chore: initial repo setup
Ported files and demo from https://github.com/AllenNeuralDynamics/photon-mosaic-demo
Added minimal README for installation