
Conversation

Contributor

@marcorudolphflex marcorudolphflex commented Nov 5, 2025

Greptile Overview

Updated On: 2025-11-05 16:45:19 UTC

Greptile Summary

This PR replaces the autograd scipy convolution implementation with a custom FFT-based convolution to improve tracing performance. The main changes include:

  • Removed dependency on autograd.scipy.signal.convolve and replaced with custom _fft_convolve_general function using FFT convolution
  • Added _normalize_axes helper function to handle axis specification and validation
  • Added comprehensive tests for convolution with explicit axes specification
  • Updated CHANGELOG to document the performance improvement

Issues found:

  • Dead code in _normalize_axes (lines 74-82): bounds checking occurs after modulo normalization, so the checks can never trigger (see the sketch below)
  • Extra blank line in CHANGELOG.md
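
A minimal sketch of the dead-code pattern (hypothetical helper names, not the PR's actual code): once an axis index has been normalized with modulo, a subsequent bounds check can never fire, so validation has to happen before normalization.

```python
def _normalize_axis_buggy(ax: int, ndim: int) -> int:
    ax = ax % ndim  # maps every integer into [0, ndim), so ...
    if not 0 <= ax < ndim:  # ... this check is dead code and never raises
        raise ValueError(f"Invalid axis {ax} for ndim {ndim}.")
    return ax


def _normalize_axis_checked(ax: int, ndim: int) -> int:
    if not -ndim <= ax < ndim:  # validate before normalizing
        raise ValueError(f"Invalid axis {ax} for ndim {ndim}.")
    return ax + ndim if ax < 0 else ax
```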

Confidence Score: 3/5

  • This PR is mostly safe to merge but contains dead code that should be addressed
  • The FFT-based convolution implementation appears algorithmically sound and includes appropriate tests. However, the dead code in bounds checking (lines 74-82) indicates incomplete validation logic that could mask invalid inputs. While this won't cause runtime errors due to the modulo operation preventing out-of-bounds access, it means the function won't properly reject invalid axis specifications, which could lead to confusing behavior.
  • Pay close attention to tidy3d/plugins/autograd/functions.py - the bounds checking logic in _normalize_axes should be fixed or removed

Important Files Changed

File Analysis

| Filename | Score | Overview |
|---|---|---|
| tidy3d/plugins/autograd/functions.py | 3/5 | replaced autograd.scipy.signal.convolve with custom FFT-based implementation for performance; added helper functions _normalize_axes and _fft_convolve_general; contains dead code in bounds checking (lines 74-82) |
| tests/test_plugins/autograd/test_functions.py | 5/5 | added test class TestConvolveAxes with parametrized tests for convolution with explicit axes, including validation against numpy and gradient checks |
| CHANGELOG.md | 4/5 | added entry for improved autograd convolution speed; contains extra blank line on line 57 |

Sequence Diagram

sequenceDiagram
    participant User
    participant convolve
    participant _normalize_axes
    participant pad
    participant _fft_convolve_general
    participant fftn/ifftn

    User->>convolve: array, kernel, axes, mode, padding
    convolve->>convolve: validate kernel dimensions (must be odd)
    convolve->>_normalize_axes: array.ndim, kernel.ndim, axes
    _normalize_axes->>_normalize_axes: normalize axes with modulo
    _normalize_axes->>_normalize_axes: validate axes (length match, uniqueness)
    _normalize_axes-->>convolve: axes_array, axes_kernel
    
    alt mode is "same" or "full"
        loop for each axis pair
            convolve->>pad: working_array, pad_width, padding mode
            pad-->>convolve: padded array
        end
        convolve->>convolve: set effective_mode ("valid" for "same", "full" for "full")
    end
    
    convolve->>_fft_convolve_general: working_array, kernel, axes, effective_mode
    _fft_convolve_general->>_fft_convolve_general: reorder axes (batch vs conv)
    _fft_convolve_general->>_fft_convolve_general: expand dimensions for broadcasting
    _fft_convolve_general->>fftn/ifftn: compute FFT convolution
    fftn/ifftn-->>_fft_convolve_general: full convolution result
    
    alt effective_mode is "valid"
        _fft_convolve_general->>_fft_convolve_general: slice valid region
    end
    
    _fft_convolve_general->>_fft_convolve_general: convert to real if inputs were real
    _fft_convolve_general-->>convolve: result
    convolve-->>User: convolved array
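As a rough illustration of the flow traced above (a minimal sketch, not the PR's actual _fft_convolve_general, which additionally handles batch axes, padding modes, and next_fast_len sizing), an N-D FFT convolution in "valid" mode for real inputs can look like this:

```python
import numpy as np


def fft_convolve_valid(array: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Minimal N-D FFT convolution in 'valid' mode (real inputs, kernel no larger than array)."""
    fft_shape = [n + k - 1 for n, k in zip(array.shape, kernel.shape)]
    # convolution theorem: multiply spectra, then invert
    full = np.fft.irfftn(
        np.fft.rfftn(array, fft_shape) * np.fft.rfftn(kernel, fft_shape),
        fft_shape,
    )
    # 'valid' keeps outputs where the kernel fully overlaps the array:
    # indices [k - 1, n) along each axis of the full-length result
    return full[tuple(slice(k - 1, n) for n, k in zip(array.shape, kernel.shape))]
```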

@greptile-apps greptile-apps bot left a comment

3 files reviewed, 2 comments

@marcorudolphflex marcorudolphflex force-pushed the FXC-3961-faster-convolutions-for-tidy-3-d-plugins-autograd-filters branch from 4df0cd5 to 1255f7f on November 5, 2025 16:55
Contributor

github-actions bot commented Nov 5, 2025

Diff Coverage

Diff: origin/develop...HEAD, staged and unstaged changes

  • tidy3d/plugins/autograd/functions.py (87.6%): Missing lines 51-54,57,71,79,85,126,159,171

Summary

  • Total: 89 lines
  • Missing: 11 lines
  • Coverage: 87%

tidy3d/plugins/autograd/functions.py

Lines 47-61

  47     """Normalize the axes specification for convolution."""
  48 
  49     def _normalize_single_axis(ax: SupportsInt, ndim: int, kind: str) -> int:
  50         if not isinstance(ax, int):
! 51             try:
! 52                 ax = int(ax)
! 53             except Exception as e:
! 54                 raise TypeError(f"Axis {ax!r} could not be converted to an integer.") from e
  55 
  56         if not -ndim <= ax < ndim:
! 57             raise ValueError(f"Invalid axis {ax} for {kind} with ndim {ndim}.")
  58         return ax + ndim if ax < 0 else ax
  59 
  60     if axes is None:
  61         if ndim_array != ndim_kernel:

Lines 67-75

  67         axes_kernel = tuple(range(ndim_kernel))
  68         return axes_array, axes_kernel
  69 
  70     if len(axes) != 2:
! 71         raise ValueError("'axes' must be a tuple of two iterable collections of axis indices.")
  72 
  73     axes_array_raw, axes_kernel_raw = axes
  74 
  75     axes_array = tuple(_normalize_single_axis(ax, ndim_array, "array") for ax in axes_array_raw)

Lines 75-83

  75     axes_array = tuple(_normalize_single_axis(ax, ndim_array, "array") for ax in axes_array_raw)
  76     axes_kernel = tuple(_normalize_single_axis(ax, ndim_kernel, "kernel") for ax in axes_kernel_raw)
  77 
  78     if len(axes_array) != len(axes_kernel):
! 79         raise ValueError(
  80             "The number of convolution axes for the array and kernel must be the same, "
  81             f"got {len(axes_array)} and {len(axes_kernel)}."
  82         )

Lines 81-89

  81             f"got {len(axes_array)} and {len(axes_kernel)}."
  82         )
  83 
  84     if len(set(axes_array)) != len(axes_array) or len(set(axes_kernel)) != len(axes_kernel):
! 85         raise ValueError("Convolution axes must be unique for both the array and the kernel.")
  86 
  87     return axes_array, axes_kernel
  88 

Lines 122-130

  122     array_conv_shape = array_reordered.shape[num_batch_array:]
  123     kernel_conv_shape = kernel_reordered.shape[num_batch_kernel:]
  124 
  125     if any(d <= 0 for d in array_conv_shape + kernel_conv_shape):
! 126         raise ValueError("Convolution dimensions must be positive; got zero-length axis.")
  127 
  128     fft_axes = tuple(range(-num_conv_axes, 0))
  129     fft_shape = [next_fast_len(n + k - 1) for n, k in zip(array_conv_shape, kernel_conv_shape)]
  130     use_real_fft = fft_shape[-1] % 2 == 0  # only applicable in this case

Lines 155-163

  155     ifft_fun = irfftn if use_real_fft else ifftn
  156     full_result = ifft_fun(product, fft_shape, axes=fft_axes)
  157 
  158     if mode == "full":
! 159         result = full_result
  160     elif mode == "valid":
  161         valid_slices = [slice(None)] * full_result.ndim
  162         for axis_offset, (array_dim, kernel_dim) in enumerate(
  163             zip(array_conv_shape, kernel_conv_shape)

Lines 167-175

  167             axis = full_result.ndim - num_conv_axes + axis_offset
  168             valid_slices[axis] = slice(start, start + length)
  169         result = full_result[tuple(valid_slices)]
  170     else:
! 171         raise ValueError(f"Unsupported convolution mode '{mode}'.")
  172 
  173     return np.real(result)
  174 

@marcorudolphflex marcorudolphflex force-pushed the FXC-3961-faster-convolutions-for-tidy-3-d-plugins-autograd-filters branch 7 times, most recently from 799eaf0 to 24ea2ac on November 6, 2025 14:11
Collaborator

@yaugenst-flex yaugenst-flex left a comment


Thanks @marcorudolphflex, this is great! As a general observation, we should keep in mind that this new implementation is slower for small arrays/kernels and faster for larger ones. Since for small convolutions both methods are fast in absolute terms, and for large cases (where the cost of the convolution becomes noticeable) FFT will be significantly faster, I think this is definitely a better default. It could be interesting to check with a little bit of benchmarking where this crossover lies and compare that to typical sizes in topopt problems, where the arrays are generally on the order of a few hundred pixels in x and y and kernels are probably on the order of ~10 pixels.
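
A hypothetical micro-benchmark for locating that crossover, using scipy's reference implementations rather than the PR's code:

```python
import timeit

import numpy as np
from scipy.signal import convolve, fftconvolve

rng = np.random.default_rng(0)
for n, k in [(64, 5), (256, 9), (512, 15)]:  # n x n array, k x k kernel
    a, w = rng.random((n, n)), rng.random((k, k))
    t_direct = timeit.timeit(lambda: convolve(a, w, method="direct"), number=5)
    t_fft = timeit.timeit(lambda: fftconvolve(a, w), number=5)
    print(f"n={n:4d} k={k:3d}  direct={t_direct:.3f}s  fft={t_fft:.3f}s")
```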

"""Normalize the axes specification for convolution."""

def _normalize_single_axis(ax: int, ndim: int, kind: str) -> int:
if not isinstance(ax, int):
Collaborator

Are we sure we want to be this strict with the int type here? This would break if you pass something like axes=np.arange(3) for example because those are np.int64. I think this would be a pretty common case.
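
A quick illustration of the point, with operator.index as one lenient-but-safe alternative to a strict isinstance check:

```python
import operator

import numpy as np

ax = np.arange(3)[0]        # np.int64, a common way axes get produced
print(isinstance(ax, int))  # False: numpy integer scalars do not subclass int
print(operator.index(ax))   # 0: __index__ accepts any integer-like type
# operator.index(1.5) would raise TypeError, so floats are still rejected
```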

Comment on lines 174 to 274
class TestConvolveAxes:
    @pytest.mark.parametrize("mode", ["valid", "same", "full"])
    @pytest.mark.parametrize("padding", ["constant", "edge"])
    def test_convolve_axes_val(self, rng, mode, padding):
        """Test convolution with explicit axes against NumPy implementations."""
        array = rng.random((2, 5))
        kernel = rng.random((3, 3))
        axes = ([1], [1])

        conv_td = convolve(array, kernel, padding=padding, mode=mode, axes=axes)

        working_array = array
        scipy_mode = mode
        if mode in ("same", "full"):
            pad_width = kernel.shape[1] // 2 if mode == "same" else kernel.shape[1] - 1
            working_array = pad(array, (pad_width, pad_width), mode=padding, axis=1)
            scipy_mode = "valid"

        working_array_np = np.asarray(working_array)
        kernel_np = np.asarray(kernel)
        conv_length = np.convolve(working_array_np[0], kernel_np[0], mode=scipy_mode).shape[0]

        expected = np.empty((array.shape[0], kernel.shape[0], conv_length))
        for i in range(array.shape[0]):
            for j in range(kernel.shape[0]):
                expected[i, j] = np.convolve(
                    working_array_np[i],
                    kernel_np[j],
                    mode=scipy_mode,
                )

        npt.assert_allclose(conv_td, expected, atol=1e-12)

    def test_convolve_axes_grad(self, rng):
        """Test gradients of convolution when specific axes are provided."""
        array = rng.random((2, 5))
        kernel = rng.random((3, 3))
        check_grads(convolve, modes=["rev"], order=2)(
            array,
            kernel,
            padding="constant",
            mode="valid",
            axes=([1], [1]),
        )
Collaborator

Not really sure I understand this test. Wouldn't we want to parametrize over axes? That's what I would expect from the naming of the test at least. And we'd want to parametrize over array and kernel too? Why are these things parametrized in the value test but not in the gradient test?

More generally, isn't this test replicating much of TestConvolve? Wouldn't we just want to add a parametrization over axes there?

Contributor Author

@marcorudolphflex marcorudolphflex Nov 7, 2025

Yes, it would make sense to vary the axes/kernel params here.
The reason for this test is to specifically check that convolution works with explicitly specified axes. We could do a unified test with axes being None and non-None, but then we would probably just have to skip the parameter combinations that don't make sense. Alternatively, we could use some shared helpers to check the results and avoid duplicate code. Not sure what is more elegant here; see the sketch below.
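
A sketch of what the unified parametrization could look like (shapes, the skip condition, and the rng fixture are illustrative, not the actual test suite):

```python
import pytest

from tidy3d.plugins.autograd.functions import convolve


@pytest.mark.parametrize("axes", [None, ([1], [1]), ([0, 1], [0, 1])])
@pytest.mark.parametrize("mode", ["valid", "same", "full"])
def test_convolve_val(rng, axes, mode):
    array, kernel = rng.random((5, 5)), rng.random((3, 3))
    if axes is None and array.ndim != kernel.ndim:
        pytest.skip("axes=None requires matching dimensionality")
    result = convolve(array, kernel, mode=mode, axes=axes)
    # ... assert against a shared numpy reference helper, as in TestConvolve ...
```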

Comment on lines 324 to 326
raise ValueError(
    f"Even-sized kernel along axis {ax_kernel} not supported for 'same' mode."
)
Collaborator

Does numpy also complain about this?

Contributor Author

No, it silently returns a result that is shifted by half a sample (floor division), i.e. it assumes center = floor(M/2) for a kernel of length M - which I dislike... But I guess we want to be compatible with that?
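
A small demonstration of that behavior; the expected output follows from center = floor(M/2) as described above:

```python
import numpy as np

x = np.array([0.0, 0.0, 1.0, 0.0, 0.0])  # unit impulse at index 2
k = np.array([0.5, 0.5])                 # even length, M = 2
print(np.convolve(x, k, mode="same"))
# With center = floor(M/2) = 1 this gives [0. 0.5 0.5 0. 0.],
# i.e. the response is shifted half a sample relative to the impulse.
```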

Collaborator

Hmm yeah I think it's probably fine as-is, we're not really trying to match the scipy or numpy interface here

CHANGELOG.md Outdated
- Unified run submission API: `web.run(...)` is now a container-aware wrapper that accepts a single simulation or arbitrarily nested containers (`list`, `tuple`, `dict` values) and returns results in the same shape.
- `web.Batch(ComponentModeler)` and `web.Job(ComponentModeler)` native support
- Simulation data of batch jobs are now automatically downloaded upon their individual completion in `Batch.run()`, avoiding waiting for the entire batch to reach completion.
- Improved speed of autograd tracing for convolutions.
Collaborator

This change is not only related to autograd tracing, right?

@marcorudolphflex
Contributor Author

It could be interesting to check with a little bit of benchmarking where this crossover lies and compare that to typical sizes in topopt problems [...]

[benchmark plot: direct vs. FFT convolution runtime]

You are right, that's really size-dependent. In this test I used square 2D inputs/kernels.

@marcorudolphflex
Contributor Author

Tweaked it a little bit (direct real FFT and scipy.fft.next_fast_len), got this:

[updated benchmark plot]
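
For context, a small illustration of the two tweaks (scipy.fft API, not the PR's code):

```python
import numpy as np
from scipy.fft import irfft, next_fast_len, rfft

# next_fast_len pads an FFT length up to a nearby fast composite size:
print(next_fast_len(509))  # 512

# rfft/irfft exploit real-input symmetry, roughly halving the FFT work:
x = np.random.default_rng(0).random(8)
print(np.allclose(irfft(rfft(x), len(x)), x))  # True
```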

@yaugenst-flex
Collaborator

tweaked it a little bit (direct real fft and scipy.fft.next_fast_len)

Nice! I'd say for any realistic use case this would be strictly better than before!

@marcorudolphflex marcorudolphflex force-pushed the FXC-3961-faster-convolutions-for-tidy-3-d-plugins-autograd-filters branch from 24ea2ac to f1454dc on November 7, 2025 14:36