[ENH] RocketGPU optimisations #3320

@TonyBagnall

Description

Describe the feature or idea you want to propose

Whilst assessing #3232 I ran into another possible GPU speed-up, courtesy of a chat with the oracle.

The real overhead is here. This line is the big one:

_output_convolution = np.squeeze(_output_convolution.numpy(), axis=-1)

numpy() forces synchronisation and copies the convolution output from device to host (GPU to CPU) on every call. The NumPy work that follows (adding the bias) then runs on the CPU.

Then you call TensorFlow again in _get_ppv / _get_max, which means TF has to wrap/convert the NumPy array back into a tensor (often on CPU, sometimes with another host-to-device copy, depending on placement).

That back-and-forth can easily dominate the runtime, and it happens n_kernels * n_batches times.
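The cost pattern can be illustrated with a small sketch (shapes and the bias value are made up for illustration; the point is where the host copies happen):

```python
import numpy as np
import tensorflow as tf

x = tf.random.normal((8, 100))  # stand-in for one batch of convolution output
bias = 0.5

# Current pattern: leave TF, do NumPy work, re-enter TF -- per kernel, per batch.
host = x.numpy()                   # forces sync + full device-to-host copy
host = host + bias                 # NumPy arithmetic on CPU
back = tf.convert_to_tensor(host)  # wrapped back into a tensor for _get_ppv
ppv_old = tf.reduce_mean(tf.cast(back > 0, tf.float32), axis=1)

# Proposed pattern: stay in TF; only the tiny feature vector crosses to the host.
ppv_new = tf.reduce_mean(tf.cast((x + bias) > 0, tf.float32), axis=1)
feats = ppv_new.numpy()            # one small (8,) copy instead of (8, 100)

assert np.allclose(ppv_old.numpy(), feats)
```

Both patterns compute the same values; the difference is that the second only pays for one small device-to-host transfer per batch.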

What I’d change (minimal, high impact)

Keep everything as TF tensors until you have the two features (PPV and max), then convert that small two-column array to NumPy once.

Describe your proposed solution

import numpy as np
import tensorflow as tf

class RocketGPU:
    @staticmethod
    def _get_ppv(x):
        return tf.reduce_mean(tf.cast(x > 0, tf.float32), axis=1)

    @staticmethod
    def _get_max(x):
        return tf.reduce_max(x, axis=1)

    def _transform(self, X, y=None):
        tf.random.set_seed(self.random_state)

        X = X.transpose(0, 2, 1)
        X_tf = tf.convert_to_tensor(X)  # do this once

        batch_indices_list = self._generate_batch_indices(n=len(X))
        output_features = []

        for f in range(self.n_kernels):
            output_features_filter = []

            kernel = self._list_of_kernels[f]
            dilation = self._list_of_dilations[f]
            padding = self._list_of_paddings[f]
            bias = tf.convert_to_tensor(self._list_of_biases[f], dtype=X_tf.dtype)

            for batch_indices in batch_indices_list:
                idx = tf.convert_to_tensor(batch_indices, dtype=tf.int32)
                xb = tf.gather(X_tf, idx, axis=0)

                conv = tf.nn.conv1d(
                    input=xb,
                    stride=1,
                    filters=kernel,
                    dilations=dilation,
                    padding=padding,
                )
                conv = tf.squeeze(conv, axis=-1) + bias

                ppv = self._get_ppv(conv)
                mx = self._get_max(conv)

                feats = tf.stack([ppv, mx], axis=1).numpy()  # tiny copy back
                output_features_filter.append(feats)

            output_features.append(
                np.expand_dims(np.concatenate(output_features_filter, axis=0), axis=0)
            )

        output_rocket = np.concatenate(output_features, axis=0).swapaxes(0, 1)
        output_rocket = output_rocket.reshape(
            (output_rocket.shape[0], output_rocket.shape[1] * output_rocket.shape[2])
        )
        return output_rocket
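As a quick sanity check that the TF reductions match the intended feature definitions (proportion of positive values, and max per series), here is a hand-worked toy input:

```python
import numpy as np
import tensorflow as tf

x = tf.constant([[-1.0, 2.0, 3.0],
                 [0.5, -0.5, 0.0]])

# PPV: fraction of strictly positive values along each series.
ppv = tf.reduce_mean(tf.cast(x > 0, tf.float32), axis=1).numpy()
# Max: largest value along each series.
mx = tf.reduce_max(x, axis=1).numpy()

assert np.allclose(ppv, [2 / 3, 1 / 3])  # 2 of 3 and 1 of 3 values are > 0
assert np.allclose(mx, [3.0, 0.5])
```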

Describe alternatives you've considered, if relevant

No response

Additional context

No response

Metadata

Labels

    enhancement: New feature, improvement request or other non-bug code enhancement
    transformations: Transformations package
